cscherrer / Soss.jl

Probabilistic programming via source rewriting
https://cscherrer.github.io/Soss.jl/stable/
MIT License

CSV interference #317

Closed: jariji closed this issue 2 years ago

jariji commented 2 years ago

https://gist.githubusercontent.com/jariji/5b1b8d77fed086dd6b90abc41cbfe190/raw/2b8929c4d486e39a2fe15439aab265e820804c95/outdata.csv

using Soss, CSV
# `path` is a local copy of outdata.csv from the gist linked above
CSV.File(path, comment="#")

It hangs with Soss loaded but works without it.

[336ed68f] CSV v0.9.11
[8ce77f84] Soss v0.20.9
julia 1.7.0
Linux 5.14.8 x86_64
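One way to confirm the "hangs with Soss, works without it" behaviour without locking up the REPL is to run each variant in a fresh Julia process with a timeout. A timeout inside the same process probably would not help, because a hang during compilation can block the calling task as well. The sketch below assumes that approach. `runs_within` is a hypothetical helper, and the 300-second timeout in the usage lines is an arbitrary choice.

```julia
# Hypothetical helper: run a snippet of Julia code in a fresh process and
# return `true` only if it exits successfully within `timeout` seconds.
function runs_within(code::AbstractString, timeout::Real;
                     project = Base.active_project())
    julia = Base.julia_cmd()  # same Julia binary as the current session
    cmd = project === nothing ? `$julia -e $code` :
                                `$julia --project=$project -e $code`
    proc = run(pipeline(cmd; stdout = devnull, stderr = devnull); wait = false)
    # Poll until the child process exits or the timeout elapses.
    status = timedwait(() -> process_exited(proc), float(timeout))
    if status === :timed_out
        kill(proc)  # treat a timeout as a hang
        return false
    end
    return success(proc)  # true iff the child exited with code 0
end

# Usage, assuming outdata.csv is in the working directory and both
# packages are installed in the active project:
# runs_within("using CSV; CSV.File(\"outdata.csv\", comment=\"#\")", 300)
# runs_within("using Soss, CSV; CSV.File(\"outdata.csv\", comment=\"#\")", 300)
```

If the report is reproducible, the first call should return `true` and the second `false`.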
cscherrer commented 2 years ago

Thanks @jariji. I'm not sure what I can do about this, because

  1. I'm not aware of any type piracy in Soss
  2. CSV.jl is very complex, and is the one doing the work
  3. The problem seems to happen at compile time; for example, try comparing the results of @code_lowered with and without Soss loaded
  4. It could be that Soss itself is not the culprit, but one of its dependencies.

To try to narrow this down a little, here's a list of the Manifest packages that Soss and CSV have in common:

  [0dad84c5] ArgTools
  [10745b16] Statistics
  [14a3606d] MozillaCACerts_jll
  [1a1011a3] SharedArrays
  [29816b5a] LibSSH2_jll
  [2a0f44e3] Base64
  [2f01184e] SparseArrays
  [34da2185] Compat v3.41.0
  [3783bdb8] TableTraits v1.0.1
  [37e2e46d] LinearAlgebra
  [3f19e933] p7zip_jll
  [3fa0cd96] REPL
  [44cfe95a] Pkg
  [4536629a] OpenBLAS_jll
  [4ec0a83e] Unicode
  [56ddb016] Logging
  [56f22d72] Artifacts
  [6462fe0b] Sockets
  [69de0a69] Parsers v2.1.3
  [76f85450] LibGit2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [83775a58] Zlib_jll
  [842dd82b] InlineStrings v1.1.1
  [8ba89e20] Distributed
  [8bb1440f] DelimitedFiles
  [8dfed614] Test
  [8e850b90] libblastrampoline_jll
  [8e850ede] nghttp2_jll
  [8f399da3] Libdl
  [9a3f8284] Random
  [9a962f9c] DataAPI v1.9.0
  [9e88b42a] Serialization
  [9fa8497b] Future
  [a4e569a6] Tar
  [a63ad114] Mmap
  [ade2ca70] Dates
  [b27032c2] LibCURL
  [b77e0a4c] InteractiveUtils
  [bd369af6] Tables v1.6.1
  [c8ffd9c3] MbedTLS_jll
  [ca575930] NetworkOptions
  [cc47b68c] SimplePolynomials v0.2.8
  [cf7118a7] UUIDs
  [d6f4376e] Markdown
  [de0858da] Printf
  [deac9b47] LibCURL_jll
  [e2d170a0] DataValueInterfaces v1.0.0
  [e66e0078] CompilerSupportLibraries_jll
  [ea8e919c] SHA
  [f43a241f] Downloads
  [fa267f1f] TOML
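A list like the one above can be derived from the active environment's Manifest. The following is a minimal sketch under that assumption: `transitive_deps`, `manifest_graph` and `common_deps` are hypothetical helpers, and the graph is built from `Pkg.dependencies()`, which maps each UUID to a record with a `name` and a `dependencies` (name => UUID) field. It returns names only, without the version numbers shown above.

```julia
import Pkg

# Every package reachable from `root` in `graph`, which maps a package
# name to the names of its direct dependencies (iterative depth-first search).
function transitive_deps(graph::Dict{String,Vector{String}}, root::String)
    seen = Set{String}()
    stack = copy(get(graph, root, String[]))
    while !isempty(stack)
        pkg = pop!(stack)
        pkg in seen && continue
        push!(seen, pkg)
        append!(stack, get(graph, pkg, String[]))
    end
    return seen
end

# name => direct-dependency-names for every package in the active
# environment's Manifest, built from Pkg.dependencies().
function manifest_graph()
    graph = Dict{String,Vector{String}}()
    for info in values(Pkg.dependencies())
        graph[info.name] = collect(keys(info.dependencies))
    end
    return graph
end

# Sorted names of the packages that both `a` and `b` depend on, directly or not.
common_deps(graph, a, b) =
    sort!(collect(intersect(transitive_deps(graph, a), transitive_deps(graph, b))))

# Usage, in an environment with both Soss and CSV installed:
# common_deps(manifest_graph(), "Soss", "CSV")
```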

JET also reports some possible errors in the call to CSV without Soss loaded (again, even the static analysis hangs when both are loaded):

julia> @report_call CSV.File("/home/chad/outdata.csv", comment="#")
═════ 9 possible errors found ═════
┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:220 CSV.#File#25(header, normalizenames, datarow, skipto, footerskip, transpose, comment, ignoreemptyrows, ignoreemptylines, select, drop, limit, buffer_in_memory, threaded, ntasks, tasks, rows_to_check, lines_to_check, missingstrings, missingstring, delim, ignorerepeated, quoted, quotechar, openquotechar, closequotechar, escapechar, dateformat, dateformats, decimal, truestrings, falsestrings, type, types, typemap, pool, downcast, lazystrings, stringtype, strict, silencewarnings, maxwarnings, debug, parsingdebug, validate, _3, source)
│┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:220 ctx = CSV.Context(CSV.Arg(source), CSV.Arg(header), CSV.Arg(normalizenames), CSV.Arg(datarow), CSV.Arg(skipto), CSV.Arg(footerskip), CSV.Arg(transpose), CSV.Arg(comment), CSV.Arg(ignoreemptyrows), CSV.Arg(ignoreemptylines), CSV.Arg(select), CSV.Arg(drop), CSV.Arg(limit), CSV.Arg(buffer_in_memory), CSV.Arg(threaded), CSV.Arg(ntasks), CSV.Arg(tasks), CSV.Arg(rows_to_check), CSV.Arg(lines_to_check), CSV.Arg(missingstrings), CSV.Arg(missingstring), CSV.Arg(delim), CSV.Arg(ignorerepeated), CSV.Arg(quoted), CSV.Arg(quotechar), CSV.Arg(openquotechar), CSV.Arg(closequotechar), CSV.Arg(escapechar), CSV.Arg(dateformat), CSV.Arg(dateformats), CSV.Arg(decimal), CSV.Arg(truestrings), CSV.Arg(falsestrings), CSV.Arg(type), CSV.Arg(types), CSV.Arg(typemap), CSV.Arg(pool), CSV.Arg(downcast), CSV.Arg(lazystrings), CSV.Arg(stringtype), CSV.Arg(strict), CSV.Arg(silencewarnings), CSV.Arg(maxwarnings), CSV.Arg(debug), CSV.Arg(parsingdebug), CSV.Arg(validate), CSV.Arg(false))
││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/context.jl:285 CSV.isempty(missingstrings)
│││┌ @ essentials.jl:775 Base.iterate(itr)
││││ no matching method found for call signature (Tuple{typeof(iterate), Nothing}): Base.iterate(itr::Nothing)
│││└─────────────────────
│┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:221 CSV.File(ctx)
││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:225 #self#(ctx, false)
│││┌ @ threadingconstructs.jl:182 task = Base.Threads.Task(#27)
││││┌ @ threadingconstructs.jl:178 CSV.multithreadparse(Core.getfield(#self#, :ctx), Core.getfield(#self#, :pertaskcolumns), Core.getfield(#self#, :rowchunkguess), Core.getfield(#self#, :i), Core.getfield(#self#, :rows), Core.getfield(#self#, :wholecolumnslock))
│││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:351 CSV.parsefilechunk!(ctx, task_pos, task_len, rowchunkguess, rowchunkoffset, task_columns, Base.getproperty(ctx, :customtypes))
││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:607 CSV.parserow(startpos, row, numwarnings, ctx, buf, pos, len, rowsguess, rowoffset, columns, _)
│││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:985 CSV.parserow(startpos, row, numwarnings, ctx, buf, pos, len, rowsguess, rowoffset, cols, _)
││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:654 CSV.detectcell(buf, pos, len, row, rowoffset, i, col, ctx, rowsguess)
│││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:759 CSV.detect(CSV.pass, buf, pos, len, opts, false, Base.getproperty(ctx, :downcast), CSV.+(rowoffset, row), i)
││││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/utils.jl:480 date = Parsers.xparse(CSV.Date, buf, pos, len, opts, CSV.Date)
│││││││││││┌ @ /home/chad/.julia/packages/Parsers/a3jNK/src/Parsers.jl:310 Parsers.typeparser(_, source, pos, len, b, code, options)
││││││││││││┌ @ /home/chad/.julia/packages/Parsers/a3jNK/src/dates.jl:384 Parsers.tryparsenext(tok, source, pos, len, b, code)
│││││││││││││┌ @ /home/chad/.julia/packages/Parsers/a3jNK/src/dates.jl:318 res = Dates.tryparsenext(tok, str, 1, strlen)
││││││││││││││ no matching method found for call signature (Tuple{typeof(Dates.tryparsenext), Dates.DatePart{'z'}, String, Int64, Int64}): res = Dates.tryparsenext::typeof(Dates.tryparsenext)(tok::Dates.DatePart{'z'}, str::String, 1, strlen::Int64)
│││││││││││││└─────────────────────────────────────────────────────────────
││││││││││││┌ @ /home/chad/.julia/packages/Parsers/a3jNK/src/dates.jl:386 Parsers.tryparsenext(tok, source, pos, len, b, code)
│││││││││││││┌ @ /home/chad/.julia/packages/Parsers/a3jNK/src/dates.jl:318 res = Dates.tryparsenext(tok, str, 1, strlen)
││││││││││││││ no matching method found for call signature (Tuple{typeof(Dates.tryparsenext), Dates.DatePart{'Z'}, String, Int64, Int64}): res = Dates.tryparsenext::typeof(Dates.tryparsenext)(tok::Dates.DatePart{'Z'}, str::String, 1, strlen::Int64)
│││││││││││││└─────────────────────────────────────────────────────────────
││││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/utils.jl:497 DT = CSV.timetype(Base.getproperty(opts, :dateformat))
│││││││││││ for 1 of 2 union split cases, no matching method found for call signatures (Tuple{typeof(CSV.timetype), Nothing})): DT = CSV.timetype(Base.getproperty(opts::Parsers.Options, :dateformat::Symbol)::Union{Nothing, Parsers.Format})
││││││││││└─────────────────────────────────────────────────────────
││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:684 CSV.parsevalue!(CSV.InlineString127, buf, pos, len, row, rowoffset, i, col, ctx)
│││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:889 newT = CSV.widen(newT)
││││││││││┌ @ operators.jl:951 Base.throw(Base.MethodError(Base.widen, Core.tuple(_)))
│││││││││││ MethodError: no method matching widen(::Type{String})
││││││││││└────────────────────
││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:690 CSV.parsevalue!(CSV.PosLenString, buf, pos, len, row, rowoffset, i, col, ctx)
│││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:828 ref = CSV.getref!(Base.getproperty(col, :refpool), _, Parsers.getstring(buf, Base.getproperty(res, :val), Base.getproperty(opts, :e)))
││││││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:923 CSV.get!(#64, x, key)
│││││││││││┌ @ dict.jl:451 key = Base.convert(_, key0)
││││││││││││┌ @ missing.jl:69 Base.convert(Base.nonmissingtype_checked(_), x)
│││││││││││││┌ @ strings/basic.jl:232 _(s)
││││││││││││││ no matching method found for call signature (Tuple{Type{PosLenString}, String}): _::Type{PosLenString}(s::String)
│││││││││││││└────────────────────────
│││┌ @ threadingconstructs.jl:182 task = Base.Threads.Task(#28)
││││┌ @ threadingconstructs.jl:178 CSV.multithreadpostparse(Core.getfield(#self#, :ctx), Core.getfield(#self#, :ntasks), Core.getfield(#self#, :pertaskcolumns), Core.getfield(#self#, :rows), Core.getfield(#self#, Symbol("#427#finalrows")), Core.getfield(#self#, :j), Core.getfield(#self#, :col))
│││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:447 CSV.makepooled!(col)
││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:546 CSV.makepooled2!(col, T, r, column)
│││││││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:563 CSV.PooledArray(Base.getproperty(CSV.PooledArrays, :RefArray)(column), r)
││││││││┌ @ /home/chad/.julia/packages/PooledArrays/DuIZ1/src/PooledArrays.jl:88 #self#(refs, invpool, PooledArrays._invert(invpool))
│││││││││┌ @ /home/chad/.julia/packages/PooledArrays/DuIZ1/src/PooledArrays.jl:88 #self#(refs, invpool, pool, Core.apply_type(Base.Threads.Atomic, PooledArrays.Int)(1))
││││││││││┌ @ /home/chad/.julia/packages/PooledArrays/DuIZ1/src/PooledArrays.jl:88 Core.apply_type(PooledArrays.PooledArray, _, _, PooledArrays.ndims(_), _)(refs, invpool, pool, refcount)
│││││││││││┌ @ /home/chad/.julia/packages/PooledArrays/DuIZ1/src/PooledArrays.jl:56 PooledArrays.extrema(Base.getproperty(rs, :a))
││││││││││││┌ @ multidimensional.jl:1694 Base.#extrema#535(Base.:, #self#, A)
│││││││││││││┌ @ multidimensional.jl:1694 Base._extrema_dims(Base.identity, A, dims)
││││││││││││││┌ @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/SparseArrays/src/higherorderfns.jl:1220 SparseArrays.HigherOrderFns._extrema_itr(f, A)
│││││││││││││││┌ @ operators.jl:562 y = Base.iterate(itr, s)
││││││││││││││││┌ @ abstractarray.jl:1142 y = Base.iterate(state...)
│││││││││││││││││┌ @ multidimensional.jl:388 Base.getproperty(state, :I)
││││││││││││││││││┌ @ Base.jl:42 Base.getfield(x, f)
│││││││││││││││││││ type Int64 has no field I
││││││││││││││││││└──────────────
│││││││││││││││││┌ @ range.jl:838 Base.+(i, Base.step(r))
││││││││││││││││││ no matching method found for call signature (Tuple{typeof(+), CartesianIndex{2}, Int64}): Base.+(i::CartesianIndex{2}, Base.step(r::Base.OneTo{Int64})::Int64)
│││││││││││││││││└────────────────
│││┌ @ /home/chad/.julia/packages/CSV/9LsxT/src/file.jl:335  = Core.kwfunc(CSV.rm)(Core.apply_type(Core.NamedTuple, (:force,))(Core.tuple(true)), CSV.rm, Base.getproperty(ctx, :tempfile))
││││ for 1 of 2 union split cases, no matching method found for call signatures (Tuple{Base.Filesystem.var"#rm##kw", NamedTuple{(:force,), Tuple{Bool}}, typeof(rm), Nothing})):  = Core.kwfunc(CSV.rm)::Base.Filesystem.var"#rm##kw"(Core.apply_type(Core.NamedTuple, (:force,)::Tuple{Symbol})::Type{NamedTuple{(:force,)}}(Core.tuple(true)::Tuple{Bool})::NamedTuple{(:force,), Tuple{Bool}}, CSV.rm, Base.getproperty(ctx::CSV.Context, :tempfile::Symbol)::Union{Nothing, String})
│││└────────────────────────────────────────────────────────

Finally, because it's a compile-time problem, my guess is that it might be a problem in a generated function. One possibility is that one of CSV's dependencies uses Requires.jl to check whether some dependency of Soss is loaded.
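One rough way to test the Requires.jl idea is to list the packages that get loaded only when Soss is present. Those are the candidates whose loading could trigger conditional code elsewhere in CSV's dependency tree. The sketch below is based on that assumption. `loaded_after` is a hypothetical helper, and it reads `Base.loaded_modules`, which is a Base internal rather than a public API.

```julia
# Hypothetical helper: names of all packages loaded in a fresh Julia
# process after running `code` (read from Base.loaded_modules, an internal).
function loaded_after(code::AbstractString; project = Base.active_project())
    julia = Base.julia_cmd()
    # Run the user's code, then print one loaded package name per line.
    script = code * "\nforeach(println, sort!([k.name for k in keys(Base.loaded_modules)]))"
    cmd = project === nothing ? `$julia -e $script` :
                                `$julia --project=$project -e $script`
    return Set(String.(split(read(cmd, String))))
end

# Usage, in an environment with both packages installed:
# only_with_soss = setdiff(loaded_after("using Soss, CSV"), loaded_after("using CSV"))
```

Loading alone is not expected to hang (only the CSV.File call does), so both processes should finish. Any package in `only_with_soss` that some CSV dependency names in a `@require` block would be a natural place to look next.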

I think we should transfer this to the CSV repo, since they know their code better and might have an easier time tracking down the problem.