JuliaLang / TOML.jl

A fast TOML parser for TOML 1.0 written in Julia
MIT License
34 stars 15 forks source link

More precise type-specification? #7

Closed timholy closed 3 years ago

timholy commented 4 years ago

I'm sure you can guess where this question comes from...is it worth specifying the precise set of types that can be returned by tomldict[key]? I'm not saying change the value type of the dict itself (unless the number of types is quite small), but just whether this should be documented or available as

const TOMLValue = Union{String,Dict{String,Any},...}

Here's a demo of what I'm thinking:

julia> const TOMLValue = Union{String,Dict{String,Any},Int,Symbol,Float64}
Union{Float64, Int64, Dict{String,Any}, String, Symbol}

julia> function f(d)
           TOMLDict = typeintersect(Dict, TOMLValue)
           TOMLString = typeintersect(AbstractString, TOMLValue)
           v = dct["name"]
           if v isa TOMLDict
               return g(v)
           elseif v isa TOMLString
               return g(v)
           else
               error("type not supported")
           end
       end

f (generic function with 1 method)

julia> g(x::AbstractString) = 1
g (generic function with 1 method)

julia> g(x::AbstractDict) = "hello"
g (generic function with 2 methods)

julia> code_typed(f, (Dict{String,Any},))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Main.dct::Any
│   %2 = Base.getindex(%1, "name")::Any
│   %3 = (%2 isa Dict{String,Any})::Bool
└──      goto #3 if not %3
2 ─      return "hello"
3 ─ %6 = (%2 isa String)::Bool
└──      goto #5 if not %6
4 ─      return 1
5 ─      Main.error("type not supported")::Union{}
└──      unreachable
) => Union{Int64, String}

Of course we could also hard-code TOMLDict, TOMLString, etc somewhere.

I ask mostly because there are still invalidation/inference problems tracing back to TOML._print.

KristofferC commented 4 years ago

Right now it does stuff like

julia> TOML.parsestring("foo = [1, 2.0]")
Dict{String,Any} with 1 entry:
  "foo" => Union{Float64, Int64}[1, 2.0]
            ^^^^^^^^^^^^^^^^^^^^

so it's a bit hard to exhaustively list all concrete types.

timholy commented 4 years ago

Gotcha. Could one normalize the outputs via promotion/typejoin whatever? Or does the type of each value encode something important? (Disclaimer: I know nothing about the TOML format, so these questions are very naive.)

KristofferC commented 4 years ago

I mean, of course we could just use Vector{Any} but I noticed some performance regressions in Pkg over having something like Union{Vector{String}, String}. So now it does https://github.com/KristofferC/TOML.jl/blob/0ff04bf4248afb92e0822d037826ffe0a78eb99b/src/parser.jl#L654-L669.

But I didn't really get what the problem is? The discussion here seems to be regarding the return value of the parser but you also said that the problem was TOML._print. TOML.print is not performance sensitive so we can add inference barriers there if it helps.

timholy commented 4 years ago

There's not really a serious problem. It's just as I'm thrashing my way through the Pkg code and there's a if x isa AbstractDict then I spend a lot of time trying to figure out what key/value types it might have. If the answer is "you can't predict" then so be it.

Feel free to close.

timholy commented 4 years ago

It's not just TOML._print, that's just an example. But of course we may be winding down on the remaining Pkg inference problems.

KristofferC commented 4 years ago

then I spend a lot of time trying to figure out what key/value types it might have.

And only concrete types are good? Otherwise it would be

Union{Int64, Float64, String, Dict{String, Any}, Vector, Dates.DateTime, Dates.Date, Dates.Time}
timholy commented 4 years ago

That might be helpful, at least just for my own information. (It is really Int64 or just Int?)

No need to make any changes yet unless you want to. This is very tentative, but one example of a way this information might be useful is that there are two internal methodinstances in Pkg, is_opt and normalize, that get inferred with AbstractString argument types. If it's really String, this might be one source (perhaps among many) of inferring Base's string-processing internals with AbstractString rather than a concrete subtype (xref the last section of this post). Really we might want to scan for any AbstractString-inferred argument type throughout all MethodInstances in Pkg, because anything that called out to Base directly with an AbstractString would not show up in my analysis of Pkg internal methods. (cf MethodAnalysis/demos/abstract.jl and abstract_test.jl)

KristofferC commented 4 years ago

Yes, Int64 (at least with the new parser). Withing Pkg it should be pretty much only String and some SubString{String}

KristofferC commented 3 years ago

I think this can be closed