andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
278 stars 28 forks source link

remove dependence on Julia internal `Core.Compiler.return_type` #127

Closed palday closed 6 months ago

palday commented 6 months ago

see also https://github.com/JuliaLang/julia/issues/52385

There's also another problem that arises on Julia 1.10 when Indices are constructed from an empty tuple, the resultant Indices{Union{}} create all sorts of problems. To get around this, I've added a check for empty tuples and changes the indices eltype to be Any. See also https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/472

andyferris commented 6 months ago

OK, thanks.

Just trying to get my head around everything. Is using code_typed ultimately less likely to invoke undefined behavior than return_type? Is there any difference in obsrvable behavior? Does this make something work that doesn't work without it? (The Julia 1.10 issues seem more related to changes to handling of Union{} than to changes in the type inference algorithm).

palday commented 6 months ago

Just trying to get my head around everything. Is using code_typed ultimately less likely to invoke undefined behavior than return_type?

Probably not, but return_type is considered an internal function and there seems to be reluctance to treat the problems with Union{} as real as long as you're accessing an internal function. So there is a real problem that this package demonstrates (well, via errors that occur in dependents' use of this package) but that problem is considered invalid by some core Julia devs because you're using internals.

That said, using internals is also risky because their behavior can change or be dropped at any point.

Is there any difference in obsrvable behavior? Does this make something work that doesn't work without it?

No(ish) and yes. There are two changes here:

The latter change in particular fixes a major downstream Julia 1.10 compatibility issue: https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/472

(The Julia 1.10 issues seem more related to changes to handling of Union{} than to changes in the type inference algorithm).

I agree, but the IMHO legitimate problem in the new handling of Union{} is being dismissed because you're touching compiler internals.

nsajko commented 6 months ago

The latter change in particular fixes a major downstream Julia 1.10 compatibility issue

Shouldn't this rather be fixed in AoG? Assuming it doesn't get fixed in Julia.

palday commented 6 months ago

Shouldn't this rather be fixed in AoG? Assuming it doesn't get fixed in Julia.

AoG is calling things, but the actual error occurs in Dictionaries.jl code.

aplavin commented 6 months ago

Creating Indices{Union{}} is a perfectly fine thing to do, and it's not prevented by this PR (neither should it).

julia> Indices(Union{}[])
0-element Indices{Union{}}

This PR adds an explicit check to one special case, where Indices are constructed from a Tuple. Is that worth it? I'm hopeful for the Julia team to acknowledge and likely fix the breaking change in some 1.10 version...

The change of return_type -> code_typed is orthogonal to that Tuple{} thing, and doesn't really reduce any dependency on "internals". Inference results are not guaranteed to stay stable, and this influences both return_type and code_typed. Note that (according to JuliaHub search) lots of packages use return_type, but (almost?) none use code_typed for this purpose.

palday commented 6 months ago

The change of return_type -> code_typed is orthogonal to that Tuple{} thing, and doesn't really reduce any dependency on "internals".

This is patently incorrect -- Core.Compiler.return_type is not documented in the manual and is therefore per definition internal.

I do agree that the Union{} / Tuple{} issue is orthogonal, but I suggested this change because the real issue was being dismissed prima facie because of the use of internals. So showing that the issue still arose when there was no use of internals was useful for that.

In terms of the workaround here for Union{}/Tuple{} -- I'm not strongly tied to it, but it seemed like a reasonable, straightforward change. If you're constructing an Index from an empty tuple, you do not know what the original element type is, so Any IMHO makes as much sense as Union{}. On the other hand if you're constructing from e.g. an empty Vector, then you still have the eltype information and so can construct an appropriate empty index. In terms of the type definition, tree, etc., it's fine to explicitly construct Index{Union{}}(), but that is an explicitly an empty index (because of both the type parameter and the lack of initialization values), which I view as a slightly different case than Index(iter) where iter just happens to be Tuple{}.

aplavin commented 6 months ago

This is patently incorrect

Sorry, maybe I formulated it too strongly :) It very slightly reduces the internals dependence, because the actual return value of code_typed is still not guaranteed to be stable and can change when inference changes in Julia. Relying on inference, no matter how exactly it is done, is relying on Julia internals.

This change from return_type to code_typed increases overhead a lot:

# Core return_type()
julia> @btime Core.Compiler.return_type(sin, Tuple{Float64})
  0.791 ns (0 allocations: 0 bytes)
Float64

# return_type() from this PR
julia> @btime return_type(sin, Tuple{Float64})
  48.333 μs (1232 allocations: 69.22 KiB)
Float64

Wouldn't be great to pay this overhead on every Dictionary/Indices construction, especially given that it doesn't remove the internals dependence. And indeed much more packages use return_type to probe inference than code_typed.

I suggested this change because the real issue was being dismissed prima facie because of the use of internals

The Tuple{Union{}} error in 1.10 is completely independent on internals, and is a breaking change anyway. It just happened that the first manifestations of this issue were found in code that relies on inference. Luckily, now there's no shortage of internals-free examples that demonstrate this issue (in https://github.com/JuliaLang/julia/issues/52385).

andyferris commented 6 months ago

Okay, this has really opened a can of worms :)

There are few things going on here.

  1. TypedTables is pretty dumb when you have zero columns - we should improve that logic in this package.
  2. We frequently use type inference to figure out the type parameters of a typed table, to make sure the return type is concrete (and downstream operations aren't penalized).
  3. Julia has both invariant and covariant type parameters. Handling the eltype when mapping an empty array using only automated type inference, for example, becomes quite tricky.
  4. Julia 1.10 throws an error on Tuple{Union{}}, where previous versions haven't.

And yes, this package relies on Julia internals. It always has - originally this was "a feature not a bug" because it gave great performance + ergonomics. Traditionally we update the package to work well with each new version of Julia (but this hasn't been necessary in recent years).

The plan was always to use something "recommended" once Julia became more mature. I actually have an minimal example for the logic we follow:

x = map(sin, Int64[])

What is the type (concrete, runtime type) of x? Is it inferred? How does Base determine the eltype of the returned array? For many years, it has been known that invoking type inference to figure out a type which then needs to get inferred might expose undefined behavior (I'm even not sure what well-defined nested inference looks like inside recursive loops...). However, when not run as a part of a recursive loop it does in practice seem to work well. In fact, for the example above Base still uses Core.Compiler.return_type to figure out the eltype must be Float64. I think core devs like Jameson are right to raise concern with this pattern, but I still believe Base needs to lead the way here before we can copy a working pattern.

In general, I feel like swapping Core.Compiler.return_type to code_typed ultimately targets a political problem, not a technical problem. In fact it may introduce some regressions here. The discussion at https://github.com/JuliaLang/julia/issues/52385 is progressing - but I think we need to get the attention of a few more devs that can do something in the space so they can design a solution amongst themselves.

We do however need to get this package working well with Julia 1.10, and any improvements to handling tables with zero rows or with zero columns are extremely valuable and welcome. Is there something more minimal we could try that will at least get us unstuck in the short term?

codecov[bot] commented 6 months ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (7b5b7b3) 79.35% compared to head (c83f300) 79.15%.

Files Patch % Lines
src/Dictionary.jl 0.00% 2 Missing :warning:
src/map.jl 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #127 +/- ## ========================================== - Coverage 79.35% 79.15% -0.20% ========================================== Files 21 21 Lines 2349 2351 +2 ========================================== - Hits 1864 1861 -3 - Misses 485 490 +5 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

aplavin commented 6 months ago

I actually have an minimal example for the logic we follow: x = map(sin, Int64[])

And Base itself doesn't hesitate to return a collection with Union{} eltype when it makes sense :)

julia> map(length, Symbol[])
Union{}[]

Also when making an array out of an empty tuple - the exact same scenario as discussed here with Indices:

julia> collect(())
Union{}[]

So the consistent way, following Base, is to continue returning Indices{Union{}}. And as @nsajko points out that this is the most accurate type to return. Would be weird to change it and make the result both less consistent and less accurate.

Would be nice if those affected by this 1.10 issue chimed in the https://github.com/JuliaLang/julia/issues/52385 discussion! I think popularity does affect the priority different bugs get (and that's totally sensible).

nsajko commented 6 months ago

I think popularity does affect the priority different bugs get (and that's totally sensible).

Note that the issue is labeled with triage, so I guess a decision might happen either way in less than 24 hours.

palday commented 6 months ago

It looks like the Tuple{Union{}} change will be reverted: https://github.com/JuliaLang/julia/issues/52385#issuecomment-1876217885

andyferris commented 6 months ago

Thanks for following up on this @palday