JuliaData / NamedTuples.jl

[DEPRECATED] NamedTuples.jl
Other
30 stars 17 forks source link

Fix incremental compilation #61

Closed omus closed 6 years ago

omus commented 6 years ago

The DataStreams package was running into issues with incremental pre-compilation with the NamedTuples package:

julia> Base.compilecache("DataStreams")
INFO: Recompiling stale cache file /Users/omus/.julia/lib/v0.6/DataStreams.ji for module DataStreams.
WARNING: eval from module NamedTuples to Data:
Expr(:type, false, Expr(:<:, Expr(:curly, :_NT_col, :T1)::Any, NamedTuples.NamedTuple)::Any, Expr(:block, Expr(:::, :col, :T1)::Any, Expr(:tuple)::Any)::Any)::Any
  ** incremental compilation may be broken for this module **

WARNING: eval from module NamedTuples to Data:
Expr(:type, false, Expr(:<:, Expr(:curly, :_NT_name_compute_computeargs, :T1, :T2, :T3)::Any, NamedTuples.NamedTuple)::Any, Expr(:block, Expr(:::, :name, :T1)::Any, Expr(:::, :compute, :T2)::Any, Expr(:::, :computeargs, :T3)::Any, Expr(:tuple)::Any)::Any)::Any
  ** incremental compilation may be broken for this module **

These warnings occurred because the current implementation of @NT performs an eval during macro expansion. In this PR the @NT macro has been revised to only perform the eval at runtime which makes incremental compilation work. To do this I needed to break apart the make_tuple function into two parts: initial expression parsing and type creation later. In order to make this change completely non-breaking I have left make_tuple as is and placed the initial expression parsing components directly in the @NT macro and have put the runtime type creation part of the NT function.

These changes should properly address the change that @quinnj wanted as part of https://github.com/JuliaData/NamedTuples.jl/pull/46 without generating duplicate types and should allow the DataStreams package to no longer require a custom fork of NamedTuples.jl

omus commented 6 years ago

I'm planning on merging this in about 4 hours. After that I'll make a new minor release.

omus commented 6 years ago

@davidanthoff do you have an concerns here?

quinnj commented 6 years ago

The 0.7 error on travis looks concerning?

omus commented 6 years ago

The 0.7 error on travis looks concerning?

The current v4.0 releases has no compatibility with Julia 0.7 as expected. Support will be added in v5.0 with changes such as: https://github.com/JuliaData/NamedTuples.jl/pull/55

davidanthoff commented 6 years ago

Yes, I have a major concern: this completely kills performance. The whole clue of this package was from the beginning that constructing a named tuple has exactly the same performance as constructing a struct. This PR changes that. Here is what I get pre and post this change:

julia> @benchmark @NT(a=2, b=4)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.099 ns (0.00% GC)
  median time:      6.301 ns (0.00% GC)
  mean time:        6.928 ns (0.00% GC)
  maximum time:     30.500 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

and post:

julia> @benchmark @NT(a=2, b=4)                
BenchmarkTools.Trial:                          
  memory estimate:  1.53 KiB                   
  allocs estimate:  32                         
  --------------                               
  minimum time:     1.280 μs (0.00% GC)        
  median time:      1.450 μs (0.00% GC)        
  mean time:        1.664 μs (7.81% GC)        
  maximum time:     211.070 μs (96.65% GC)     
  --------------                               
  samples:          10000                      
  evals/sample:     10                         

So that makes construction about a factor 200 slower... So that version certainly won't fly for my family of packages. Can you please revert this change?

I also want to stress again that I think we need to find a different way of governing a package like this. I don't think that major changes like this can be merged without a single review from anyone, a day after the PR was opened. This package powers too many other things, I don't think any single party should make changes here quickly and unilaterally. Just speaking for myself, there is no way that I can generally react to review requests on such short notice, so I think the main thing is that this needs to slow down.

omus commented 6 years ago

I'll look into deeper into the performance issues tomorrow. From a cursory performance check by moving the type name generation to macro expansion time we see a major improvement in performance:

julia> @benchmark @NT(a=2, b=4)
BenchmarkTools.Trial:
  memory estimate:  224 bytes
  allocs estimate:  3
  --------------
  minimum time:     296.749 ns (0.00% GC)
  median time:      303.024 ns (0.00% GC)
  mean time:        366.931 ns (4.38% GC)
  maximum time:     8.885 μs (91.81% GC)
  --------------
  samples:          10000
  evals/sample:     255

We definitely need to resolve the incremental compilation issue so I'll leave this change in master for now.

davidanthoff commented 6 years ago

I'll leave this change in master for now.

To be frank, I find that unacceptable. You can't just merge things here without reviews that amount to major regressions for downstream packages. The right thing for you to do at this point is to revert this PR, and in the future wait for reviews and consensus from the other stakeholders before you merge things. I really appreciate that you are spending time on this package, but please do it in a way that is respectful of the existing user base, right now this feels a little bit like a take-over.

On the substantive issue: can we first try to understand why there is a precompile issue with DataStreams.jl, but not with any of the many other packages that use NamedTuples.jl and precompile just fine (all of the Queryverse, IndexedTables.jl, JuliaDB.jl etc.).

omus commented 6 years ago

I apologize about merging this before you got the chance to review this change. I'll try to wait long before merging significant changes. As for reverting I'm only holding off on reverting this change as I think this is something that we need to change so I'm only trying to avoid commit churn since this was already merged. If I had tagged this I would definitely revert and make a new tag.

Do you still want me to revert the change right away or are you okay with waiting to avoid commit churn?

davidanthoff commented 6 years ago

Thanks, and reading my post, it could have been phrased nicer, so sorry for that! Glad we are on the same page!

I just now tracked down a bug that was breaking a lot of tests on my end to an error that was introduced in v4.0.1 (something I reviewed but didn't catch). Ideally we would fix that bug soon and revert that change that was introduced in v4.0.0, I just opened a PR for that: https://github.com/JuliaData/NamedTuples.jl/pull/62. I think that would essentially require that you first revert this PR here, though? Or we create a release branch for this bug fix? But that seems even more work?

I am quite skeptical that the strategy in this PR will work: a lot of the downstream packages use NamedTuples.jl because there is zero overhead at runtime to instantiating a named tuple (relative to a tuple or struct), and I don't see how that can be done without having the type creation in the macro phase. We had a long search around that limitation about two years ago or so (for a different reason, we had hoped one could use the @NT macro in a generated function, but that also doesn't work because of the eval in the macro), but didn't find a workaround.

I still think it would be good to understand why precompile failes in DataStreams.jl but not in any of the other packages that use NamedTuples.jl?

omus commented 6 years ago

I'll revert this as there are some waiting bugfixes. There are some alternatives to doing this but this will now be the quickest.