JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Precompile should check staleness using a hash instead of a timestamp #17845

Closed: tlnagy closed this issue 8 years ago

tlnagy commented 8 years ago

As discussed in https://github.com/JuliaStats/StatsBase.jl/issues/202, alternating Pkg.add(X) and using X statements can lead to errors. According to @tkelman, this could be fixed by making require check staleness using a hash instead of the timestamp.
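
A minimal sketch of the proposed hash-based check, written for Julia 1.x (not Julia 0.4, and not how require is actually implemented): record a content hash of each source file when the cache file is built, and treat the cache as stale only when a file is missing or its hash no longer matches. CRC-32C from the CRC32c standard library is an assumed choice of hash; source_hash, record_hashes and is_stale_by_hash are hypothetical names.

```julia
using CRC32c  # Julia 1.x standard library; provides crc32c

# Hypothetical helper: fingerprint a source file by its bytes, not its mtime.
source_hash(path::AbstractString) = crc32c(read(path))

# Hypothetical: hashes recorded at the moment the .ji cache file is written.
record_hashes(files) = Dict(String(f) => source_hash(f) for f in files)

# Hash-based staleness: the cache is stale iff some recorded source file
# has disappeared or its contents no longer match the recorded hash.
function is_stale_by_hash(recorded::AbstractDict{String,UInt32})
    for (path, h) in recorded
        if !isfile(path) || source_hash(path) != h
            return true
        end
    end
    return false
end
```

Rewriting a source file with the same bytes leaves is_stale_by_hash false, which is exactly the case a timestamp comparison gets wrong.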

Example:

$ rm -rf ~/.julia/v0.4
$ rm -rf ~/.julia/lib/v0.4
$ julia -e "Pkg.update()"
INFO: Initializing package repository /Users/tamasnagy/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
INFO: Updating METADATA...
INFO: Computing changes...
INFO: No packages to install, update or remove
$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-apple-darwin15.4.0

julia> Pkg.add("StatsBase")
INFO: Installing BinDeps v0.4.1
INFO: Installing Compat v0.8.6
INFO: Installing Rmath v0.1.2
INFO: Installing SHA v0.2.0
INFO: Installing StatsBase v0.9.0
INFO: Installing StatsFuns v0.3.0
INFO: Installing URIParser v0.1.5
INFO: Building Rmath
INFO: Package database updated

julia> using StatsBase
INFO: Precompiling module StatsBase...

julia> Pkg.add("Distributions")
INFO: Installing Calculus v0.1.15
INFO: Installing Distributions v0.10.0
INFO: Installing PDMats v0.4.2
INFO: Building Rmath
INFO: Package database updated

julia> using Distributions
INFO: Precompiling module Distributions...
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/Rmath.ji for module Rmath.
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/StatsFuns.ji for module StatsFuns.
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/StatsBase.ji for module StatsBase.
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/Distributions.ji for module Distributions.
WARNING: Module Rmath uuid did not match cache file
  This is likely because module Rmath does not support  precompilation but is imported by a module that does.
ERROR: __precompile__(true) but require failed to create a precompiled cache file
 in require at /usr/local/Cellar/julia/0.4.5/lib/julia/sys.dylib

tkelman commented 8 years ago

Regarding "Pkg check staleness" in the original post: the staleness check happens in using (actually in require, which import also uses), not in Pkg.

tlnagy commented 8 years ago

Thanks! I updated the original post.

stevengj commented 8 years ago

I see: the problem here is that the timestamp changes because Pkg.build("Rmath") regenerates its deps.jl file, but the code hasn't actually changed, so the rebuild (which triggers #12508) is not actually needed.
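
The false positive can be reproduced with a small Julia 1.x sketch: rewriting a file with identical bytes bumps its mtime but leaves a content hash unchanged. The file name and contents below are placeholders standing in for a regenerated deps.jl, not BinDeps' actual output.

```julia
using CRC32c  # Julia 1.x standard library

# Placeholder stand-in for a generated deps.jl (contents are hypothetical).
path = joinpath(mktempdir(), "deps.jl")
contents = "# placeholder for generated library paths\n"

write(path, contents)
t1, h1 = mtime(path), crc32c(read(path))

sleep(2)                # exceed coarse (1-2 s) filesystem timestamp resolution
write(path, contents)   # simulate a rebuild that writes the same bytes again
t2, h2 = mtime(path), crc32c(read(path))

println("timestamp changed: ", t2 != t1)  # a timestamp check would report stale
println("contents changed:  ", h2 != h1)  # a hash check would report fresh
```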

As a workaround, @BinDeps.install could overwrite the deps.jl file only if its contents have changed.
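
A sketch of this workaround in Julia 1.x. write_if_changed is a hypothetical helper, not part of BinDeps: it compares the new contents with what is already on disk and skips the write when they match, so the file keeps its old mtime and dependent cache files are not considered stale.

```julia
"""
    write_if_changed(path, contents) -> Bool

Hypothetical helper (not a BinDeps API): write `contents` to `path` only when
the file is missing or holds different bytes. Returns `true` if it wrote.
"""
function write_if_changed(path::AbstractString, contents::AbstractString)
    if isfile(path) && read(path, String) == contents
        return false  # identical bytes: leave the file and its mtime untouched
    end
    write(path, contents)
    return true
end
```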

tkelman commented 8 years ago

That was a workaround I suggested in the StatsBase issue, but I think unnecessary recompilation could happen for other reasons too, and a hash-based check might make .ji files easier to redistribute, if the checksum is cheap enough. x-ref https://github.com/JuliaLang/julia/pull/12458#issuecomment-129859137

vchuravy commented 8 years ago

Making staleness depend on the hash would be great for users who work on distributed systems (supercomputers and such). @andreasnoack and I have run into problems with this at times.

stevengj commented 8 years ago

Here is a nice-looking hardware-accelerated BSD-licensed CRC32c implementation in C by Mark Adler: http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software/1
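
For reference, CRC-32C itself is simple to state. The sketch below is a slow, bitwise Julia 1.x implementation (reflected Castagnoli polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF), not Adler's hardware-accelerated code; crc32c_bitwise is a hypothetical name, and the result is cross-checked against crc32c from the CRC32c standard library and the standard check value for "123456789".

```julia
using CRC32c  # Julia 1.x standard library, used only to cross-check

# Bitwise CRC-32C: one shift per input bit, no lookup table.
function crc32c_bitwise(data::AbstractVector{UInt8})
    crc = 0xffffffff                  # initial value (UInt32)
    for byte in data
        crc ⊻= byte                   # fold the next byte into the low bits
        for _ in 1:8
            # reflected form: shift right, XOR the polynomial if a 1 fell out
            crc = (crc & 0x00000001) != 0 ? (crc >> 1) ⊻ 0x82f63b78 : crc >> 1
        end
    end
    return crc ⊻ 0xffffffff           # final XOR
end

check = crc32c_bitwise(codeunits("123456789"))
println(string(check, base = 16))     # standard CRC-32C check value: e3069283
```

Table-driven or SSE 4.2 versions compute the same function much faster, which is what would make a per-file checksum plausibly negligible next to precompilation.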

stevengj commented 8 years ago

And here is a benchmark of different CRC routines, concluding that Adler's routine is pretty much the fastest.

vtjnash commented 8 years ago

I've always been fairly opposed to doing this with checksums. I agree this area needs more development, but I haven't yet been able to put together a complete proposal that addresses these and other similar issues.

stevengj commented 8 years ago

@vtjnash, doing if !timestamps_match && !checksums_match then regenerate_cache seems like a strict improvement over if !timestamps_match, since it reduces the chance of false positives (provided the time for the checksum is negligible, which seems likely). What is the nature of your objection?
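
The proposed rule can be sketched in Julia 1.x as follows. SourceStamp, stamp and is_stale are hypothetical names, and CRC-32C is an assumed choice of checksum. The cheap mtime comparison handles the common case, and the checksum is only computed when the timestamps disagree.

```julia
using CRC32c  # Julia 1.x standard library

# Hypothetical record stored alongside a cache file for one source file.
struct SourceStamp
    path::String
    mtime::Float64
    crc::UInt32
end

stamp(path::AbstractString) = SourceStamp(path, mtime(path), crc32c(read(path)))

# Regenerate only if !timestamps_match && !checksums_match.
function is_stale(s::SourceStamp)
    isfile(s.path) || return true                  # source removed: stale
    mtime(s.path) == s.mtime && return false       # timestamps match: fresh
    return crc32c(read(s.path)) != s.crc           # stale iff checksums differ
end
```

Like the timestamp-only rule, this still trusts a matching mtime; it removes false positives but does not catch an edit that preserves the timestamp.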

stevengj commented 8 years ago

Should this be closed in light of the discussion at #18127?