cvogt / cbt

CBT - fun, fast, intuitive, compositional, statically checked builds written in Scala

Make packages reproducible #543

Open jodersky opened 7 years ago

jodersky commented 7 years ago

Don't include time information in jars in order to make them reproducible.

This allows builds to be verified with hash sums.

Also see https://reproducible-builds.org/
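
To illustrate the idea (a minimal sketch, not the code from this pull request): one way to drop time information is to give every jar entry the same fixed timestamp and to write the entries in a stable order. The class name `ReproducibleJar`, the method `pack`, and the constant `FIXED_TIME_MILLIS` are hypothetical names chosen for this example; only the `java.util.jar` calls are standard JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class ReproducibleJar {
    // Assumption: an arbitrary fixed instant (2010-01-01T00:00:00Z) used for every entry.
    static final long FIXED_TIME_MILLIS = 1262304000000L;

    // Packs the given path -> content map into jar bytes without using the current time.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            // Copy into a TreeMap so the entry order does not depend on the caller's map type.
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                JarEntry entry = new JarEntry(file.getKey());
                // Override the default, which would be the time at which the entry is written.
                entry.setTime(FIXED_TIME_MILLIS);
                jar.putNextEntry(entry);
                jar.write(file.getValue());
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Calling `ReproducibleJar.pack` twice on the same map should give byte-identical output, which is what makes a hash sum meaningful. One caveat: `setTime` converts the instant to the zip format's date and time fields using the JVM's default time zone, so a real implementation would probably also need to pin the time zone for the bytes to match across machines.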

cvogt commented 7 years ago

Nice!! A comment in the code linking to that URL would be nice. I'll add it later when I'm at a computer, unless you want to get to it first ;).

jodersky commented 7 years ago

Will do! Since this is something that can easily be forgotten, I was also thinking of adding a test to CI to make sure that packages are reproducible.

cvogt commented 7 years ago

Sure. Nice!

jodersky commented 7 years ago

Added a comment about the website and a test

jodersky commented 7 years ago

Linking to #502 as it is related

jvican commented 7 years ago

Have you tested that this does not affect incremental compilation, @jodersky?

jodersky commented 7 years ago

Sorry for not addressing your feedback earlier, I'll try to get some time this weekend.

Regarding incremental compilation: I haven't tested this, but I don't think the compiler changes timestamps within packaged jars. If I install a pre-packaged jar in a read-only location (/usr/share/...), would that mean that incremental compilation of dependent sources would break?

jvican commented 7 years ago

@jodersky To the best of my understanding, the JDK does indeed cache jars using that timestamp information. What I don't actually know is which information it uses. It might be the metadata of the whole jar, or of every jar entry.

cvogt commented 7 years ago

For what operations? Do you have a link?

jvican commented 7 years ago

Classloading in general. Not saying it is an issue. It may be. https://github.com/scala/bug/issues/10295

jodersky commented 7 years ago

Hmm, I'm not quite sure how incremental compilation works here. Does it mean that to check if a source needs recompilation, it checks the last modification date of its dependencies and compares it with a local cache? In case it doesn't use any checksum it should be fairly easy to spoof; I'll give it a shot.

The changes proposed here do much the same as this Maven plugin: https://zlika.github.io/reproducible-build-maven-plugin/ (and as what happens when building a Java Debian package).

cvogt commented 7 years ago

Jorge, I think that's just scalac. I think the JDK just loads each class once if it isn't already loaded, with no timestamp involved. But I could be wrong.

jodersky commented 7 years ago

That's my understanding too, at least from briefly going over the linked issue.

In that case, how big of a deal would this be? Usually, local dependencies use classfiles directly rather than the ones packaged in a jar (this is the export setting in sbt, IIRC).

jvican commented 7 years ago

It doesn't look like this is related to scalac only. From the linked issue:

My analysis of the internal caching in java.util.File was incorrect. Even prior to Java 9, the last modified timestamp is used as part of the cache key.

This is the JDK caching jars that are classloaded.

In any case, this could be an intricate use case that may not affect this particular change. We also wanted to experiment with this: https://github.com/sbt/io/pull/58. So please let me know if you find any issue in Zinc. I'll fix it.

jvican commented 7 years ago

Does it mean that to check if a source needs recompilation, it checks the last modification date of its dependencies and compares it with a local cache?

This doesn't happen for jars. Jars are sha1'd and then compared. See https://github.com/sbt/zinc/blob/d4d29e8ffeecfa7c6a8e834b4ee54c4bfda9af63/zinc/src/main/scala/sbt/internal/inc/MixedAnalyzingCompiler.scala#L184-L186. But please, do double-check in case I'm wrong.
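
For readers unfamiliar with the approach, comparing jars by content hash rather than by timestamp could look roughly like the following. This is a minimal sketch under assumptions, not Zinc's actual implementation: the class `JarFingerprint` and its methods are hypothetical, and only `java.security.MessageDigest` is real JDK API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class JarFingerprint {
    // Returns the SHA-1 digest of the given bytes as a lowercase hex string.
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // A jar counts as changed only if its content hash differs from the recorded one;
    // timestamps play no part in the decision.
    static boolean changed(String previousHash, byte[] currentContent) {
        return !previousHash.equals(sha1Hex(currentContent));
    }
}
```

Under a scheme like this, fixing the timestamps inside a jar only matters insofar as it changes the jar's bytes, and a jar rebuilt reproducibly from unchanged inputs would hash to the same value.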

jodersky commented 7 years ago

Thanks for looking into this, @jvican. I assume we can move forward with this change then?