NixOS / nixpkgs

Tracking issue: non-deterministic java apps #278518

Open TomaSajt opened 10 months ago

TomaSajt commented 10 months ago

There are several ways to package a Java app inside Nixpkgs, and most of those package the final app into a .jar file. However, .jar files are not reproducible/deterministic by default.

How to check for reproducibility?

Why are .jar files not deterministic?

When creating a .jar file, Java places a META-INF/MANIFEST.MF file inside the archive, stamped with the current date as its creation date, so the resulting archive is non-deterministic.
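To illustrate why a single timestamp is enough to change the whole archive, here is a minimal Python sketch. It does not call the jar tool; it assumes a jar can be modelled as a zip with one manifest entry, and the helper names build_jar and digest are made up for this example.

```python
import hashlib
import io
import zipfile

MANIFEST = "Manifest-Version: 1.0\n"

def build_jar(timestamp):
    """Build a tiny jar-like zip in memory, stamping the manifest with `timestamp`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        info = zipfile.ZipInfo("META-INF/MANIFEST.MF", date_time=timestamp)
        info.compress_type = zipfile.ZIP_DEFLATED  # ZipInfo defaults to ZIP_STORED
        jar.writestr(info, MANIFEST)
    return buf.getvalue()

def digest(data):
    """SHA-256 of the archive bytes, standing in for a store hash."""
    return hashlib.sha256(data).hexdigest()

# Same content, built at two different moments: the archives differ.
first_build = build_jar((2024, 1, 1, 12, 0, 0))
second_build = build_jar((2024, 1, 2, 12, 0, 0))
print(digest(first_build) == digest(second_build))  # False

# Same content, fixed timestamp: the archives are byte-identical.
fixed_a = build_jar((1980, 1, 1, 0, 0, 2))
fixed_b = build_jar((1980, 1, 1, 0, 0, 2))
print(digest(fixed_a) == digest(fixed_b))  # True
```

The file contents are identical in all four archives; only the entry timestamp differs between the first two.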

Another possible source of non-determinism: when a .properties file is changed during the build process, the current timestamp is written into the file as a comment. These can usually be patched out without too much of a hassle.
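As a rough sketch of what patching out such a comment could look like, the Python snippet below drops comment lines that look like the date line written by java.util.Properties.store(). The regular expression assumes the usual Date.toString() layout (for example "#Mon Jan 01 12:00:00 UTC 2024"), and strip_date_comments is a hypothetical helper, not something that exists in Nixpkgs.

```python
import re

# Assumed layout of the date comment: "#Dow Mon dd hh:mm:ss zone yyyy".
DATE_COMMENT = re.compile(
    r"^#(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"\d{2} \d{2}:\d{2}:\d{2} \S+ \d{4}$"
)

def strip_date_comments(text):
    """Return `text` without comment lines that match DATE_COMMENT."""
    kept = [line for line in text.splitlines() if not DATE_COMMENT.match(line)]
    return "\n".join(kept) + "\n"

sample = (
    "#Generated by build\n"
    "#Mon Jan 01 12:00:00 UTC 2024\n"
    "version=1.0\n"
)
print(strip_date_comments(sample), end="")
# #Generated by build
# version=1.0
```

Other comments and all key/value lines are left untouched, so the result stays a valid .properties file.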

One could just download the pre-built .jar-s and not worry about this, but I think it's important to build stuff from source.


Here are some ways java packages/apps are built inside Nixpkgs:

Possible solutions to achieve determinism

There was a setup hook called canonicalize-jars-hook which aimed to solve this problem by unwrapping and rewrapping jars with the creation dates set to a fixed time. However, this process had some flaws.

After https://github.com/NixOS/nixpkgs/pull/296549, canonicalize-jars-hook lives on as stripJavaArchivesHook, using strip-nondeterminism as the backend.
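The general idea of rewriting an archive so that every entry carries a fixed timestamp can be sketched in Python as follows. This is only an illustration under assumptions: it is not how canonicalize-jars-hook or strip-nondeterminism is implemented, the names rewrap_jar and make_jar are invented here, and the fixed time simply mirrors the 1980-01-01T00:00:02Z value used elsewhere in this issue.

```python
import io
import zipfile

# Assumed fixed timestamp (1980-01-01T00:00:02).
FIXED_TIME = (1980, 1, 1, 0, 0, 2)

def rewrap_jar(jar_bytes):
    """Return a copy of the archive in which every entry carries FIXED_TIME."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        for old in src.infolist():  # keep the original entry order
            new = zipfile.ZipInfo(old.filename, date_time=FIXED_TIME)
            new.compress_type = zipfile.ZIP_DEFLATED
            new.external_attr = old.external_attr  # preserve permission bits
            dst.writestr(new, src.read(old))
    return out.getvalue()

def make_jar(timestamp):
    """Hypothetical helper: a one-entry jar stamped with `timestamp`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        info = zipfile.ZipInfo("META-INF/MANIFEST.MF", date_time=timestamp)
        jar.writestr(info, "Manifest-Version: 1.0\n")
    return buf.getvalue()

first = make_jar((2024, 1, 1, 12, 0, 0))
second = make_jar((2024, 6, 1, 8, 30, 0))
print(first == second)                          # False
print(rewrap_jar(first) == rewrap_jar(second))  # True
```

The sketch keeps the entry order as found, since some tools expect the manifest near the start of the archive. Extra fields and archive comments are dropped, which a real tool would need to handle more carefully.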

My previous attempts at fixing this:

If we don't want to or can't use stripJavaArchivesHook for some reason, we could use the tools provided by the build systems:

Apache Ant

You can set modificationtime for the <jar> (and possibly <war>) tasks, usually in the build.xml file, like this:

<jar modificationtime="0" ...>...</jar>

To automate this, you could use a script like this inside postPatch:

# Fix jar timestamps for reproducibility
substituteInPlace build.xml \
    --replace-fail '<jar ' '<jar modificationtime="0" '

(Note that the pattern includes the space character after jar, so it doesn't accidentally match other tasks whose names start with jar, though such a task is unlikely to exist anyway.)

You could also use a dedicated XML modification tool like xmlstarlet:

# Fix jar timestamps for reproducibility
xmlstarlet ed -L -a "//jar" -t attr -n "modificationtime" -v "0" build.xml

I tried to create a setup hook which does this automatically: https://github.com/NixOS/nixpkgs/pull/294516. However, this would only work for Ant projects.

Maven

You can set the project.build.outputTimestamp property

You can do this by adding -Dproject.build.outputTimestamp=1980-01-01T00:00:02Z to the args given to the mvn command

or by patching the pom.xml file to have this:

<project>
  ...
  <properties>
    ...
    <project.build.outputTimestamp>1980-01-01T00:00:02Z</project.build.outputTimestamp>
  </properties>
  ...
</project>

Gradle

If the project uses Groovy, you can add the following lines to build.gradle

tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}

or if the project uses Kotlin you can add the following to build.gradle.kts

tasks.withType<AbstractArchiveTask> {
    isPreserveFileTimestamps = false
    isReproducibleFileOrder = true
}

Without a build system

If the packaging script of an app uses the jar command directly, you could use the --date flag to specify a build date. Note, this flag does not exist on versions before jdk17.

Example:

jar --date="1980-01-01T00:00:02Z" --create Program.class > test.jar

Though I'd say that using stripJavaArchivesHook is your best bet, because it should work with any build tool.


Progress with making everything deterministic:

was already deterministic before opening this issue

Fixed

attempted, seems difficult

could be done

not attempted

fgaz commented 10 months ago

> Maybe this hook should be added to the java packaging docs

yes please!

de11n commented 9 months ago

Excellent work.

TomaSajt commented 8 months ago

I was looking around in nixpkgs, and it looks like Ant actually does have a way to set the modification time for the created jars. Here's the only example that's inside nixpkgs: https://github.com/NixOS/nixpkgs/blob/daafb1238d6ce860e5318945979c8f2e25da31c8/pkgs/by-name/jo/jogl/package.nix#L80-L82
It adds a fixed modificationtime attribute to the jar task.

Docs of the jar Ant task: https://ant.apache.org/manual/Tasks/jar.html

After some more looking, it turns out that even the jar program itself has the --date argument.

jar --date="1980-01-01T00:00:02Z" --create Program.class > test.jar

This will create a jar file in which every entry has the given timestamp. However, older versions of Java don't have this flag; the first version inside nixpkgs that has it is jdk17. I'm not sure exactly which version added it because it is heavily underdocumented, but I found the commit that added it: https://github.com/openjdk/jdk/commit/db68a0ce1ce152345320e70acb7e9842d2f1ece4

After some more searching, it looks like gradle also has something like this built in:

tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}

So, in conclusion, it looks like every major way of packaging has a built-in solution for fixed timestamps. I am a bit sad that I did not look into this more thoroughly earlier. Still, I believe that having a hook that works in all cases has its own merits, as it will allow us to avoid having to patch every Java app.

TomaSajt commented 7 months ago

~I decided that using the built-in method is a better solution, so I opened https://github.com/NixOS/nixpkgs/pull/294516, which moves away from canonicalize-jars-hook~

I went with the general solution instead anyway.

de11n commented 7 months ago

It might also be helpful to have a tool to detect common pitfalls in making Java packages deterministic. For example, we could find all jar files in the outputs and inspect them for timestamps that suggest non-determinism. We could then cause that build to fail unless some sort of "allowNondeterminism = true" flag is set. This could be a setup hook, but at some point we may want a buildJavaPackage builder that just applies common-sense things like this and can be easily documented.
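A very rough Python sketch of such a check is below. Everything in it is an assumption made for illustration: the expected timestamp, the function names suspicious_entries and check_jar, and the allow_nondeterminism opt-out are hypothetical, and no such tool exists in Nixpkgs as far as this thread shows.

```python
import io
import zipfile

# Assumption: a normalized archive stamps every entry with this exact time.
EXPECTED_TIME = (1980, 1, 1, 0, 0, 2)

def suspicious_entries(jar, expected=EXPECTED_TIME):
    """List entries whose timestamp differs from `expected`.

    `jar` may be a path or a binary file object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(jar) as archive:
        return [info.filename for info in archive.infolist()
                if info.date_time != expected]

def check_jar(jar, allow_nondeterminism=False):
    """Raise unless the jar is clean or the (hypothetical) opt-out is set."""
    bad = suspicious_entries(jar)
    if bad and not allow_nondeterminism:
        raise ValueError(f"entries with unexpected timestamps: {bad}")
    return bad

# Demo: one normalized entry and one stamped with a build-time date.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        zipfile.ZipInfo("META-INF/MANIFEST.MF", date_time=EXPECTED_TIME),
        "Manifest-Version: 1.0\n",
    )
    archive.writestr(
        zipfile.ZipInfo("Program.class", date_time=(2024, 3, 5, 10, 15, 0)),
        b"\xca\xfe\xba\xbe",
    )

print(suspicious_entries(demo))  # ['Program.class']
```

Comparing against one exact timestamp is the simplest possible rule. A real check would probably need to accept whatever value the normalizing hook actually writes.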

TomaSajt commented 7 months ago

A new thing I discovered was the existence of .jmod files, which can also have some non-determinism. They are not just plain .jar files; they are a different format. strip-nondeterminism supports patching these files. AFAICT they are present inside every jdk since jdk9, but those seem to be mostly deterministic (though I only checked the latest jdk).