erlyaws / yaws

Yaws webserver
https://erlyaws.github.io
BSD 3-Clause "New" or "Revised" License

Support reproducible builds #446

Closed avtobiff closed 2 years ago

avtobiff commented 2 years ago

Currently, two successive builds of YAWS do not generate the same binary output. Reproducible builds are important for verifying that a given source tree always produces the same result. For further information see [0].

It seems that (at least) the following needs to be fixed in order to support reproducible builds:

Running two successive builds and using the diffoscope tool to check for differences yields the output in [1].

Reproduce by doing two successive builds, like so:

$ cd yaws

# first build
$ git clean -fdxq
$ autoreconf -fi
$ ./configure --prefix=$PWD/../yaws-rel/rb1
$ make -j
$ make install

# second build
$ git clean -fdxq
$ autoreconf -fi
$ ./configure --prefix=$PWD/../yaws-rel/rb2
$ make -j
$ make install

# check with diffoscope
$ cd ../yaws-rel
$ diffoscope rb1 rb2

[0] https://reproducible-builds.org/
[1] diffoscope.yaws.log
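
If diffoscope is not at hand, a rough first pass is to compare per-file checksums of the two install trees. This is only a sketch: `manifest` and `tree_diff` are hypothetical helper names, and it assumes GNU coreutils (`sha256sum`) and file names without newlines or backslashes. It only tells you which files differ, not why, so diffoscope is still needed for the details.

```shell
#!/bin/sh
# Hypothetical helper: print "<sha256>  <relative path>" for every regular
# file under the tree given as $1, sorted by path.
manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# Hypothetical helper: print the relative paths of files that are missing
# from one tree or whose contents differ between the trees $1 and $2.
tree_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    manifest "$1" > "$a"
    manifest "$2" > "$b"
    # A line present in both manifests means same path and same checksum.
    # uniq -u keeps only lines that occur once, i.e. the mismatches.
    LC_ALL=C sort "$a" "$b" | uniq -u | sed 's/^[0-9a-f]*  //' | LC_ALL=C sort -u
    rm -f "$a" "$b"
}
```

Run from the yaws-rel directory this would be `tree_diff rb1 rb2`; an empty result means every file in the two trees is byte-identical.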

vinoski commented 2 years ago

Are these results with or without the recently-added YAWS_DETERMINISTIC_BUILD environment variable set?

vinoski commented 2 years ago

I see that with YAWS_DETERMINISTIC_BUILD set we get diffoscope hits like this in the yaws script:

 -yawsdir="/tmp/yrb1/lib/yaws"
 -vardir="/tmp/yrb1/var"
 +yawsdir="/tmp/yrb2/lib/yaws"
 +vardir="/tmp/yrb2/var"

But are these, and other similar diffs in yaws.conf and config files, really violations of reproducible builds? I see on the Reproducible Builds website that

By promising identical results are always generated from a given source, this allows multiple third parties to come to a consensus on a “correct” result, highlighting any deviations as suspect and worthy of scrutiny.

Focusing on the "from a given source" part of the above sentence: IMO configuring two builds each with a different install prefix means you have two different sources. A better check IMO would be to see the differences between two builds from the same configuration. There, I think the only differences we'd see are the datetimes embedded in the yaws.pdf and yaws.dvi files, for example:

 -CreationDate: "D:20220107104252-05'00'"
 +CreationDate: "D:20220107105033-05'00'"
  Creator: 'LaTeX with hyperref'
  Keywords: ''
 -ModDate: "D:20220107104252-05'00'"
 +ModDate: "D:20220107105033-05'00'"

avtobiff commented 2 years ago

Are these results with or without the recently-added YAWS_DETERMINISTIC_BUILD environment variable set?

These are the results without it set. I missed that this was pushed recently, even though it is current HEAD on master. Sorry.

Focusing on the "from a given source" part of the above sentence: IMO configuring two builds each with a different install prefix means you have two different sources. A better check IMO would be to see the differences between two builds from the same configuration.

I can agree with this. Installing with DESTDIR instead of using different prefixes avoids the issue.

I think the only differences we'd see are the datetimes embedded in the yaws.pdf (...)

PR #447 documents reproducible builds for YAWS and also makes yaws.ps generation deterministic if SOURCE_DATE_EPOCH is set. yaws.pdf is generated with pdflatex, which already honors SOURCE_DATE_EPOCH.
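
For SOURCE_DATE_EPOCH to help, both builds have to see the same value. One common convention is to take it from the last commit rather than from the wall clock. The following is a sketch of that idea, not something the yaws build does itself; `pick_source_date_epoch` is a hypothetical helper name and the source directory argument is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print a stable SOURCE_DATE_EPOCH for the source tree
# given as $1. Fails (non-zero status) if no stable value can be found.
pick_source_date_epoch() {
    # Respect a value already provided by the caller or a packaging system.
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        printf '%s\n' "$SOURCE_DATE_EPOCH"
        return 0
    fi
    # Otherwise use the committer timestamp of HEAD, if $1 is a git checkout.
    if epoch=$(git -C "$1" log -1 --format=%ct 2>/dev/null) && [ -n "$epoch" ]; then
        printf '%s\n' "$epoch"
        return 0
    fi
    echo "pick_source_date_epoch: no stable timestamp available" >&2
    return 1
}
```

Used as `export SOURCE_DATE_EPOCH=$(pick_source_date_epoch .)` before running configure and make, two checkouts of the same commit get the same timestamp even when they are built on different days.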

avtobiff commented 2 years ago

After further investigation, this needs to be reopened.

Building in separate source directories uncovered some issues that break reproducible builds.

The following build method was used: copy a pristine yaws repo to another path, then build and install each copy, like so:

$ export YAWS_DETERMINISTIC_BUILD=true
$ export SOURCE_DATE_EPOCH=$(date +%s)
$ cd yaws
$ git clean -fdxq
$ cd ..
$ cp -a yaws yaws2

# first build
$ cd yaws
$ autoreconf -fi
$ ./configure --prefix=/usr
$ make all doc apps
$ make DESTDIR=/tmp/yaws1 install

# second build
$ cd ../yaws2
$ autoreconf -fi
$ ./configure --prefix=/usr
$ make all doc apps
$ make DESTDIR=/tmp/yaws2 install

Running diffoscope /tmp/yaws1 /tmp/yaws2 uncovered that several things were not built deterministically:

PR #448 fixes these.

However, more work is needed.

The following files are not built deterministically, and I don't quite understand why not.

examples/ebin/advanced_echo_callback.beam
examples/ebin/authmod_gssapi.beam
examples/ebin/basic_echo_callback_extended.beam
examples/ebin/server_sent_events.beam

It might be something in the LocT (local functions) chunk of the BEAM file. More investigation is needed.

For details and status of reproducible builds of the yaws Debian package see the reproducible builds CI page. [0]

[0] https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/yaws.html

avtobiff commented 2 years ago

The following files are not built deterministically, and I don't quite understand why not.

examples/ebin/advanced_echo_callback.beam
examples/ebin/authmod_gssapi.beam
examples/ebin/basic_echo_callback_extended.beam
examples/ebin/server_sent_events.beam

These files differ because +debug_info is used.

Dropping +debug_info when building the examples while YAWS_DETERMINISTIC_BUILD is set resolves it.
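
As a rough illustration of that rule (not the actual yaws build files), the flag selection could look like the sketch below. `example_erlc_flags` is a hypothetical helper name, and treating any non-empty value of YAWS_DETERMINISTIC_BUILD as "set" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the extra erlc flags for the example modules.
# Assumption: any non-empty YAWS_DETERMINISTIC_BUILD value counts as "set".
example_erlc_flags() {
    if [ -n "${YAWS_DETERMINISTIC_BUILD:-}" ]; then
        # Deterministic build requested: leave out +debug_info.
        printf '\n'
    else
        printf '%s\n' '+debug_info'
    fi
}
```

A build step could then call something like `erlc $(example_erlc_flags) -o examples/ebin <sources>`, where the source list is whatever the examples are built from.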