codership / galera

Synchronous multi-master replication library
GNU General Public License v2.0

Galera builds no longer always reproducible on Debian reprotest #593

Open ottok opened 3 years ago

ottok commented 3 years ago

Earlier this year I noticed that the Salsa-CI 'reprotest' job started failing for Galera 4 in Debian, and I marked the test as ignored in https://salsa.debian.org/mariadb-team/galera-4/-/commit/a801dd2654d67910bc181c7c1e7c7ce4dc0a002e

See example: https://salsa.debian.org/mariadb-team/galera-4/-/jobs/1413947

This is however to some degree random, and occasional builds pass, see example: https://salsa.debian.org/mariadb-team/galera-4/-/jobs/1493612

Now, while working on Galera 3 updates, the 'reprotest' job regressed there as well: https://salsa.debian.org/mariadb-team/galera-3/-/jobs/1659202

Since the issue is not going away by itself, I decided to file this bug report to get tips on what might be causing this.

Looking at the official Debian reproducible runs, the amd64 build in Debian unstable for 26.4.7 is fine, but the 2nd build in Bullseye fails on test failures: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/galera-4.html

For Galera 3 (the latest version of which has not yet been uploaded), everything looks fine so far.

temeo commented 3 years ago

Looking at the official Debian reproducible runs, the amd64 build in Debian unstable for 26.4.7 is fine, but the 2nd build in Bullseye fails on test failures: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/galera-4.html

This test failure looks clear: there is a unit test that tries to use a multicast socket, which is probably not allowed by the build environment. It looks the same as #595.

For other failures, I need to first study what reprotest is trying to achieve.

ottok commented 3 years ago

I have uploaded the latest Galera 4.9 to Debian and it still fails to build reproducibly. The i386 build works now, but fails to reproduce because one build has an extra gu_crc32c_hardware symbol in garbd: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/diffoscope-results/galera-4.html

The armhf build is reproducible while amd64 and arm64 fail to even build.

As a general debugging exercise I ran the atomic builds on Salsa: https://salsa.debian.org/mariadb-team/galera-4/-/pipelines/271343 There the reprotest-kernel build failed. Apparently the build results depend on the kernel used?

Fixing these will need a lot of detective work. It should be possible though, as galera-3 was, and still is, reproducible in Debian, and these issues affect only galera-4.

ayurchen commented 3 years ago

Otto,

Where can we see exact build command lines? And what is the difference between the build environments? Galera does include some kernel headers for AES.

One thing I noticed is that CMake builds do something strange: SCons builds run after them fail (until make clean is run):

/home/ayurchen/codership/galera/bugs.orig/galerautils/tests/gu_crc32c_test.c:137: undefined reference to `gu_crc32c_x86_64'
/usr/bin/ld: galerautils/src/libgalerautils.a(gu_crc32c.c.o): in function `crc32c_best_algorithm':
/home/ayurchen/codership/galera/bugs.orig/galerautils/src/gu_crc32c.c:192: undefined reference to `gu_crc32c_hardware'
collect2: error: ld returned 1 exit status

Could switching from scons to cmake have started to cause those failures?

ottok commented 3 years ago

Where can we see exact build command lines?

For Salsa-CI runs, you can see the exact build commands when you click on the "raw log" icon:

That leads to https://salsa.debian.org/mariadb-team/galera-4/-/jobs/1781054/raw where you can see:

cd obj-x86_64-linux-gnu && cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=None -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_RUNSTATEDIR=/run "-GUnix Makefiles" -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_INSTALL_LIBDIR=lib/x86_64-linux-gnu ..

I think the build command is always the same. It is just the environment that has slight changes.

Both Salsa-CI and the logs at https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/galera-3.html and https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/galera-4.html are viewable and verbose, so you should be able to see the details with a few clicks.

Here is a direct link to the log you might want to see: https://tests.reproducible-builds.org/debian/logdiffs/unstable/i386/galera-3_25.3.33-1.diff.gz

Could switching from scons to cmake have started to cause those failures?

Yes, the i386 failure and that gu_crc32c_hardware symbol might be related to the SCons -> CMake change, as it is also visible for Galera 3 version 25.3.33-1 at https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/diffoscope-results/galera-3.html, while the previous version 25.3.31-1 (the last one to still use SCons) passed.

The reason Galera 3 'reprotest' regressed in https://salsa.debian.org/mariadb-team/galera-3/-/jobs/1659202 might be related to the SCons to CMake change too, as it regressed in the same way for both Galera 3 & 4.

But despite the i386 failure and Salsa-CI reprotest failing, Galera 3 does build in a reproducible way on the official reproducibility builders for other archs: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/galera-3.html

ottok commented 3 years ago

I asked for help on the reproducible-builds mailing list and got this guidance:

Hi Otto,

I would also like to request help with Galera 4, which fails to build in a reproducible way on i386 only: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/diffoscope-results/galera-4.html

This isn't, as far as I can tell, due to a build path issue. Rather, I believe this occurs because the build system is looking for CRC hardware support and is using different compiler flags as a result. This is because we (deliberately) build with both a 32-bit and a 64-bit kernel, even on the Debian i386 architecture.

You can confirm this in the build logs:

  • Checking for hardware CRC support for x86_64
  • Hardwared CRC support enabled: -msse4.2
  • Checking for hardware CRC support for i686
  • No hardware CRC support

You should probably 'just' need to instruct the Debian build to not perform this autodetection... or to simply ignore the result of it. But you probably want to do this on Debian i386 only; otherwise, you prevent the hardware support from being activated elsewhere.

Looking at the logic of crc32c.cmake, I'm not entirely sure that passing, for example, -DGU_CRC32C_NO_HARDWARE from debian/rules will prevent -msse4.2 being added to the compiler... but I'm sure that you'll be able to work that bit out, especially with upstream's involvement.

Hope that helps.

Regards,

Chris Lamb

https://alioth-lists.debian.net/pipermail/reproducible-builds/Week-of-Mon-20210802/013118.html
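
The check Chris suggests ("you can confirm this in the build logs") can be scripted. Below is a minimal Python sketch, not part of Galera or of any Debian tooling: it pulls the CRC autodetection result out of two build logs and reports whether they agree. The message texts are copied from the log excerpt quoted above. The function names `crc_detection` and `compare_builds` are hypothetical, and it is an assumption that each log contains the "Checking for..." line followed by exactly one result line.

```python
import re

# Message texts as quoted in the mailing-list reply above. The real logs may
# carry extra prefixes, so the patterns search anywhere in a line.
CHECK_RE = re.compile(r"Checking for hardware CRC support for (\S+)")
ENABLED_RE = re.compile(r"Hardwared? CRC support enabled:\s*(\S+)")
DISABLED_TEXT = "No hardware CRC support"


def crc_detection(log_text):
    """Return (detected_cpu, compiler_flag_or_None), or None if no check is found."""
    cpu = None
    for line in log_text.splitlines():
        match = CHECK_RE.search(line)
        if match:
            cpu = match.group(1)
            continue
        if cpu is None:
            continue  # ignore result-like lines that appear before the check
        if DISABLED_TEXT in line:
            return cpu, None
        match = ENABLED_RE.search(line)
        if match:
            return cpu, match.group(1)
    return None


def compare_builds(first_log, second_log):
    """Return (same, first_result, second_result) for two build logs."""
    first = crc_detection(first_log)
    second = crc_detection(second_log)
    return first == second, first, second


if __name__ == "__main__":
    # Sample input built from the four log lines quoted in the thread.
    build_1 = (
        "Checking for hardware CRC support for x86_64\n"
        "Hardwared CRC support enabled: -msse4.2\n"
    )
    build_2 = (
        "Checking for hardware CRC support for i686\n"
        "No hardware CRC support\n"
    )
    same, first, second = compare_builds(build_1, build_2)
    print("identical detection:", same)
    print("first build:", first)
    print("second build:", second)
```

If the two results differ, as they do for the sample input, the builds were compiled with different flags, which would be consistent with one garbd binary carrying the extra gu_crc32c_hardware symbol.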