ACCESS-NRI / ACCESS-OM2

ACCESS-OM2: ACCESS Ocean-Sea Ice Model
Apache License 2.0

Deterministic / Reproducible Builds #8

Open harshula opened 1 year ago

harshula commented 1 year ago

The goal is to build identical binaries from identical source code. This involves not embedding unnecessary metadata, e.g. timestamps. Initial testing involving build infrastructure changes has demonstrated that the builds are not currently deterministic / reproducible.

Explanation of Deterministic / Reproducible Builds: https://reproducible-builds.org/ https://wiki.debian.org/ReproducibleBuilds
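
One way to check the goal stated above is to build twice and compare digests of the outputs. The following is a minimal Python sketch of that check, not part of the ACCESS-OM2 build infrastructure; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when two build outputs are byte-for-byte identical."""
    return sha256_of(first) == sha256_of(second)
```

It would be called with the same artifact from two separate builds, for example builds_match(Path("build1/lib/libdatetime.a"), Path("build2/lib/libdatetime.a")), where both paths are placeholders.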

harshula commented 1 year ago

For example:

Using CMake to build the library, we see ar and ranlib being executed:

/bin/ar qc lib/libdatetime.a CMakeFiles/datetime.dir/src/datetime_module.f90.o
/bin/ranlib lib/libdatetime.a

This results in timestamps in the library archive:

$ diff -ru libdatetime.a.3.od libdatetime.a.4.od

--- libdatetime.a.3.od  2022-09-26 23:19:07.926041000 +1000
+++ libdatetime.a.4.od  2022-09-26 23:19:14.736031000 +1000
@@ -1,6 +1,6 @@
 000000 21 3c 61 72 63 68 3e 0a 2f 20 20 20 20 20 20 20  >!<arch>./       <
-000010 20 20 20 20 20 20 20 20 31 36 36 33 38 39 36 39  >        16638969<
-000020 31 38 20 20 30 20 20 20 20 20 30 20 20 20 20 20  >18  0     0     <
+000010 20 20 20 20 20 20 20 20 31 36 36 34 31 39 38 30  >        16641980<
+000020 32 34 20 20 30 20 20 20 20 20 30 20 20 20 20 20  >24  0     0     <
 000030 30 20 20 20 20 20 20 20 33 32 34 36 20 20 20 20  >0       3246    <
 000040 20 20 60 0a 00 00 00 4e 00 00 0d 46 00 00 0d 46  >  `....N...F...F<
 000050 00 00 0d 46 00 00 0d 46 00 00 0d 46 00 00 0d 46  >...F...F...F...F<
@@ -211,7 +211,7 @@
 000d20 20 20 32 34 20 20 20 20 20 20 20 20 60 0a 64 61  >  24        `.da<
 000d30 74 65 74 69 6d 65 5f 6d 6f 64 75 6c 65 2e 66 39  >tetime_module.f9<
 000d40 30 2e 6f 2f 0a 0a 2f 30 20 20 20 20 20 20 20 20  >0.o/../0        <
-000d50 20 20 20 20 20 20 31 36 36 33 38 39 36 39 31 38  >      1663896918<
+000d50 20 20 20 20 20 20 31 36 36 34 31 39 38 30 32 34  >      1664198024<
 000d60 20 20 31 38 39 30 31 20 38 36 34 31 20 20 31 30  >  18901 8641  10<
 000d70 30 36 34 34 20 20 33 35 39 33 36 20 20 20 20 20  >0644  35936     <
 000d80 60 0a 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00  >`..ELF..........<
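
The differing bytes in the dump sit in the fixed-width ASCII member headers of the ar format. As a rough illustration, the Python sketch below walks those 60-byte headers (name 16, mtime 12, uid 6, gid 6, mode 8, size 10, terminator 2) and reports the fields. It assumes the common System V / GNU layout and is not a substitute for ar itself; the names ArHeader and iter_ar_headers are hypothetical.

```python
from typing import Iterator, NamedTuple

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


class ArHeader(NamedTuple):
    name: str
    mtime: str  # kept as text: the GNU "//" name table leaves this blank
    uid: str
    gid: str
    mode: str
    size: int


def iter_ar_headers(data: bytes) -> Iterator[ArHeader]:
    """Yield the header of each member in a System V / GNU ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        raw = data[offset:offset + HEADER_SIZE]
        if raw[58:60] != b"`\n":
            raise ValueError(f"bad header terminator at offset {offset}")
        size = int(raw[48:58].decode("ascii").strip())
        yield ArHeader(
            name=raw[0:16].decode("ascii").rstrip(),
            mtime=raw[16:28].decode("ascii").strip(),
            uid=raw[28:34].decode("ascii").strip(),
            gid=raw[34:40].decode("ascii").strip(),
            mode=raw[40:48].decode("ascii").strip(),
            size=size,
        )
        # Member data is padded to an even number of bytes.
        offset += HEADER_SIZE + size + (size & 1)
```

In the dump above, the two changed regions correspond to the mtime field of the symbol table member ("/") and of the object file member.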

Interestingly, the library has a parallel autotools build system that executes ar but not ranlib:

ar ruv libdatetime.a $(OBJS)

man ar(1)

       r   Insert the files member... into archive (with replacement). This
           operation differs from q in that any previously existing members
           are deleted if their names match those being added.

This library's CMake and autotools systems appear to be out of sync.

harshula commented 1 year ago

Using the autotools system, we can try to remove the timestamp:

        ar qcD libdatetime.a $(OBJS)
        ranlib -D libdatetime.a

man ar(1)

       D   Operate in deterministic mode.  When adding files and the archive
           index use zero for UIDs, GIDs, timestamps, and use consistent file
           modes for all files.  When this option is used, if ar is used with
           identical options and identical input files, multiple runs will
           create identical output files regardless of the input files'
           owners, groups, file modes, or modification times.

           If binutils was configured with --enable-deterministic-archives,
           then this mode is on by default.  It can be disabled with the U
           modifier, below.

That executes as:

ar qcD libdatetime.a datetime_module.o
ranlib -D libdatetime.a
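
To make the effect of the D modifier concrete, here is a Python sketch that post-processes an existing archive and blanks the same header fields. It is an illustration only, not how binutils implements deterministic mode. Setting the mode to 644 and leaving the uid, gid and mode of the symbol table untouched are assumptions based on the values visible in the dumps in this thread. The function name is hypothetical.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def zero_ar_metadata(data: bytes) -> bytes:
    """Return a copy of an ar archive with per-member metadata neutralised.

    Ordinary members get mtime, uid and gid set to 0 and mode set to 644.
    The symbol table ("/" or "/SYM64/") only has its mtime set to 0.
    The GNU long-name table ("//") is left alone: its fields are blank.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    out = bytearray(data)
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(out):
        header = bytes(out[offset:offset + HEADER_SIZE])
        if header[58:60] != b"`\n":
            raise ValueError(f"bad header terminator at offset {offset}")
        name = header[0:16].rstrip()
        size = int(header[48:58].decode("ascii").strip())
        if name != b"//":
            out[offset + 16:offset + 28] = b"0".ljust(12)        # mtime
            if name not in (b"/", b"/SYM64/"):
                out[offset + 28:offset + 34] = b"0".ljust(6)     # uid
                out[offset + 34:offset + 40] = b"0".ljust(6)     # gid
                out[offset + 40:offset + 48] = b"644".ljust(8)   # mode
        # Member data is padded to an even number of bytes.
        offset += HEADER_SIZE + size + (size & 1)
    return bytes(out)
```

Passing the D modifier to ar and ranlib remains the simpler route; a post-processing step like this is only useful when the archiver invocation cannot be changed.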

Comparing two binaries built by autotools, one without and one with the D modifier, we see that the timestamp can be removed:

@@ -1,6 +1,6 @@
 000000 21 3c 61 72 63 68 3e 0a 2f 20 20 20 20 20 20 20  >!<arch>./       <
-000010 20 20 20 20 20 20 20 20 31 36 36 34 32 30 32 38  >        16642028<
-000020 36 33 20 20 30 20 20 20 20 20 30 20 20 20 20 20  >63  0     0     <
+000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20  >        0       <
+000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20  >    0     0     <
 000030 30 20 20 20 20 20 20 20 33 32 34 36 20 20 20 20  >0       3246    <
 000040 20 20 60 0a 00 00 00 4e 00 00 0d 42 00 00 0d 42  >  `....N...B...B<
 000050 00 00 0d 42 00 00 0d 42 00 00 0d 42 00 00 0d 42  >...B...B...B...B<
@@ -211,8 +211,8 @@
 000d20 20 20 32 30 20 20 20 20 20 20 20 20 60 0a 64 61  >  20        `.da<
 000d30 74 65 74 69 6d 65 5f 6d 6f 64 75 6c 65 2e 6f 2f  >tetime_module.o/<
 000d40 0a 0a 2f 30 20 20 20 20 20 20 20 20 20 20 20 20  >../0            <
-000d50 20 20 31 36 36 34 32 30 32 38 36 33 20 20 31 38  >  1664202863  18<
-000d60 39 30 31 20 38 36 34 31 20 20 31 30 30 36 34 34  >901 8641  100644<
+000d50 20 20 30 20 20 20 20 20 20 20 20 20 20 20 30 20  >  0           0 <
+000d60 20 20 20 20 30 20 20 20 20 20 36 34 34 20 20 20  >    0     644   <
 000d70 20 20 34 35 35 32 38 20 20 20 20 20 60 0a 7f 45  >  45528     `..E<
 000d80 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 01 00  >LF..............<
 000d90 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00  >>...............<

harshula commented 1 year ago

For CMake, we can do:

SET(CMAKE_Fortran_ARCHIVE_CREATE "<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")
SET(CMAKE_Fortran_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")

$ diff -ru libdatetime.a.4.od libdatetime.a.9.od

--- libdatetime.a.4.od  2022-09-26 23:19:14.736031000 +1000
+++ libdatetime.a.9.od  2022-09-28 00:28:35.086702000 +1000
@@ -1,6 +1,6 @@
 000000 21 3c 61 72 63 68 3e 0a 2f 20 20 20 20 20 20 20  >!<arch>./       <
-000010 20 20 20 20 20 20 20 20 31 36 36 34 31 39 38 30  >        16641980<
-000020 32 34 20 20 30 20 20 20 20 20 30 20 20 20 20 20  >24  0     0     <
+000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20  >        0       <
+000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20  >    0     0     <
 000030 30 20 20 20 20 20 20 20 33 32 34 36 20 20 20 20  >0       3246    <
 000040 20 20 60 0a 00 00 00 4e 00 00 0d 46 00 00 0d 46  >  `....N...F...F<
 000050 00 00 0d 46 00 00 0d 46 00 00 0d 46 00 00 0d 46  >...F...F...F...F<
@@ -211,9 +211,9 @@
 000d20 20 20 32 34 20 20 20 20 20 20 20 20 60 0a 64 61  >  24        `.da<
 000d30 74 65 74 69 6d 65 5f 6d 6f 64 75 6c 65 2e 66 39  >tetime_module.f9<
 000d40 30 2e 6f 2f 0a 0a 2f 30 20 20 20 20 20 20 20 20  >0.o/../0        <
-000d50 20 20 20 20 20 20 31 36 36 34 31 39 38 30 32 34  >      1664198024<
-000d60 20 20 31 38 39 30 31 20 38 36 34 31 20 20 31 30  >  18901 8641  10<
-000d70 30 36 34 34 20 20 33 35 39 33 36 20 20 20 20 20  >0644  35936     <
+000d50 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20  >      0         <
+000d60 20 20 30 20 20 20 20 20 30 20 20 20 20 20 36 34  >  0     0     64<
+000d70 34 20 20 20 20 20 33 35 39 33 36 20 20 20 20 20  >4     35936     <
 000d80 60 0a 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00  >`..ELF..........<
 000d90 00 00 01 00 3e 00 01 00 00 00 00 00 00 00 00 00  >....>...........<
 000da0 00 00 00 00 00 00 00 00 00 00 a0 87 00 00 00 00  >................<

harshula commented 1 year ago

The CMake source code (https://github.com/Kitware/CMake.git) defines the default archive rules in Modules/CMakeFortranInformation.cmake:

# Create a static archive incrementally for large object file counts.
# If CMAKE_Fortran_CREATE_STATIC_LIBRARY is set it will override these.
if(NOT DEFINED CMAKE_Fortran_ARCHIVE_CREATE)
  set(CMAKE_Fortran_ARCHIVE_CREATE "<CMAKE_AR> qc <TARGET> <LINK_FLAGS> <OBJECTS>")
endif()
if(NOT DEFINED CMAKE_Fortran_ARCHIVE_APPEND)
  set(CMAKE_Fortran_ARCHIVE_APPEND "<CMAKE_AR> q <TARGET> <LINK_FLAGS> <OBJECTS>")
endif()
if(NOT DEFINED CMAKE_Fortran_ARCHIVE_FINISH)
  set(CMAKE_Fortran_ARCHIVE_FINISH "<CMAKE_RANLIB> <TARGET>")
endif()

harshula commented 1 year ago

The variable names follow the pattern:

CMAKE_${lang}_ARCHIVE_(CREATE|APPEND|FINISH)

where ${lang} is C, CUDA, CXX, Fortran, HIP, ISPC, OBJC, OBJCXX or Swift.
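
Since the same three rules exist per language, the Fortran override shown earlier in this thread can be generalised mechanically. The Python sketch below prints the corresponding set() lines for each language. It assumes GNU ar and ranlib (for the D modifier) are the archiver tools for every language listed, which has only been tested for Fortran here. The APPEND rule with qD is an extrapolation that does not appear earlier in the thread.

```python
LANGS = ["C", "CUDA", "CXX", "Fortran", "HIP", "ISPC", "OBJC", "OBJCXX", "Swift"]


def deterministic_archive_rules(langs=LANGS) -> str:
    """Return CMake set() lines that add the D modifier to the archive rules."""
    lines = []
    for lang in langs:
        lines.append(
            f'set(CMAKE_{lang}_ARCHIVE_CREATE '
            '"<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")'
        )
        # Assumption: APPEND is not overridden earlier in the thread.
        lines.append(
            f'set(CMAKE_{lang}_ARCHIVE_APPEND '
            '"<CMAKE_AR> qD <TARGET> <LINK_FLAGS> <OBJECTS>")'
        )
        lines.append(
            f'set(CMAKE_{lang}_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(deterministic_archive_rules(["Fortran"]), end="")
```

The output is meant to be pasted into, or included from, a project's CMakeLists.txt after the languages are enabled.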

$ grep -l _ARCHIVE_CREATE Modules/CMake*Information.cmake
Modules/CMakeCInformation.cmake
Modules/CMakeCUDAInformation.cmake
Modules/CMakeCXXInformation.cmake
Modules/CMakeFortranInformation.cmake
Modules/CMakeHIPInformation.cmake
Modules/CMakeISPCInformation.cmake
Modules/CMakeOBJCInformation.cmake
Modules/CMakeOBJCXXInformation.cmake
Modules/CMakeSwiftInformation.cmake

harshula commented 1 year ago

A very detailed explanation, with historical context, of ar and ranlib: https://stackoverflow.com/questions/47910759/what-is-the-difference-between-ranlib-ar-and-ld-for-making-libraries

aidanheerdegen commented 1 year ago

Fascinating.

This has been lodged as an issue with Kitware (the CMake developers), but there has been no obvious movement on their part:

https://gitlab.kitware.com/cmake/cmake/-/issues/19852

As noted in this SO post, your solution above can be added to the CMake build:

set(CMAKE_Fortran_ARCHIVE_CREATE "<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_Fortran_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")

I would suggest raising this as an issue with datetime-fortran and offering a PR to make the behaviour consistent between CMake and autotools.

harshula commented 1 year ago

This library's CMake and autotools systems appear to be out of sync.

Opened: https://github.com/wavebitscientific/datetime-fortran/issues/76

harshula commented 1 year ago

Notes

https://gitlab.kitware.com/cmake/cmake/-/issues/19852

My current workaround is to set CMAKE_<LANG>_CREATE_STATIC_LIBRARY on macOS platforms to something along the lines of /usr/bin/xcrun libtool -static -D -o , which seems to have the same effect, but it would be preferable if CMake had native libtool support instead of relying on xcrun.

&

You can also try setting ZERO_AR_DATE=1 in the environment. This is supported by older Apple tools. It doesn't zero out the uid etc. in the .a, but it's better than nothing.
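
As a small illustration of the environment-variable approach quoted above, this Python sketch runs a build command with ZERO_AR_DATE=1 added to the inherited environment. The wrapper name is hypothetical, and whether the variable has any effect depends on the archiver tools in use.

```python
import os
import subprocess
from typing import Sequence


def run_with_zero_ar_date(command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command with ZERO_AR_DATE=1 set, leaving os.environ untouched."""
    env = dict(os.environ)
    env["ZERO_AR_DATE"] = "1"
    # check=True raises CalledProcessError if the command fails.
    return subprocess.run(list(command), env=env, check=True)
```

For example, run_with_zero_ar_date(["make"]) would invoke make with the variable set for it and its child processes.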

harshula commented 1 year ago

[Updated 30/08/2023]