binaryeq / jcompile

scripts to compile Java projects with different compilers to create a data set of comparable binaries
Apache License 2.0

Script to extract normalised summary of a jar's .class files #91

Closed (wtwhite closed this 4 months ago)

wtwhite commented 4 months ago

Looking for build nondeterminism in runs 32, 33 and 34.

Runs 33 and 34 contain identical .class files in every jar, so whatever differs between them lies outside the class files (e.g. jar metadata such as timestamps and/or file order):

wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/34_with_eq_from_dot_and_new_jep181 -name '*.jar'`; do c=${j/runs/crcs}.crcs; echo $c; mkdir -p `dirname $c`; ./classes_and_their_crcs_in_jar.sh $j > $c; done > crcs/make_crcs.34_with_eq_from_dot_and_new_jep181.log

real    0m14.611s
user    0m15.561s
sys 0m4.651s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/33_with_oracle_and_nodebug -name '*.jar'`; do c=${j/runs/crcs}.crcs; echo $c; mkdir -p `dirname $c`; ./classes_and_their_crcs_in_jar.sh $j > $c; done > crcs/make_crcs.33_with_oracle_and_nodebug.log

real    0m14.774s
user    0m15.730s
sys 0m4.521s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/32_run_31_with_correct_project_names -name '*.jar'`; do c=${j/runs/crcs}.crcs; echo $c; mkdir -p `dirname $c`; ./classes_and_their_crcs_in_jar.sh $j > $c; done > crcs/make_crcs.32_run_31_with_correct_project_names.log

real    0m11.632s
user    0m11.288s
sys 0m3.228s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ cd crcs/
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ ls -tlra
total 1372
drwxrwxr-x  3 wtwhite wtwhite   4096 Jun 27 08:32 34_with_eq_from_dot_and_new_jep181
-rw-rw-r--  1 wtwhite wtwhite 245373 Jun 27 08:33 make_crcs.34_with_eq_from_dot_and_new_jep181.log
drwxrwxr-x  3 wtwhite wtwhite   4096 Jun 27 08:34 33_with_oracle_and_nodebug
-rw-rw-r--  1 wtwhite wtwhite 225437 Jun 27 08:34 make_crcs.33_with_oracle_and_nodebug.log
drwxrwxr-x  3 wtwhite wtwhite   4096 Jun 27 08:36 32_run_31_with_correct_project_names
-rw-rw-r--  1 wtwhite wtwhite 185973 Jun 27 08:36 make_crcs.32_run_31_with_correct_project_names.log
drwxrwxr-x 27 wtwhite wtwhite 720896 Jun 27 08:36 ..
drwxrwxr-x  5 wtwhite wtwhite   4096 Jun 27 08:36 .
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ for d in */; do echo $d; ( cd $d && md5sum `find jars -name '*.jar.crcs'` > all_md5s.txt ); done
32_run_31_with_correct_project_names/
33_with_oracle_and_nodebug/
34_with_eq_from_dot_and_new_jep181/
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ ls -ltra */all_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 171125 Jun 27 08:38 32_run_31_with_correct_project_names/all_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 230421 Jun 27 08:38 33_with_oracle_and_nodebug/all_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 230421 Jun 27 08:38 34_with_eq_from_dot_and_new_jep181/all_md5s.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ md5sum 33_with_oracle_and_nodebug/all_md5s.txt 
26bd251a1a2abe07d2dbfea8755195b3  33_with_oracle_and_nodebug/all_md5s.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ md5sum 34_with_eq_from_dot_and_new_jep181/all_md5s.txt 
26bd251a1a2abe07d2dbfea8755195b3  34_with_eq_from_dot_and_new_jep181/all_md5s.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ md5sum 32_run_31_with_correct_project_names/all_md5s.txt 
af8c70ff973edfbc9e9ddfaebbf23a8d  32_run_31_with_correct_project_names/all_md5s.txt
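The source of classes_and_their_crcs_in_jar.sh is not shown in this issue, so here is only a minimal sketch of what such a per-jar summary could look like. It relies on the CRC-32 column that Info-ZIP `unzip -v` prints for each entry; the function name, the tab-separated `name<TAB>crc` output and the choice to skip directory entries are all assumptions. Sorting by name means two jars holding the same entries in a different order produce the same summary.

```shell
#!/bin/sh
# Hypothetical sketch in the spirit of classes_and_their_crcs_in_jar.sh and
# non_classes_and_their_crcs_in_jar.sh (their real source is not shown here).
# Reads `unzip -v some.jar` output on stdin and prints "name<TAB>crc32" per
# entry, sorted by name so the order of entries inside the jar is irrelevant.
#   mode "class"    -> .class entries only
#   mode "nonclass" -> all other file entries (directory entries are skipped)
crcs_from_listing() {
    awk -v mode="$1" '
        # Entry lines have a date in column 5 and an 8-digit hex CRC-32 in column 7.
        $5 ~ /^[0-9]+-[0-9]+-[0-9]+$/ && length($7) == 8 && $7 ~ /^[0-9a-f]+$/ {
            name = $0
            # Drop the first seven columns so names containing spaces survive.
            sub(/^ *[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/, "", name)
            if (name ~ /\/$/) next                 # directory entry: no content
            is_class = (name ~ /\.class$/)
            if ((mode == "class" && is_class) || (mode == "nonclass" && !is_class))
                print name "\t" $7
        }' | LC_ALL=C sort
}

# Example: unzip -v commons-codec-1.11.jar | crcs_from_listing class
```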

The .class files are also identical in the 1856 jars that run 32 shares with both later runs (comparing run 32 against run 34 suffices, since runs 33 and 34 match):

wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ wc -l */all_md5s.txt
  1856 32_run_31_with_correct_project_names/all_md5s.txt
  2492 33_with_oracle_and_nodebug/all_md5s.txt
  2492 34_with_eq_from_dot_and_new_jep181/all_md5s.txt
  6840 total
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ comm -1 -2 <(cut -c35- 34_with_eq_from_dot_and_new_jep181/all_md5s.txt|sort) <(cut -c35- 32_run_31_with_correct_project_names/all_md5s.txt|sort) > jars_in_common.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ wc -l jars_in_common.txt 
1856 jars_in_common.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ ( cd 32_run_31_with_correct_project_names/ && md5sum `cat ../jars_in_common.txt` > jars_in_common_md5s.txt )
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ ( cd 34_with_eq_from_dot_and_new_jep181/ && md5sum `cat ../jars_in_common.txt` > jars_in_common_md5s.txt )
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ ls -ltra */jars_in_common_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 171125 Jun 27 08:48 32_run_31_with_correct_project_names/jars_in_common_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 171125 Jun 27 08:49 34_with_eq_from_dot_and_new_jep181/jars_in_common_md5s.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ md5sum !$
md5sum */jars_in_common_md5s.txt
33b473583e64c6a47e1372db2cae7060  32_run_31_with_correct_project_names/jars_in_common_md5s.txt
33b473583e64c6a47e1372db2cae7060  34_with_eq_from_dot_and_new_jep181/jars_in_common_md5s.txt
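The digest-of-digests check above can only say whether everything matched, and the unsorted `find` in the all_md5s.txt step relies on both directories being traversed in the same order. A hypothetical helper (name and arguments are illustrative) that intersects two runs' summary files and names each shared jar whose summary differs might look like this; it uses only find, sort, comm and cmp:

```shell
#!/bin/sh
# Hypothetical helper: print the relative paths of per-jar summary files that
# exist under both run directories but whose contents differ. Empty output
# means the two runs agree on every jar they share.
#   $1, $2  run directories, e.g. crcs/32_run_31_with_correct_project_names
#   $3      summary file suffix, e.g. .jar.crcs
differing_common_summaries() {
    run_a=$1 run_b=$2 suffix=$3
    tmp=$(mktemp -d) || return 1
    # Sort both listings so comm(1) can intersect them, independent of find's traversal order.
    (cd "$run_a" && find . -type f -name "*$suffix" | LC_ALL=C sort) > "$tmp/a"
    (cd "$run_b" && find . -type f -name "*$suffix" | LC_ALL=C sort) > "$tmp/b"
    # comm -12 keeps only the paths present in both runs.
    LC_ALL=C comm -12 "$tmp/a" "$tmp/b" |
        while IFS= read -r f; do
            # cmp -s prints nothing and exits non-zero when the files differ.
            cmp -s "$run_a/$f" "$run_b/$f" || printf '%s\n' "$f"
        done
    rm -rf "$tmp"
}

# Example: differing_common_summaries crcs/32_run_31_with_correct_project_names \
#              crcs/34_with_eq_from_dot_and_new_jep181 .jar.crcs
```

Because it names the differing jars instead of producing a single digest, a non-empty result points directly at the jars worth inspecting.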
wtwhite commented 4 months ago

There are, however, differences in jar file order, even between runs 33 and 34, which have identical .class file contents:

wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/32_run_31_with_correct_project_names -name '*.jar'`; do c=${j/runs/crcs}.filelist; echo $c; mkdir -p `dirname $c`; unzip -l $j |perl -lne 'print $1 if /  2024-\d\d-\d\d \d\d:\d\d   (.*)/' > $c; done > make_filelist.32_run_31_with_correct_project_names.log

real    0m8.126s
user    0m7.783s
sys 0m3.069s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/33_with_oracle_and_nodebug -name '*.jar'`; do c=${j/runs/crcs}.filelist; echo $c; mkdir -p `dirname $c`; unzip -l $j |perl -lne 'print $1 if /  2024-\d\d-\d\d \d\d:\d\d   (.*)/' > $c; done > make_filelist.33_with_oracle_and_nodebug.log

real    0m12.873s
user    0m11.454s
sys 0m5.796s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/34_with_eq_from_dot_and_new_jep181 -name '*.jar'`; do c=${j/runs/crcs}.filelist; echo $c; mkdir -p `dirname $c`; unzip -l $j |perl -lne 'print $1 if /  2024-\d\d-\d\d \d\d:\d\d   (.*)/' > $c; done > make_filelist.34_with_eq_from_dot_and_new_jep181.log

real    0m12.569s
user    0m11.423s
sys 0m5.654s
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ for d in */; do echo $d; ( cd $d && md5sum `find jars -name '*.jar.filelist'|sort` > all_filelists.txt ); done
32_run_31_with_correct_project_names/
33_with_oracle_and_nodebug/
34_with_eq_from_dot_and_new_jep181/
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ md5sum */all_filelists.txt
d9b46333680091df59dfb585e2c5a3c0  32_run_31_with_correct_project_names/all_filelists.txt
74563ed9e507a2bad9a07f2bb2c563db  33_with_oracle_and_nodebug/all_filelists.txt
cf4ce51680a15fd929b30217bd5e3c7c  34_with_eq_from_dot_and_new_jep181/all_filelists.txt
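The perl filter used to build the .filelist files only picks up entries whose timestamp starts with `2024`, so any entry carrying another date (for example a fixed timestamp) would be silently left out of the list. A hedged alternative, assuming the Info-ZIP `unzip -l` layout of length, date, time and name columns, keeps every entry in archive order; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical replacement for the inline perl filter: print entry names from
# `unzip -l some.jar` output in archive order, accepting any date (e.g.
# 2024-06-14 or 06-14-2024) rather than only 2024. Names with spaces are kept.
# Usage: unzip -l some.jar | names_from_listing > some.jar.filelist
names_from_listing() {
    awk '
        # Entry lines: numeric length, then a date, then HH:MM.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+-[0-9]+-[0-9]+$/ && $3 ~ /^[0-9]+:[0-9]+$/ {
            name = $0
            sub(/^ *[^ ]+ +[^ ]+ +[^ ]+ +/, "", name)   # drop length, date and time
            print name
        }'
}
```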

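For example, the two file lists for commons-codec-1.11.jar differ as they stand but are identical once sorted, i.e. they contain the same entries in a different order: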

wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ diff 33_with_oracle_and_nodebug/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar.filelist 34_with_eq_from_dot_and_new_jep181/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar.filelist|wc -l
373
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ diff <(sort 33_with_oracle_and_nodebug/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar.filelist) <(sort 34_with_eq_from_dot_and_new_jep181/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar.filelist)|wc -l
0
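The two diff checks above separate a mere reordering from a real change in the set of entries. A small hypothetical helper that makes the same three-way distinction for any pair of .filelist files (one entry name per line):

```shell
#!/bin/sh
# Hypothetical helper: report how two file lists (one entry per line) relate.
compare_filelists() {
    if cmp -s "$1" "$2"; then
        echo "identical"                      # same entries, same order
    elif [ "$(LC_ALL=C sort "$1")" = "$(LC_ALL=C sort "$2")" ]; then
        echo "same entries, different order"  # only the jar's entry order differs
    else
        echo "different entries"
    fi
}
```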
wtwhite commented 4 months ago

Finally, the contents of the non-class files in the jars also differ between runs 33 and 34:

wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/34_with_eq_from_dot_and_new_jep181 -name '*.jar'`; do c=${j/runs/crcs}.nonclass.crcs; echo $c; mkdir -p `dirname $c`; ./non_classes_and_their_crcs_in_jar.sh $j > $c; done > crcs/make_non_class_crcs.34_with_eq_from_dot_and_new_jep181.log

real    0m14.680s
user    0m15.127s
sys 0m5.857s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/33_with_oracle_and_nodebug -name '*.jar'`; do c=${j/runs/crcs}.nonclass.crcs; echo $c; mkdir -p `dirname $c`; ./non_classes_and_their_crcs_in_jar.sh $j > $c; done > crcs/make_non_class_crcs.33_with_oracle_and_nodebug.log

real    0m14.637s
user    0m15.320s
sys 0m5.551s
wtwhite@wtwhite-vuw-vm:~/code/jcompile$ time for j in `find runs/32_run_31_with_correct_project_names -name '*.jar'`; do c=${j/runs/crcs}.nonclass.crcs; echo $c; mkdir -p `dirname $c`; ./non_classes_and_their_crcs_in_jar.sh $j > $c; done > crcs/make_non_class_crcs.32_run_31_with_correct_project_names.log

real    0m10.500s
user    0m11.134s
sys 0m3.835s
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ for d in */; do echo $d; ( cd $d && md5sum `find jars -name '*.nonclass.crcs'|sort` > all_nonclass_md5s.txt ); done
32_run_31_with_correct_project_names/
33_with_oracle_and_nodebug/
34_with_eq_from_dot_and_new_jep181/
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ ls -ltra */all_nonclass_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 187829 Jun 27 10:03 32_run_31_with_correct_project_names/all_nonclass_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 252849 Jun 27 10:03 33_with_oracle_and_nodebug/all_nonclass_md5s.txt
-rw-rw-r-- 1 wtwhite wtwhite 252849 Jun 27 10:03 34_with_eq_from_dot_and_new_jep181/all_nonclass_md5s.txt
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ md5sum */all_nonclass_md5s.txt
84ed5745cb380b7be065360b20fadd0f  32_run_31_with_correct_project_names/all_nonclass_md5s.txt
43ce08474e79ad5b62eb6e8aaff9c787  33_with_oracle_and_nodebug/all_nonclass_md5s.txt
d8bdd7a74748141e9e5c9360cefa5f5a  34_with_eq_from_dot_and_new_jep181/all_nonclass_md5s.txt

In commons-codec-1.11.jar, for example, they differ at least in the manifest and the Maven pom.properties file:

wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ diff 33_with_oracle_and_nodebug/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar.nonclass.crcs 34_with_eq_from_dot_and_new_jep181/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar.nonclass.crcs
3c3
< META-INF/MANIFEST.MF  87abed49
---
> META-INF/MANIFEST.MF  870c849e
7c7
< META-INF/maven/commons-codec/commons-codec/pom.properties 7802340c
---
> META-INF/maven/commons-codec/commons-codec/pom.properties 7541066c
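A line-by-line diff works here because both summaries list the same entries in the same positions. A sketch that instead matches entries by name and also reports entries present on only one side, assuming each summary line is a name and a CRC separated by a tab (the real .nonclass.crcs files may use a different separator); the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: match entries of two per-jar summaries by name and print
# "name<TAB>crc_in_first<TAB>crc_in_second" for every entry whose CRC changed,
# using "-" for an entry missing from one side. Assumes "name<TAB>crc" lines.
changed_entries() {
    awk -F '\t' '
        FILENAME == ARGV[1] { first[$1] = $2; next }   # first summary
        { second[$1] = $2 }                            # second summary
        END {
            for (k in first)
                if (!(k in second) || first[k] != second[k])
                    print k "\t" first[k] "\t" ((k in second) ? second[k] : "-")
            for (k in second)
                if (!(k in first))
                    print k "\t-\t" second[k]
        }' "$1" "$2" | LC_ALL=C sort
}
```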

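In particular, the build timestamps differ, both in the Bnd-LastModified manifest entry (milliseconds since the Unix epoch, so the two values correspond to 2024-06-14 00:21:22 UTC and 2024-06-25 04:29:23 UTC, within seconds of the pom.properties timestamps) and in the header comment of the pom.properties file: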

wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ diff <(unzip -p ../runs/33_with_oracle_and_nodebug/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar META-INF/MANIFEST.MF) <(unzip -p ../runs/34_with_eq_from_dot_and_new_jep181/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar META-INF/MANIFEST.MF)
15c15
< Bnd-LastModified: 1718324482079
---
> Bnd-LastModified: 1719289763525
wtwhite@wtwhite-vuw-vm:~/code/jcompile/crcs$ diff <(unzip -p ../runs/33_with_oracle_and_nodebug/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar META-INF/maven/commons-codec/commons-codec/pom.properties) <(unzip -p ../runs/34_with_eq_from_dot_and_new_jep181/jars/EQ/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11.jar META-INF/maven/commons-codec/commons-codec/pom.properties)
2c2
< #Fri Jun 14 00:21:24 UTC 2024
---
> #Tue Jun 25 04:29:27 UTC 2024
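If the goal is a normalised summary in which these build stamps do not count as differences, one option is to strip them before hashing. This is only a sketch of that idea, not something the existing scripts are known to do: the function names are hypothetical, and it assumes the stamps appear exactly as in this jar (a single-line `Bnd-LastModified:` manifest header and a `#`-prefixed date comment in pom.properties).

```shell
#!/bin/sh
# Hypothetical normalisation filters (not part of the existing scripts).
# Each reads a file's content on stdin and writes it without its build stamp.

# MANIFEST.MF: drop the Bnd-LastModified header. Assumes the header fits on
# one line (manifest continuation lines start with a space and are kept).
normalise_manifest() {
    grep -v '^Bnd-LastModified:'
}

# pom.properties: drop comment lines, which is where the "#<date>" stamp sits.
normalise_pom_properties() {
    grep -v '^#'
}

# Example: unzip -p some.jar META-INF/MANIFEST.MF | normalise_manifest | md5sum
```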