SomberNight opened this issue 5 years ago
First of all, congrats on making your builds reproducible. That's a very noble cause.
The rationale is discussed here: https://github.com/AppImage/AppImageUpdate/issues/83
We explicitly discussed reproducible builds back then, but I don't remember how this has actually been dealt with in the current implementation.
Can you shed some light on this @TheAssassin?
(@TheAssassin: I hope I am not mixing things up here. https://github.com/AppImage/AppImageUpdate/issues/83 is about increasing AppImageUpdate efficiency, but the description for digest-md5 is stated as "desktop integration purposes for a given AppImage". Looking back at the discussion, it seems like I did not understand the purpose of digest-md5 from the beginning and it was not clearly explained in the PR: https://github.com/AppImage/AppImageKit/pull/768#issuecomment-386383507)
You mixed up a few things. You've found an entry for the software digest-md5, which is a utility built in AppImageKit. You should read more carefully, including the context.
The section contains an MD5 hash of the squashfs image embedded in the AppImage, this is used to check for the equality of two AppImages (e.g., in AppImageUpdate). If the contents differ, the hash differs. Simple as that. Reproducibility is tested and verified in our continuous deployment scripts. If you use the same version of appimagetool, it will create the same AppImage for the same contents.
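A minimal Python sketch of the idea in the reply above: a digest over the embedded squashfs image changes whenever the contents change. It assumes the squashfs image starts at a known byte offset and runs to the end of the file. `squashfs_md5` and its `offset` parameter are hypothetical names, and this is not the code AppImageKit uses to fill .digest_md5 (the real tool may exclude or treat some regions differently).

```python
import hashlib


def squashfs_md5(path: str, offset: int, chunk_size: int = 1 << 16) -> str:
    """Return the MD5 hex digest of the file's bytes from `offset` to EOF.

    Assumption: the embedded squashfs image starts at `offset` and extends to
    the end of the file. Illustration only, not AppImageKit's implementation.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(offset)
        # Read in chunks so a large AppImage does not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Under this assumption, two files that share the same payload after `offset` get the same value even if the bytes before it differ, which is the "same contents, same hash" property the reply relies on.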
Must be your test that is wrong here. There's nothing like "almost identical". Either they are or are not.
I just meant that there seems to be only a small, structured difference between the binaries. Anyway, that's semantics.
Must be your test that is wrong here
Ok. So let me detail what the script is doing then.
Broadly, it creates a directory structure and then uses appimagetool to create the binary from it. However, it runs in Docker (to make everything else deterministic, and to streamline building), so I cannot invoke appimagetool directly due to issues with FUSE. What I am doing instead is:
"$CACHEDIR/appimagetool" --appimage-extract
env VERSION="$VERSION" ./squashfs-root/AppRun --no-appstream --verbose "$APPDIR" "$APPIMAGE"
So I unpack appimagetool using --appimage-extract, and then run the unpacked AppRun instead.
Now, even if I do this in a virtual machine and take a snapshot of the whole VM between the two lines, i.e. directly before running AppRun, the resulting AppImage binaries are NOT identical!
They differ in the way described in the OP.
If you use the same version of appimagetool, it will create the same AppImage for the same contents.
Okay, but that seems to contradict what I just described observing.
I've now tried creating only the directory structure in Docker, once, then copying it to the host machine, where FUSE can be used, and running appimagetool directly twice in a row on the unchanged directory structure. The produced binaries are not identical.
Using appimagetool release 11 (appimagetool-x86_64.AppImage c13026b9ebaa20a17e7e0a4c818a901f0faba759801d8ceab3bb6007dde00372), the difference is only in the ELF section '.digest_md5'.
Using appimagetool release "continuous build" (appimagetool-x86_64.AppImage 96a847adfcc5bd88e30f69fff7399ef1d2feabf7161ddf85d2fd6f8062bf8fc9), the difference is only in the ELF section '.digest_md5'.
Using appimagetool release 10 (appimagetool-x86_64.AppImage 2f8a62f8ad1a4ad9608132bc467d6cc3f2c948ec5697a57b983e2824e98aeb0f), the difference is unstructured, all over the file.
Our reproducibility test only covers the case where you want to create an AppImage for the exact same directory, and I just tested that with the latest continuous build and it works fine here.
You probably have metadata changes on every build which cause the resulting squashfs image to be different. I tested that and on my computer, the resulting AppImages are different now, too.
Your tool diffoscope is wrong; the changes are not only in that section. You should always test with a second tool. I tried both diffoscope and the good ol' hexdiff, and hexdiff finds differences in other places, too (most likely squashfs metadata).
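To look at the raw bytes as suggested, here is a small Python sketch that lists the byte ranges where two binaries differ, as a rough stand-in for a hex-diff tool. `diff_ranges` is a hypothetical helper (not diffoscope or hexdiff), and it assumes both files fit in memory.

```python
from typing import List, Tuple


def diff_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return half-open [start, end) byte ranges where `a` and `b` differ.

    If the inputs have different lengths, the extra tail of the longer one
    is reported as differing (merged with any run that reaches it).
    """
    ranges: List[Tuple[int, int]] = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    end = max(len(a), len(b))
    if start is not None:
        ranges.append((start, end))  # run extends to the end of the shorter input (and any tail)
    elif len(a) != len(b):
        ranges.append((common, end))  # only the tail of the longer input differs
    return ranges
```

Read both AppImages in binary mode, pass their contents in, and print each range as hex offsets; several ranges well past the ELF header would point at differences inside the squashfs image rather than only in .digest_md5. The per-byte Python loop is slow on very large files, so for big AppImages comparing fixed-size chunks first is a cheaper design.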
Your tool diffoscope is wrong; the changes are not only in that section
Thanks. You are right. Looking at the raw bytes, there are other diffs later in the file.
create an AppImage for the exact same directory
If I build two AppImages and run --appimage-extract on them, should the reason for this difference between the binaries also be visible in the extracted folders? In other words, should I expect to see some difference between the extracted folders?
You probably have metadata changes on every build which cause the resulting squashfs image to be different
I am still left wondering why I cannot build the same binaries even when I restore a snapshot of the whole VM between builds, as described above: the resulting binaries A and B are different. This is with the latest VirtualBox (6.0.4).
If you really suspect the reason is metadata changes in the AppDir, do you have a suggestion for how to detect or inspect them?
Metadata can be access or modification timestamps. The issue here is that when the files are extracted, the original metadata contained in the squashfs image is not restored. Therefore, all files look like they're freshly created (i.e., they have the current timestamp set for mtime/atime; atime must be available of course).
The issue is the extraction code in the runtime. You can try to mount the AppImage instead of extracting it. Then, run appimagetool on the mountpoint and check if that works.
In any case, I do want to keep storing timestamps, therefore we must fix the runtime to set those correctly on extraction. CC @azubieta
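One way to approach the detection question above, sketched in Python: snapshot selected metadata of the AppDir before each build and diff the snapshots. It assumes that file type, permission bits, size, mtime and symlink targets are the metadata worth comparing; `tree_metadata` and `metadata_diff` are hypothetical helpers. atime is deliberately left out because merely reading files can change it.

```python
import os
import stat
from typing import Dict, List, Optional, Tuple

Meta = Tuple[int, int, int, int, Optional[str]]


def tree_metadata(root: str) -> Dict[str, Meta]:
    """Map each path below `root` (relative to it) to selected metadata.

    Uses lstat, so symlinks are not followed and the symlink's own mtime is
    recorded (the timestamp that `touch -h` changes).
    """
    result: Dict[str, Meta] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            target = os.readlink(full) if stat.S_ISLNK(st.st_mode) else None
            result[os.path.relpath(full, root)] = (
                stat.S_IFMT(st.st_mode),  # file type (regular, dir, symlink, ...)
                stat.S_IMODE(st.st_mode),  # permission bits
                st.st_size,
                st.st_mtime_ns,
                target,
            )
    return result


def metadata_diff(a: Dict[str, Meta], b: Dict[str, Meta]) -> List[str]:
    """Sorted relative paths whose metadata differs or that exist in only one snapshot."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result for two snapshots taken right before two builds would suggest the inputs did not change between them, which narrows the search to the tooling.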
Metadata can be access or modification timestamps
I've already been resetting all st_atime and st_mtime timestamps to fixed values; but I admit I forgot about symlinks (touch -h option), so thanks for explicitly mentioning timestamps.
find -exec touch -h -d '2000-11-11T11:11:11+00:00' {} +
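The same normalization as the find/touch command above, sketched in Python in case it is easier to drop into a build script. `normalize_timestamps` is a hypothetical helper; passing follow_symlinks=False to os.utime is what corresponds to touch -h, and it relies on platform support (Linux has it).

```python
import os
from datetime import datetime, timezone

# Same instant as '2000-11-11T11:11:11+00:00' in the touch command above.
FIXED_TIME = datetime(2000, 11, 11, 11, 11, 11, tzinfo=timezone.utc).timestamp()


def normalize_timestamps(root: str, when: float = FIXED_TIME) -> None:
    """Set atime and mtime of `root` and everything below it to `when`.

    follow_symlinks=False changes the symlink itself rather than its target,
    like `touch -h`; this also works for dangling symlinks.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.utime(os.path.join(dirpath, name), (when, when), follow_symlinks=False)
    # `find` without a path argument also touches "." itself.
    os.utime(root, (when, when))
```

Changing a file's timestamps does not modify its parent directory's mtime, so the order in which entries are visited does not matter here.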
Regardless, I still cannot build the binary I want deterministically.
So I've started deleting files from my appdir, to see if some specific files are at fault.
If I delete almost everything (only leaving the bare minimum to let appimagetool succeed), I can reproducibly build the same binary every time. No need to tinker with timestamps between builds.
If I start leaving in (not deleting) more and more files, the build no longer remains deterministic, after some threshold. It does not matter which files I keep.
Specifically, if there are about 50 files in my AppDir, the built binary no longer has a stable hash; instead, it randomly gets one of two hashes.
$ for i in {1..10}; do env VERSION=1.0 ARCH=x86_64 ./appimagetool11-x86_64.AppImage --no-appstream --verbose appdir b$i; done
$ md5sum b*
5e9e9d52b2006f88a6c5e522f28e5184 b1
1d2524ee07cfb4690ce3ea2437e7e254 b10
1d2524ee07cfb4690ce3ea2437e7e254 b2
1d2524ee07cfb4690ce3ea2437e7e254 b3
5e9e9d52b2006f88a6c5e522f28e5184 b4
1d2524ee07cfb4690ce3ea2437e7e254 b5
1d2524ee07cfb4690ce3ea2437e7e254 b6
1d2524ee07cfb4690ce3ea2437e7e254 b7
1d2524ee07cfb4690ce3ea2437e7e254 b8
1d2524ee07cfb4690ce3ea2437e7e254 b9
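To quantify runs like the one above (or the 500-build experiment that follows), a Python sketch that groups build outputs by MD5 digest. `file_md5` and `hash_histogram` are hypothetical helpers; a reproducible build should yield exactly one key.

```python
import hashlib
from collections import Counter
from typing import Iterable


def file_md5(path: str) -> str:
    """MD5 hex digest of a whole file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            md5.update(chunk)
    return md5.hexdigest()


def hash_histogram(paths: Iterable[str]) -> Counter:
    """Count how many build outputs share each MD5 digest."""
    return Counter(file_md5(p) for p in paths)
```

For example, passing glob.glob("b*") would give a Counter whose length is the number of unique hashes, and most_common() shows which outcome dominates.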
If I leave even more files in, the set of possible hashes for the binary increases.
When I have around 100 files in my AppDir, it is still quite likely that I get a hash that I have seen before. I've built 500 binaries, which had 237 unique hashes.
(maybe it's about cumulative file size, not number of files; or something else related)
Do you still think this can be explained with metadata of the AppDir? I highly doubt it at this point.
As far as I can see, this must be a bug in squashfs. Our software behaves correctly: it calculates different hashes and puts them into .digest_md5. I'm not sure why this is happening, but as said above, I could reproduce your bug using the initial method you described.
I'm not a squashfs expert; we're just "customers" using the tools they provide. I guess we need to take that bug upstream to them. Would you mind opening an issue over here? The thing with squashfs-tools is that the project doesn't seem to be very active anymore.
Thanks for pointing to plougher/squashfs-tools. After looking at it, I've found the following:
I see you also apply a patch to squashfs-tools locally during the build (https://github.com/AppImage/AppImageKit/pull/651), but it covers only a subset of the changes in the linked Debian patchset.
I see you've seen the squashfskit fork too https://github.com/AppImage/AppImageKit/issues/815#issuecomment-441906786 and the list of patches supposedly needed for reproducible builds https://github.com/plougher/squashfs-tools/pull/51#issuecomment-440265219
From a very brief look at the patches and your local patch, it was obvious you are missing at least https://github.com/squashfskit/squashfskit/commit/afc0c76a170bd17cbd29bbec6ae6d2227e398570. I've applied that and built appimagetool, but I still could not build reproducibly.
I then tried simply switching to the squashfskit fork, with success: with the squashfskit fork, I can build my intended binary reproducibly.
Here is a patch for AppImageKit to make clear what I did (I had a problem with xz that I could not figure out, so I disabled it):
diff --git a/cmake/dependencies.cmake b/cmake/dependencies.cmake
index 9f7901f..236cb6b 100644
--- a/cmake/dependencies.cmake
+++ b/cmake/dependencies.cmake
@@ -50,15 +50,14 @@ if(xz_LIBRARY_DIRS)
endif()
ExternalProject_Add(mksquashfs
- GIT_REPOSITORY https://github.com/plougher/squashfs-tools/
- GIT_TAG 5be5d61
+ GIT_REPOSITORY https://github.com/squashfskit/squashfskit/
+ GIT_TAG 68ea4ae7553f3d58c14be19443cfc9e84b7244c0
UPDATE_COMMAND "" # ${MAKE} sure CMake won't try to fetch updates unnecessarily and hence rebuild the dependency every time
- PATCH_COMMAND patch -N -p1 < ${PROJECT_SOURCE_DIR}/src/mksquashfs-mkfs-fixed-timestamp.patch || true
CONFIGURE_COMMAND ${SED} -i "s|CFLAGS += -DXZ_SUPPORT|CFLAGS += ${mksquashfs_cflags}|g" <SOURCE_DIR>/squashfs-tools/Makefile
COMMAND ${SED} -i "s|LIBS += -llzma|LIBS += -Bstatic ${mksquashfs_ldflags}|g" <SOURCE_DIR>/squashfs-tools/Makefile
COMMAND ${SED} -i "s|install: mksquashfs unsquashfs|install: mksquashfs|g" squashfs-tools/Makefile
COMMAND ${SED} -i "/cp unsquashfs/d" squashfs-tools/Makefile
- BUILD_COMMAND env CC=${CC} CXX=${CXX} LDFLAGS=${LDFLAGS} ${MAKE} -C squashfs-tools/ XZ_SUPPORT=1 mksquashfs
+ BUILD_COMMAND env CC=${CC} CXX=${CXX} LDFLAGS=${LDFLAGS} ${MAKE} -C squashfs-tools/ XZ_SUPPORT=0 mksquashfs
# ${MAKE} install unfortunately expects unsquashfs to be built as well, hence can't install the binary
# therefore using built file in SOURCE_DIR
# TODO: implement building out of source
diff --git a/src/appimagetool.c b/src/appimagetool.c
index 8316d58..57dd1db 100644
--- a/src/appimagetool.c
+++ b/src/appimagetool.c
@@ -198,9 +198,6 @@ int sfs_mksquashfs(char *source, char *destination, int offset) {
args[i++] = exclude_file;
}
- args[i++] = "-mkfs-fixed-time";
- args[i++] = "0";
-
args[i++] = 0;
if (verbose) {
Then I built appimagetool and used it to build my binary:
env VERSION=1.0 ARCH=x86_64 SOURCE_DATE_EPOCH=1 ./appimagetool --no-appstream --verbose appdir b1
So, would you consider switching to that fork of squashfs-tools? Alternatively, if I figured out exactly what patches on top of squashfs-tools are needed and made a PR, would you be interested in that?
Yes, that'd be a good idea. And I think we can even send them our offset patch, if they don't have it already. Thanks for the pointer. I'm starting to like https://reproducible-builds.org/.
Now official squashfs 4.4 makes reproducible images by default, see: https://lore.kernel.org/lkml/CAB3wodcL=gnQOmHGGNukWK3OUbU2p=OHzLmzPi7ns_WNTGBEwg@mail.gmail.com/
Can we get an ETA on when the updated squashfs will make it into a stable appimagetool release?
It's been almost a year since this was fixed upstream, and I see that the latest stable release from AppImageKit still uses mksquashfs v4.3:
user@disp6736:~$ wget --quiet --continue --output-document="appimagetool.AppImage" https://github.com/AppImage/AppImageKit/releases/download/12/appimagetool-x86_64.AppImage
user@disp6736:~$ chmod +x appimagetool.AppImage
user@disp6736:~$ ./appimagetool.AppImage --appimage-extract > /dev/null
user@disp6736:~$ squashfs-root/usr/lib/appimagekit/mksquashfs -version
mksquashfs version 4.3-git (2017/07/18)
copyright (C) 2017 Phillip Lougher <phillip@squashfs.org.uk>
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2,
or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
user@disp6736:~$
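A Python sketch for checking the bundled mksquashfs version from output like the transcript above, e.g. in CI before relying on reproducible output. It assumes the "mksquashfs version X.Y" wording shown there; `mksquashfs_version` and `supports_reproducible_default` are hypothetical helpers, and the 4.4 threshold comes from the upstream announcement linked earlier in the thread.

```python
import re
from typing import Optional, Tuple


def mksquashfs_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `mksquashfs -version` output.

    Assumes a line like "mksquashfs version 4.3-git (2017/07/18)";
    returns None if no such line is found.
    """
    m = re.search(r"mksquashfs version (\d+)\.(\d+)", output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def supports_reproducible_default(output: str) -> bool:
    """True if the reported version is at least 4.4."""
    version = mksquashfs_version(output)
    return version is not None and version >= (4, 4)
```

To feed it, capture the text with subprocess.run([path, "-version"], capture_output=True, text=True) and pass stdout + stderr, where path is the extracted squashfs-root/usr/lib/appimagekit/mksquashfs. check=True is left off because this thread does not establish the exit status of -version.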
I stumbled on several examples of hacks to fix appimagetool with squashfskit, but it would be great if we could use the stable release natively to make reproducible builds.
Is there an ETA on when we can expect the latest stable release of appimagetool to include mksquashfs v4.4?
Hello @maltfield, at this point I am not working on the "old" appimagetool anymore but am focusing on the new Go-based implementation over at https://github.com/probonopd/go-appimage/tree/master/src/appimagetool. That one currently uses an external mksquashfs, which should be easy to update to 4.4 (if it isn't already using that one).
Oh, OK. I wasn't aware that this repo was being deprecated. Do you have an ETA on when the first stable release of the new appimagetool from the go-appimage repo will be out?
No, it's a work-in-progress but usable for many apps already at this point. Maybe you want to give it a try and report there in case you are running into issues. Thanks!
Just for clarification: this repo is not going to be deprecated, but appimagetool will likely be removed from here at some point.
Would it be possible to bump the squashfs version here (or use the fork that was mentioned above with the provided diff) and release a new version, even though work is primarily done on a rewrite of the appimagetool?
I would happily review a PR that updates the build system. It shouldn't be too difficult to build another version of squashfs-tools.
If anyone pursues this, it would probably be better to use squashfs-tools v4.4 instead of the fork, as it fixed a few CVEs that I don't think made it into the fork (I haven't looked into the specifics).
FWIW, here's what I'm doing to swap out mksquashfs in the latest stable appimagetool available from this repo:
With appimagetool release 13 (which bundles a new enough mksquashfs, see https://github.com/AppImage/AppImageKit/pull/996), the situation is now much better.
When building an AppImage for the Electrum project, we previously had to run
./appimagetool --appimage-extract
and then wrap ./squashfs-root/usr/lib/appimagekit/mksquashfs to remove the -mkfs-fixed-time 0 argument (using a small wrapper script), see https://github.com/spesmilo/electrum/commit/ae714772c38410a0169f2c76a14a64a62c0daff0
Using appimagetool 13, we no longer have to build a fork of mksquashfs, but we still have to run
./appimagetool --appimage-extract
and then wrap ./squashfs-root/usr/lib/appimagekit/mksquashfs to remove the -mkfs-time 0 argument (using a small wrapper script), see https://github.com/spesmilo/electrum/commit/ca2d1eea45cdbbc55e3e3b970bcc2e2ea487fb6a
This is needed because mksquashfs errors out if both the SOURCE_DATE_EPOCH env var is set and the -mkfs-time arg is passed, and we have SOURCE_DATE_EPOCH exported.
https://github.com/plougher/squashfs-tools/blob/19b161c1cd3e31f7a396ea92dea4390ad43f27b9/squashfs-tools/mksquashfs.c#L5892-L5900
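The Electrum workaround relies on a small wrapper around the bundled mksquashfs. A Python sketch of that idea (not the linked script itself): drop the -mkfs-time option and its value when SOURCE_DATE_EPOCH is set, then exec the real binary. `strip_option`, `run_wrapper`, and moving the original binary aside to some path passed as `real_mksquashfs` are assumptions made for illustration.

```python
import os
import sys
from typing import List


def strip_option(args: List[str], option: str) -> List[str]:
    """Return `args` without any occurrence of `option` and the value following it."""
    out: List[str] = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this is the option's value, e.g. "0"
        elif arg == option:
            skip_next = True
        else:
            out.append(arg)
    return out


def run_wrapper(real_mksquashfs: str) -> None:
    """Replace the current process with the real mksquashfs.

    `real_mksquashfs` is a hypothetical path to the original binary that the
    wrapper was installed in place of. -mkfs-time is only stripped when
    SOURCE_DATE_EPOCH is set, since that combination is what mksquashfs rejects.
    """
    args = sys.argv[1:]
    if "SOURCE_DATE_EPOCH" in os.environ:
        args = strip_option(args, "-mkfs-time")
    os.execv(real_mksquashfs, [real_mksquashfs] + args)
```

Installed as an executable named mksquashfs that calls run_wrapper with the moved-aside original's path, this lets SOURCE_DATE_EPOCH alone control the filesystem timestamp. A shell wrapper, as Electrum uses, avoids needing a Python interpreter in the build environment.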
I have tried calling appimagetool's AppRun with SOURCE_DATE_EPOCH unset (but having appimagetool pass -mkfs-time 0 to mksquashfs), but the binaries were not reproducible that way. I have not investigated why.
see (does not produce reproducible binaries): https://github.com/SomberNight/electrum/commit/6e0865f1f4d551ed660d5a0a0a68467cc5d507cd
For these reasons, I think it might be better if appimagetool did not pass -mkfs-time 0 to mksquashfs at all, leaving it to the caller to set SOURCE_DATE_EPOCH if they so wish.
https://github.com/AppImage/AppImageKit/blob/1681fd84dbe09c7d9b22e13cdb16ea601aa0ec47/src/appimagetool.c#L200-L201
I am trying to make the AppImage binary for Electrum reproducible/deterministic.
Looking at e.g. https://github.com/AppImage/AppImageKit/issues/625, I take it this should be possible. I am using appimagetool release 11.
I think I've managed to build almost identical binaries (I have only been testing on one machine so far). I would like some pointers or help regarding what might be missing.
If I build two binaries and run --appimage-extract on them, the extracted folders seem identical (e.g., taking a recursive md5sum of each and diffing the two gives an empty result):
Diff of recursive md5sums of the extracted contents:
```
cd dist/
./electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage1 --appimage-extract
mv squashfs-root/ squashfs-root1/
./electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage2 --appimage-extract
mv squashfs-root/ squashfs-root2/
# hash every regular file; sort by path so the listing does not depend on traversal order
(cd squashfs-root1 && find . -type f -exec md5sum '{}' + | sort -k 2 > ../md5sum1)
(cd squashfs-root2 && find . -type f -exec md5sum '{}' + | sort -k 2 > ../md5sum2)
diff md5sum1 md5sum2  # << empty
```
So that's good, I guess :)
If I use diffoscope to compare the binaries themselves, it tells me the only difference is due to an ELF section called digest_md5. I've found this in the AppImage docs:
Is that entry in the docs related to this ELF section?
Any idea what I need to make the build deterministic?