AppImage / AppImageKit

Package desktop applications as AppImages that run on common Linux-based operating systems, such as RHEL, CentOS, openSUSE, SLED, Ubuntu, Fedora, debian and derivatives. Join #AppImage on irc.libera.chat
http://appimage.org

problems making appimage reproducible #929

Open SomberNight opened 5 years ago

SomberNight commented 5 years ago

I am trying to make the AppImage binary for Electrum reproducible/deterministic.

Looking at e.g. https://github.com/AppImage/AppImageKit/issues/625, I take it this should be possible. I am using appimagetool release 11.

I think I've managed to build almost identical binaries (only been testing on one machine for now). Would like to request pointers/help regarding what might be missing.

If I build two binaries, and run --appimage-extract on them, the extracted folders seem identical (e.g. recursive md5sum, and then diff of that, is empty)

diff of recursive md5sum of extracted contents

```
cd dist/
./electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage1 --appimage-extract
mv squashfs-root/ squashfs-root1/
./electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage2 --appimage-extract
mv squashfs-root/ squashfs-root2/
# run each find in a subshell so the cd does not leak into the outer shell
(cd squashfs-root1; find -type f -exec md5sum '{}' \; > ./../md5sum1)
(cd squashfs-root2; find -type f -exec md5sum '{}' \; > ./../md5sum2)
diff md5sum1 md5sum2  # << empty
```

So that's good I guess :)

If I use diffoscope to compare the binaries themselves, it tells me the only difference is due to an elf section called digest_md5:

$ diffoscope dist/electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage1 dist/electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage2
 |############################|  100%                             Time: 0:00:05
--- dist/electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage1
+++ dist/electrum-3.3.4-76-geb04551-dirty-x86_64.AppImage2
├── readelf --wide --decompress --hex-dump=.digest_md5 {}
│ @@ -1,4 +1,4 @@
│
│  Hex dump of section '.digest_md5':
│ -  0x00000000 77e356ea eefe1459 a40f00d9 ab5c0e00 w.V....Y.....\..
│ +  0x00000000 1dda23b5 31f9024c fe6d2755 e930a41a ..#.1..L.m'U.0..

I've found this in the appimage docs:

digest-md5 Calculates the MD5 digest used for desktop integration purposes for a given AppImage. This digest depends on the path, not on the contents.

Is that entry in the docs related to this ELF section?

Any idea what I need to make the build deterministic?

probonopd commented 5 years ago

First of all, congrats on making your builds reproducible. That's a very noble cause.

The rationale is discussed here: https://github.com/AppImage/AppImageUpdate/issues/83

We explicitly discussed reproducible builds back then, but I don't remember how this has actually been dealt with in the current implementation.

Can you shed some light on this @TheAssassin?

(@TheAssassin: I hope I am not mixing things up here. https://github.com/AppImage/AppImageUpdate/issues/83 is about increasing AppImageUpdate efficiency, but the description for digest-md5 is stated as "desktop integration purposes for a given AppImage". Looking back at the discussion, it seems like I did not understand the purpose of digest-md5 from the beginning and it was not clearly explained in the PR: https://github.com/AppImage/AppImageKit/pull/768#issuecomment-386383507)

TheAssassin commented 5 years ago

You mixed up a few things. You've found an entry for the software digest-md5, which is a utility built in AppImageKit. You should read more carefully, also the context.

The section contains an MD5 hash of the squashfs image embedded in the AppImage, this is used to check for the equality of two AppImages (e.g., in AppImageUpdate). If the contents differ, the hash differs. Simple as that. Reproducibility is tested and verified in our continuous deployment scripts. If you use the same version of appimagetool, it will create the same AppImage for the same contents.

Must be your test that is wrong here. There's nothing like "almost identical". Either they are or are not.

SomberNight commented 5 years ago

> There's nothing like "almost identical". Either they are or are not.

I just mean that there only seems to be a small and structured difference between the binaries. Anyway, it's semantics.

> Must be your test that is wrong here

Ok. So let me detail what the script is doing then.

Broadly, it creates some directory structure, and then uses appimagetool to create the binary from that. However it is running in Docker, to make the other stuff deterministic (and to streamline building), so I cannot invoke appimagetool directly due to issues with fuse. What I am doing instead is:

"$CACHEDIR/appimagetool" --appimage-extract
env VERSION="$VERSION" ./squashfs-root/AppRun --no-appstream --verbose "$APPDIR" "$APPIMAGE"

https://github.com/spesmilo/electrum/blob/03ab64e39f210f16ac8419974c67eea03a19da6c/contrib/build-linux/appimage/build.sh#L201-L202

So I unpack appimagetool using --appimage-extract, and then run the unpacked AppRun instead.

Now, even if I do this in a virtual machine and take a snapshot of the whole VM between the two lines, so directly before running AppRun, the resulting AppImage binaries are NOT identical! They differ in the way described in the OP.

> If you use the same version of appimagetool, it will create the same AppImage for the same contents.

Okay. But that seems to contradict what I just described to have observed.

SomberNight commented 5 years ago

I've now tried creating only the directory structure in docker, once, then copying that to the host machine, where fuse can be used, and using appimagetool directly twice in a row on the unchanged directory structure. The produced binaries are not identical.

Using appimagetool release 11 (appimagetool-x86_64.AppImage c13026b9ebaa20a17e7e0a4c818a901f0faba759801d8ceab3bb6007dde00372), the difference is only in the elf section '.digest_md5'. Using appimagetool release "continuous build" (appimagetool-x86_64.AppImage 96a847adfcc5bd88e30f69fff7399ef1d2feabf7161ddf85d2fd6f8062bf8fc9), the difference is only in the elf section '.digest_md5'. Using appimagetool release 10 (appimagetool-x86_64.AppImage 2f8a62f8ad1a4ad9608132bc467d6cc3f2c948ec5697a57b983e2824e98aeb0f), the difference is unstructured, all over the file.

TheAssassin commented 5 years ago

Our reproducibility test only covers the case where you want to create an AppImage for the exact same directory, and I just tested that with the latest continuous build and it works fine here.

You probably have metadata changes on every build which cause the resulting squashfs image to be different. I tested that and on my computer, the resulting AppImages are different now, too.

Your tool diffoscope is wrong; the changes are not only in that section. You should always test with a second tool. I tried both diffoscope and the good ol' hexdiff, and hexdiff finds differences in other places, too (most likely squashfs metadata).
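A byte-level cross-check of the kind suggested above can be sketched with `cmp`. This is only an illustration: `diff_summary` is a hypothetical helper name, not part of AppImageKit or diffoscope. It reports how many bytes differ and the first and last differing offsets.

```shell
# Summarize where two files differ at the byte level.
# Prints "<differing bytes> <first offset> <last offset>"; offsets are
# 1-based, as reported by cmp. Prints "0 - -" when no bytes differ.
diff_summary() (
    # cmp -l lists every differing byte as "offset octal1 octal2".
    # It exits non-zero when the files differ, which is expected here;
    # stderr is silenced because cmp also complains if the sizes differ.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END { if (NR == 0) print "0 - -"; else print NR, first, last }
    '
)
```

Comparing the reported offsets with the value printed by `--appimage-offset` (assuming the runtime in use supports that option) should indicate whether the differences fall in the ELF runtime or in the squashfs payload.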

SomberNight commented 5 years ago

> Your tool diffoscope is wrong; the changes are not only in that section

Thanks. You are right. Looking at the raw bytes, there are other diffs later in the file.

> create an AppImage for the exact same directory

If I build two appimages and run --appimage-extract on them, should the reason for this difference between the binaries be visible also on the extracted folders? As in, should I expect to see some difference in the extracted folders?

> You probably have metadata changes on every build which cause the resulting squashfs image to be different

I am still left wondering why I cannot build the same binaries even when taking a snapshot of the whole VM. i.e.

Now binaries A and B are different. This is with using latest VirtualBox (6.0.4).

If you really suspect the reason is metadata changes on the appdir, do you have a suggestion how to detect/inspect that?

TheAssassin commented 5 years ago

Metadata can be access or modification timestamps. The issue here is that when the files are extracted, the original metadata contained in the squashfs image is not restored. Therefore, all files look like they're freshly created (i.e., they have the current timestamp set for mtime/atime; atime must be available of course).

The issue is the extraction code in the runtime. You can try to mount the AppImage instead of extracting it. Then, run appimagetool on the mountpoint and check if that works.

In any case, I do want to keep storing timestamps, therefore we must fix the runtime to set those correctly on extraction. CC @azubieta
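One way to inspect AppDir metadata between builds is to dump a sorted manifest and diff it. The sketch below assumes GNU find (for `-printf`); `metadata_manifest` is a hypothetical helper name, and the choice of fields (type, mode, numeric owner, mtime, symlink target) is an assumption about what may end up in the squashfs image.

```shell
# Print one line per entry in a directory tree:
# relative path, type, octal mode, uid:gid, mtime (epoch), symlink target.
# atime is left out on purpose because merely reading files can change it.
metadata_manifest() (
    cd "$1" || exit 1
    find . -printf '%P\t%y\t%m\t%U:%G\t%T@\t%l\n' | LC_ALL=C sort
)
```

Saving the output of `metadata_manifest appdir` to a file before each build and running `diff` on the two files would show any timestamp, permission, or ownership change that a content-only checksum comparison cannot see.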

SomberNight commented 5 years ago

> Metadata can be access or modification timestamps

I've already been resetting all st_atime and st_mtime timestamps to fixed values; but I admit I forgot about symlinks (touch -h option), so thanks for explicitly mentioning timestamps. find -exec touch -h -d '2000-11-11T11:11:11+00:00' {} +
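The timestamp reset above can be wrapped in a small function that takes the timestamp as seconds since the epoch, so the same value can be reused elsewhere in a build script. This is a sketch assuming GNU touch and find; `normalize_timestamps` is a hypothetical name.

```shell
# Set atime and mtime of every entry under a directory to a fixed time.
# touch -h changes symlinks themselves instead of their targets.
# Usage: normalize_timestamps <dir> <epoch-seconds>
normalize_timestamps() (
    dir=$1
    epoch=$2
    find "$dir" -exec touch -h -d "@$epoch" {} +
)
```

For example, `normalize_timestamps appdir 973941071` corresponds to the 2000-11-11T11:11:11+00:00 timestamp used in the command above.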

Regardless, I still fail to deterministically build the binary I want.

So I've started deleting files from my appdir, to see if some specific files are at fault.

If I delete almost everything (only leaving the bare minimum to let appimagetool succeed), I can reproducibly build the same binary every time. No need to tinker with timestamps between builds.

If I start leaving in (not deleting) more and more files, the build no longer remains deterministic, after some threshold. It does not matter which files I keep.

Specifically, if there are about 50 files in my AppDir, the built binary will no longer have the same hash; rather it will have one out of two hashes randomly.

$ for i in {1..10}; do env VERSION=1.0 ARCH=x86_64 ./appimagetool11-x86_64.AppImage --no-appstream --verbose appdir b$i; done
$ md5sum b*
5e9e9d52b2006f88a6c5e522f28e5184  b1
1d2524ee07cfb4690ce3ea2437e7e254  b10
1d2524ee07cfb4690ce3ea2437e7e254  b2
1d2524ee07cfb4690ce3ea2437e7e254  b3
5e9e9d52b2006f88a6c5e522f28e5184  b4
1d2524ee07cfb4690ce3ea2437e7e254  b5
1d2524ee07cfb4690ce3ea2437e7e254  b6
1d2524ee07cfb4690ce3ea2437e7e254  b7
1d2524ee07cfb4690ce3ea2437e7e254  b8
1d2524ee07cfb4690ce3ea2437e7e254  b9

If I leave even more files in, the set of possible hashes for the binary increases.

When I have around 100 files in my AppDir, it is still quite likely that I get a hash that I have seen before. I've built 500 binaries, which had 237 unique hashes.

(maybe it's about cumulative file size, not number of files; or something else related)
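Counting distinct hashes over many builds, as in the experiment above, can be sketched as follows. Both function names are hypothetical, and the sketch assumes file names without backslashes or newlines.

```shell
# Print "<count> <hash>" for each distinct content hash, most frequent first.
hash_histogram() (
    md5sum "$@" | awk '{ print $1 }' | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
)

# Print the number of distinct content hashes among the given files.
count_unique_hashes() (
    md5sum "$@" | awk '{ print $1 }' | sort -u | wc -l
)
```

After a build loop like the one shown earlier, `count_unique_hashes b*` should print 1 for a fully reproducible build; any larger number means at least two builds differed.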

Do you still think this can be explained with metadata of the AppDir? I highly doubt it at this point.

TheAssassin commented 5 years ago

As far as I can see, this must be a bug in squashfs. Our software behaves correctly by calculating different hashsums and putting them into .digest_md5. I'm not sure why this is happening, but as said above, I could reproduce your bug using the initial method described above.

I'm not a squashfs expert, we're just "customers" using tools provided by them. I guess we need to carry that bug upstream to them. Would you mind opening an issue over here? The thing with squashfs-tools is that the project doesn't seem to be very active any more.

SomberNight commented 5 years ago

Thanks for pointing to plougher/squashfs-tools. After looking at that, I've found

I see you also apply a patch to squashfs-tools locally during the build (https://github.com/AppImage/AppImageKit/pull/651), but its changes are only a subset of the linked Debian patchset.

I see you've seen the squashfskit fork too (https://github.com/AppImage/AppImageKit/issues/815#issuecomment-441906786) and the list of patches supposedly needed for reproducible builds (https://github.com/plougher/squashfs-tools/pull/51#issuecomment-440265219).

From a very brief look at the patches and your local patch, it was obvious you are missing at least https://github.com/squashfskit/squashfskit/commit/afc0c76a170bd17cbd29bbec6ae6d2227e398570. I've applied that and built appimagetool, but I still could not build reproducibly.

I then tried to just change to the squashfskit fork. With success. With the squashfskit fork I can reproducibly build my intended binary.

Here is a patch for AppImageKit to make it clear what I did: (I had some problem with xz that I could not figure out, so I disabled that)

diff --git a/cmake/dependencies.cmake b/cmake/dependencies.cmake
index 9f7901f..236cb6b 100644
--- a/cmake/dependencies.cmake
+++ b/cmake/dependencies.cmake
@@ -50,15 +50,14 @@ if(xz_LIBRARY_DIRS)
 endif()

 ExternalProject_Add(mksquashfs
-    GIT_REPOSITORY https://github.com/plougher/squashfs-tools/
-    GIT_TAG 5be5d61
+    GIT_REPOSITORY https://github.com/squashfskit/squashfskit/
+    GIT_TAG 68ea4ae7553f3d58c14be19443cfc9e84b7244c0
     UPDATE_COMMAND ""  # ${MAKE} sure CMake won't try to fetch updates unnecessarily and hence rebuild the dependency every time
-    PATCH_COMMAND patch -N -p1 < ${PROJECT_SOURCE_DIR}/src/mksquashfs-mkfs-fixed-timestamp.patch || true
     CONFIGURE_COMMAND ${SED} -i "s|CFLAGS += -DXZ_SUPPORT|CFLAGS += ${mksquashfs_cflags}|g" <SOURCE_DIR>/squashfs-tools/Makefile
     COMMAND ${SED} -i "s|LIBS += -llzma|LIBS += -Bstatic ${mksquashfs_ldflags}|g" <SOURCE_DIR>/squashfs-tools/Makefile
     COMMAND ${SED} -i "s|install: mksquashfs unsquashfs|install: mksquashfs|g" squashfs-tools/Makefile
     COMMAND ${SED} -i "/cp unsquashfs/d" squashfs-tools/Makefile
-    BUILD_COMMAND env CC=${CC} CXX=${CXX} LDFLAGS=${LDFLAGS} ${MAKE} -C squashfs-tools/ XZ_SUPPORT=1 mksquashfs
+    BUILD_COMMAND env CC=${CC} CXX=${CXX} LDFLAGS=${LDFLAGS} ${MAKE} -C squashfs-tools/ XZ_SUPPORT=0 mksquashfs
     # ${MAKE} install unfortunately expects unsquashfs to be built as well, hence can't install the binary
     # therefore using built file in SOURCE_DIR
     # TODO: implement building out of source
diff --git a/src/appimagetool.c b/src/appimagetool.c
index 8316d58..57dd1db 100644
--- a/src/appimagetool.c
+++ b/src/appimagetool.c
@@ -198,9 +198,6 @@ int sfs_mksquashfs(char *source, char *destination, int offset) {
             args[i++] = exclude_file;
         }

-        args[i++] = "-mkfs-fixed-time";
-        args[i++] = "0";
-
         args[i++] = 0;

         if (verbose) {

Then I built appimagetool, and then I built my binary as env VERSION=1.0 ARCH=x86_64 SOURCE_DATE_EPOCH=1 ./appimagetool --no-appstream --verbose appdir b1


So, would you consider switching to that fork of squashfs-tools? Alternatively, if I figured out exactly what patches on top of squashfs-tools are needed and made a PR, would you be interested in that?

TheAssassin commented 5 years ago

Yes, that'd be a good idea. And I think we can even send them our offset patch, if they don't have it already. Thanks for the pointer. I'm starting to like https://reproducible-builds.org/.

nahuel commented 4 years ago

Now official squashfs 4.4 makes reproducible images by default, see: https://lore.kernel.org/lkml/CAB3wodcL=gnQOmHGGNukWK3OUbU2p=OHzLmzPi7ns_WNTGBEwg@mail.gmail.com/

maltfield commented 4 years ago

> Now official squashfs 4.4 makes reproducible images by default, see: https://lore.kernel.org/lkml/CAB3wodcL=gnQOmHGGNukWK3OUbU2p=OHzLmzPi7ns_WNTGBEwg@mail.gmail.com/

Can we get an ETA on when the updated squashfs would make it into a stable appimagetool release?

It's been almost a year since this was fixed upstream, and I see that the latest stable release from AppImageKit still uses mksquashfs v4.3.

user@disp6736:~$ wget --quiet --continue --output-document="appimagetool.AppImage" https://github.com/AppImage/AppImageKit/releases/download/12/appimagetool-x86_64.AppImage
user@disp6736:~$ chmod +x appimagetool.AppImage
user@disp6736:~$ ./appimagetool.AppImage --appimage-extract > /dev/null
user@disp6736:~$ squashfs-root/usr/lib/appimagekit/mksquashfs -version
mksquashfs version 4.3-git (2017/07/18)
copyright (C) 2017 Phillip Lougher <phillip@squashfs.org.uk>

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2,
or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
user@disp6736:~$ 

I stumbled on several examples of hacks to fix appimagetool with squashfskit, but it would be great if we could use the stable release natively to make reproducible builds.

Is there an ETA on when we can expect the latest stable release of appimagetool to include mksquashfs v4.4?
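The manual version check in the session above could be automated along these lines. This is a sketch that assumes the first line of `mksquashfs -version` output keeps the format shown above and that GNU `sort -V` is available; both function names are hypothetical.

```shell
# Read `mksquashfs -version` output on stdin and print the numeric version,
# e.g. "mksquashfs version 4.3-git (2017/07/18)" -> "4.3".
mksquashfs_version() (
    sed -n 's/^mksquashfs version \([0-9][0-9.]*\).*/\1/p' | head -n 1
)

# Succeed if version $1 is at least version $2 (GNU sort -V ordering).
version_at_least() (
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
)
```

A build script could then run `squashfs-root/usr/lib/appimagekit/mksquashfs -version | mksquashfs_version` and abort unless `version_at_least` reports 4.4 or newer.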

probonopd commented 4 years ago

Hello @maltfield at this point I am not working on the "old" appimagetool anymore but am focusing on the new Go-based implementation over at https://github.com/probonopd/go-appimage/tree/master/src/appimagetool. That one currently uses an external mksquashfs which should be easy to update to 4.4 (if it isn't already using that one).

maltfield commented 4 years ago

Oh, OK. I wasn't aware that this repo was being deprecated. Do you have any ETA on when the first stable release will be out for the new appimagetool from the go-appimage repo?

probonopd commented 4 years ago

No, it's a work-in-progress but usable for many apps already at this point. Maybe you want to give it a try and report there in case you are running into issues. Thanks!

Just for clarification, this repo is not going to be deprecated, but likely appimagetool will be removed from here at some point in time.

bastimeyer commented 4 years ago

Would it be possible to bump the squashfs version here (or use the fork that was mentioned above with the provided diff) and release a new version, even though work is primarily done on a rewrite of the appimagetool?

TheAssassin commented 4 years ago

I would happily review a PR that updates the build system. It shouldn't be too difficult to build another version of squashfs-tools.

maltfield commented 4 years ago

If anyone pursues this, it would probably be better to use squashfs-tools v4.4 instead of the fork, as it fixed a few CVEs which I don't think made it into the fork (I haven't looked into the specifics).

FWIW, here's what I'm doing to swap out mksquashfs in the latest stable appimagetool available from this repo:

SomberNight commented 2 years ago

With appimagetool release 13 (which bundles new enough mksquashfs (https://github.com/AppImage/AppImageKit/pull/996)), the situation is now much better.


When building an appimage for the Electrum project, we previously had to

see https://github.com/spesmilo/electrum/commit/ae714772c38410a0169f2c76a14a64a62c0daff0


Using appimagetool 13, we no longer have to build a fork of mksquashfs, but we still have to:

see https://github.com/spesmilo/electrum/commit/ca2d1eea45cdbbc55e3e3b970bcc2e2ea487fb6a

This is needed as mksquashfs errors if both SOURCE_DATE_EPOCH env var is set and -mkfs-time arg is passed, and we have SOURCE_DATE_EPOCH exported. https://github.com/plougher/squashfs-tools/blob/19b161c1cd3e31f7a396ea92dea4390ad43f27b9/squashfs-tools/mksquashfs.c#L5892-L5900
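The conflict described above can be worked around in principle by filtering the argument list before it reaches mksquashfs. The following is a hypothetical sketch of that idea, not the code from the linked Electrum commit; `run_without_mkfs_time` is an invented name, and the command to run is passed as the first argument.

```shell
# Run a command with "-mkfs-time <value>" removed from its arguments when
# SOURCE_DATE_EPOCH is set; otherwise pass all arguments through unchanged.
# Usage: run_without_mkfs_time <command> [args...]
run_without_mkfs_time() (
    cmd=$1
    shift
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        remaining=$#
        while [ "$remaining" -gt 0 ]; do
            arg=$1
            shift
            remaining=$((remaining - 1))
            if [ "$arg" = "-mkfs-time" ] && [ "$remaining" -gt 0 ]; then
                shift                        # also drop the option's value
                remaining=$((remaining - 1))
                continue
            fi
            set -- "$@" "$arg"               # re-append kept arguments in order
        done
    fi
    "$cmd" "$@"
)
```

A wrapper script installed in place of the bundled mksquashfs could call this with the path of the renamed original binary as `<command>`, so that only SOURCE_DATE_EPOCH determines the filesystem timestamp.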

I have tried calling appimagetool AppRun with SOURCE_DATE_EPOCH unset (but having appimagetool pass -mkfs-time 0 to mksquashfs) but the binaries were not reproducible that way. I have not investigated why.

see (does not produce reproducible binaries): https://github.com/SomberNight/electrum/commit/6e0865f1f4d551ed660d5a0a0a68467cc5d507cd

For these reasons, I think it might be better if appimagetool did not pass -mkfs-time 0 to mksquashfs at all, and instead left it to the caller to set SOURCE_DATE_EPOCH if they so wish. https://github.com/AppImage/AppImageKit/blob/1681fd84dbe09c7d9b22e13cdb16ea601aa0ec47/src/appimagetool.c#L200-L201