probonopd opened this issue 8 years ago
That would be a really cool feature. When someone on my local network has already downloaded the app, I could download it from them. But it needs to be verified. Is there something like a hash for every AppImage from upstream? Otherwise, cheap IoT devices from China could send you infected AppImages.
Like, a GPG signature? Currently these are separate files (outside of the AppImage), but we could also append them to the AppImage (=make them part of the AppImage).
I tried out IPFS over the last two days and read a lot about it. It uses content hashes to address data, so you get exactly the content you request.
Downloading files is easy. Here you get the latest official Subsurface AppImage:
ipfs get QmUH4SZVdBPekZXkE77ntLknAtAjuiKsHgEW6eJzioyQyD
(you need to have the IPFS daemon running)
There is also ipget, which includes an IPFS node.
ipget QmUH4SZVdBPekZXkE77ntLknAtAjuiKsHgEW6eJzioyQyD -o Subsurface-4.5.6-x86_64.AppImage
https://github.com/ipfs/ipget
Also check Ethereum https://www.ethereum.org/
@probonopd How would that help to distribute AppImages?
Another technology like IPFS is WebTorrent. You seed while you are on the website.
@davidak not sure yet; didn't check it in detail yet.
Regarding WebTorrent: who stays on a single webpage for that long? It's probably more suited to video distribution than to apps.
Check the Keybase filesystem: Public, signed directories for everyone in the world. https://keybase.io/docs/kbfs, very promising.
Every file you write in there is signed. There's no manual signing process, no taring or gzipping, no detached sigs. Instead, everything in this folder appears as plaintext files on everyone's computers. You can even open /keybase/public/yourname in your Finder or Explorer and drag things in.
And
Keybase can't be coerced to lie about your public keys, because each one needs to be announced, using a previous device or paper key. Together, these announcements form a chain that is announced in the bitcoin block chain.
But:
We're giving everyone 10 gigabytes. (...) There is no paid upgrade currently. The 10GB free accounts will stay free, but we'll likely offer paid storage for people who want to store more data.
Also see the Dat project https://datproject.org/ and the Beaker Browser https://beakerbrowser.com/ built on top of it. Also see https://twitter.com/probonopd/status/925106318796578818
A few thoughts from the Beaker Browser team
electron-updater
in Beaker right now, which is great because it has pluggable transports. For us, the main reason we haven't distributed Beaker over dat is the need for auto-updates. I think now that it'd be fairly trivial to write a Dat transport for electron-updater
and get all the behaviors we need.

@pfrazee sounds promising. If you want to investigate binary delta updating, you can check out zsync(2), which is based on the same algorithms rsync uses. It calculates a meta file for an existing file, containing a set of hashes (calculated by chunking the file into blocks of a specified blocksize and hashing each block with a specified hashing algorithm).
I'm sure it's possible for you to make use of the functionality in this library. Heck, I could even imagine zsync2 supporting Dat as a URL scheme.
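The block-checksum idea behind zsync can be sketched in a few lines. This is an illustration of the principle only, not zsync's actual implementation (zsync uses a rolling Adler-32-style checksum plus a truncated MD4 per block, not SHA-256): hash fixed-size blocks of the old and new file, then fetch only the blocks the client does not already have.

```python
import hashlib

def make_block_checksums(data: bytes, blocksize: int = 2048) -> list:
    """Hash fixed-size blocks of a file. A zsync-style meta file is,
    conceptually, a list of such per-block checksums."""
    return [hashlib.sha256(data[i:i + blocksize]).hexdigest()
            for i in range(0, len(data), blocksize)]

def blocks_to_fetch(old: bytes, new: bytes, blocksize: int = 2048) -> list:
    """Indices of blocks in `new` that cannot be reused from `old`,
    i.e. the only parts a delta update would need to download."""
    have = set(make_block_checksums(old, blocksize))
    return [i for i, h in enumerate(make_block_checksums(new, blocksize))
            if h not in have]
```

With a 4096-byte file and a 2048-byte appended change, only the new third block would be fetched.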
We use Dat in Beaker to act as a website, but it can be any form of data storage. In the next version (0.8) we will have a built in user identity concept which will use Dat archives to represent users. That will eventually be a foundation for webs of trust in the application layer -- but it will take some time for the WoT networks to mature.
I've thought a lot about Web of Trusts recently for application deployment (AppImage related), and they'll apply to generic content as well.
Often, PGP's WoT is cited as a working Web of Trust. Its trust model works like: "I trust user A, and users A, B and C trust user Z, so I guess I can trust Z, too." However, this trust model is only used to verify the authenticity of the key a mail you receive is signed with; the crypto itself does not depend on it, and even if a key has no third-party signatures, that doesn't mean much for the security of the communication itself. In most cases, the users know each other anyway and trust the keys in their mail clients by validating the keys' fingerprints manually. It's a nice idea, but it isn't used by many people. Nowadays, you'd rather put your key ID into all mails you send, send it over a second channel (like a chat service or phone), or put it on your website, where people can download and trust the key before writing to you and after receiving mail from you.
When building a WoT from scratch, one can use pretty much the same methods and structures PGP established. Sure, it'll take a while to get people to use it and to build a large base of trusted users so that a certain level of security is reached. But the algorithms and structures are proven in the real world, and even though they have never reached the majority of email users, they are secure and work fine.
However, no WoT is really immune against malicious attacks; it's fairly easy to manipulate one. Let me give you an example: by creating a few thousand keys which then sign each other's keys (not everyone's, that'd be too obvious) and the keys of all the other users (that'll make them look even more valid), you can create accounts that appear trustworthy but have been created by some software. Time is not a factor here; the software could have been running for weeks or months. The problem is that it is really hard to detect those as malicious (attackers are pretty good at finding flaws in your code, especially when it's open source), and once they're in the network, there is no chance to get rid of them unless you have some central "blacklist" (which undermines the decentralization aspects of a WoT). Even if you supported some decentralized "anti-trust" feature (like a second kind of signature which discredits a key rather than making it look trustworthy), 10 minutes of an attack could be enough to do a lot of harm in dependent systems.
Transferring those thoughts to application distribution: as said, 10 minutes can be enough for an attack to do a lot of harm to your users. As research in the anti-virus field shows, 10 minutes can be enough for something like ransomware or a computer worm to spread across a lot of computers. It is similar with zero-days: even if a fix were deployed immediately, the ransomware could already have infected hundreds of thousands of computers and dealt a lot of damage. I could provide a list of references, but as we've all heard of such incidents before, I don't think it's necessary.
Therefore, I am trying to construct some more secure trust models for the AppImage ecosystem. For AppImage's updating mechanism specifically, we could inspect the key the old AppImage is signed with, and then check whether the new AppImage's key matches the old one. In that case, we can trust it this time, and perform the update. Otherwise, we can either reject the update, or show a big yellow warning and have the user decide on it. As long as the key won't change, everything will work smoothly, but if there should be an issue, we can protect the user from any kind of attacks.
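The proposed update check can be sketched as a minimal trust-on-first-use model. All names and data structures below are hypothetical; in practice the "key" would be the GPG fingerprint extracted from the AppImage's signature.

```python
# Sketch of the proposed check: remember the key the old AppImage was
# signed with, accept updates signed with the same key, and warn on a
# key change. Names here are hypothetical.
trusted_keys = {}   # app name -> pinned key fingerprint

def check_update(app: str, new_key: str) -> str:
    old_key = trusted_keys.get(app)
    if old_key is None:
        trusted_keys[app] = new_key      # first use: trust and remember
        return "trusted-on-first-use"
    if old_key == new_key:
        return "update-ok"               # same signer as last time
    return "warn-key-changed"            # big yellow warning, ask the user
```

As long as the key stays the same, updates proceed silently; a changed key triggers the warning path where the user decides.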
For the desktop integration via appimaged or (even better) the desktop environment itself, I'd imagine a trust model similar to the one PPAs on Ubuntu established. We'd allow users to trust keys AppImages are signed with by adding them to a separate user-specific keyring. (Distributions could even ship with a global keyring, such as openSUSE with its openSUSE Build Service, which builds AppImages and signs them with the OBS key.) Whenever the system finds a new AppImage with an unknown key, it could ask the user whether they want to trust the key or not. AppImages provided by the same developer would then be trusted automatically; however, new AppImages (i.e., ones not already marked as executable) could show a "first use" warning, asking the user whether they want to run the AppImage when they double-click it. When implementing the trust model I suggested, an additional security layer is put on top of this very basic security mechanism: whenever an unknown key is encountered, the pop-up could also ask whether you want to trust the key. If you, e.g., check a checkbox, it'd suppress further warnings for this specific key; otherwise the AppImage would still be executable, but the DE could still show a warning that the key cannot be trusted.
AppImageUpdate could eventually implement the same idea, by issuing a warning for unknown keys; once they are trusted and the new file's key matches the old one, the upgrade is just performed. On a key change, it should clearly state that the new key differs from the old one, and ask the user whether to trust the new one, and whether the old one should be removed.
I think that's a fairly secure trust model for AppImage, using some established structures, being not too complicated and easy to implement by users with our existing zsync based infrastructure.
TL;DR: Coming back to Dat, I don't think a web of trust will provide any real security to your users, for the reasons stated above. People should not ultimately rely on it, and for application deployment, where foreign code is supposed to be executed on others' machines, I would never ever rely solely on a Web of Trust. For static websites and other harmless contents, it might work to some extent, but thinking of a browser, when it comes to JavaScript, things get problematic again.
So, if you design a Web of Trust which is not subject to any of those issues, please make sure to notify us, because I'm really interested in the topic. If it'd fit our needs, I'll consider using it for AppImageUpdate, too!
We need to redefine the WoT away from how PGP defined it. The pure "human friends only" model is way too slow-moving, and the measure of transitive trust was a fairly limited form of graph analysis.
The new definition should be based on a set of features:
Cryptographic networks like Dat give a richer dataset to analyze. All interactions are logged in the network, and become signals for graph analysis. So, inconsistencies should be more detectable.
For instance, if multiple "Paul Frazees" start getting followed, a crawler should be able to notify me and I can react by flagging them. Then, as with any graph analysis, the computed trust is a matter of choosing good starting nodes (and doing good analysis).
For bootstrapping trust, we use auditable key distribution nodes, which ought to be the job of orgs and services. We can use auditable contract systems like nodevms to back these kinds of servers. They will then use CAs to identify themselves. So, again: a combination of CA-secured channels and app-generated trust signals.
Direct in-person signatures could still be used, perhaps initially only for high-risk tasks like software distribution. That would be the sort of thing where the user accounts of the org and devs have published special "trust" objects on Dat, which are in turn used by software-installers.
But-- that question is basically pushed into application space, since any app can decide how to do its trust analysis on top of the crypto networks. So, perhaps instead of calling it a Web of Trust, we need to think of it as a "Trust Platform," because we're putting trust signals into the application space as a primitive to work with.
Regarding the risk of the attack window, with any automated decision based on trust, such as installing software, there's always the option of putting in a time delay. "This software must be published for 24 hours with no 'withdrawal' signals from X set of users before being installable."
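The suggested delay can be modeled as a simple gate. This is a sketch under the assumption that withdrawal signals from the trusted key set are collected elsewhere and passed in as a list:

```python
from datetime import datetime, timedelta

DELAY = timedelta(hours=24)

def installable(published_at: datetime, withdrawals: list,
                now: datetime) -> bool:
    """The release is auto-installable only once it has been public for
    24 hours with no withdrawal signals from the trusted key set."""
    return not withdrawals and (now - published_at) >= DELAY
```

A single withdrawal signal within the window blocks the rollout, which is exactly the reaction time the delay is meant to buy.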
What I mean by "web of trust" is really not specific to applications; I guess it has been, or needs to be, solved for a peer-to-peer Web browser as well. After all, an AppImage is just a file, like an HTML file. In both cases I want certainty that what claims to be coming from, e.g., @pfrazee (just standing in as an example here) is actually coming from @pfrazee and has not been altered in between, be it an HTML page or an AppImage. The more difficult question is whether @pfrazee can be trusted, be it with information or software originating from him. An indication may be who else is following him.
So in summary, I believe a peer-to-peer Web browser needs to address the very same questions somehow, and if they are adequately solved for Web browsing, then we can also use the very same concepts for software distribution.
Agree?
I believe a peer-to-peer Web browser needs to address the very same questions somehow, and if they are adequately solved for Web browsing, then we can also use the very same concepts for software distribution.
@probonopd I think that's exactly right.
You're right, it's probably better to avoid calling this "Web of Trust", as I guess many people associate PGP's model with that term. I have to admit I'm not too much into blockchain technology or stuff like smart contracts which are built on top of it.
All this sounds quite interesting, but also far from being mature right now, unfortunately. Is there a roadmap, set of definitions or specifications or any other data where interested people could get informed about your plans?
I'll be thinking about what you said about the trust model that an application scenario like "app update distribution" could put on top of it. I see what you mean with the withdrawal signals, but I can't imagine how to realize that, since there's a paradox: you don't want to publish updates until a "crowd-sourced" trust has been reached, but how would that be possible without pushing updates to at least some users? A/B-like testing might work, but the, e.g., 10% of users who would receive the update right away are put at an unacceptable risk of getting malware on their systems (they might not even be able to push a withdrawal request into the network, depending on the effects of the malware).
Right now I'm not 100% sold of the concept, but I'm confident that a constructive discussion might lead to a working model. If you could point me to a place where you discuss those things, I'll have a look as soon as possible.
By the way, I think it might be worth talking to some bigger projects like openSUSE, too, who provide trustworthy AppImages (they sign the AppImages they publish with their pubkeys), so they might be a reasonable institution to "seed trust" in the network.
All in all, Dat and Beaker sound interesting for distribution right now, but I'd leave aside its web of trust when implementing it in AppImageUpdate, I'd rather continue to use a more conservative trust model like the one I suggested.
By the way, I'd like to invite you into our IRC channel, #AppImage on Freenode.
What establishes trust today?
What might establish trust in the future?
All this sounds quite interesting, but also far from being mature right now, unfortunately. Is there a roadmap, set of definitions or specifications or any other data where interested people could get informed about your plans?
No, this is just a set of ideas we're forming as we build with dat & beaker. I agree that it's too early to go into production-mode with using a new trust model on top of Dat. I think Dat's a great protocol to distribute images, but I'd still use existing code signature techniques on top of using Dat.
You don't want to publish updates until a "crowd-sourced" trust has been reached, but how would that be possible without pushing updates to at least some users?
That's not what I'm suggesting there. You'd already have a trust network established for the release: that is, the pubkeys you trust to publish or withdraw a release. The purpose of the delay would be to give the owning orgs a chance to notice and react to a compromise in those trusted actors.
So, a simple example scenario that could work right now: you have an app you build, and the .appimage is signed by your dev laptop (1 sig). Somebody steals your laptop and publishes a compromised version. If there was a 24 hour delay before clients auto-download the update, that'd give you time to access the .appimage host and take down the bad version.
Same idea here.
By the way, I'd like to invite you into our IRC channel, #AppImage on Freenode.
Joined!
I wrote an article a while back, when I was working on SSB, that tried to summarize a lot of reading I did on trust and reputation analysis. It's overly dense, but the research I linked to was good http://ssbc.github.io/docs/articles/using-trust-in-open-networks.html
Reacting to some of your points @probonopd
HTTPS & DNS do have the problem you mention -- you can phish using "close enough" domain names. It happens pretty frequently.
Graph & reputation analysis - The issue of "SEO gaming" is real. The Advogato project (see my article) had decent success. It depends on the use case; if false positives/negatives are dangerous, then you can use graph analysis more as a suggestion.
Stars & user signals - If you filter the stars/signals by "people you follow" or "people in your network" or some similar tool, you improve the value of that signal, but lose potentially good sources that you're just not connected to. This is why you might want a single node to try to globally crawl and rank everybody: it can potentially tell you which stars to trust and which ones not to. How? Basically, you're having the crawler try to define the "best people in my network," and then using that set to filter signals such as stars (and therefore cut out the spam). Again, check out Advogato or PageRank (in my article). Graph analysis is a way to expand your network of trust without having to manually evaluate each new connection.
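The "expand trust from good starting nodes" idea can be shown with a toy PageRank-style propagation over a follow graph. This is purely illustrative (real trust metrics like Advogato's are more robust); the graph shape and parameters are made up. Note how a ring of fake accounts that only follow each other never accumulates trust, because no seed ever points into it:

```python
def trust_scores(graph: dict, seeds: set, rounds: int = 10,
                 damping: float = 0.85) -> dict:
    """Toy PageRank-style propagation: trust flows from hand-picked seed
    nodes along 'follows' edges. Every node must appear as a key of
    `graph`. A fake-account ring only connected to itself stays at 0."""
    score = {n: (1.0 if n in seeds else 0.0) for n in graph}
    for _ in range(rounds):
        nxt = {n: (1.0 if n in seeds else 0.0) for n in graph}
        for n, follows in graph.items():
            if follows:
                share = damping * score[n] / len(follows)
                for f in follows:
                    nxt[f] += share
        score = nxt
    return score
```

Choosing the seeds is the "choosing good starting nodes" problem from the comment above; the analysis is only as good as that choice.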
Someone had already mentioned this idea 12 years ago in an article about klik (the AppImage predecessor):
it's a good idea to integrate a p2p network into it, such as bittorrent, so that once it's popular, the servers aren't down because of too many people downloading, or you start getting connection problems. It would be nice to kind of force people to use p2p in this case.
This implies:
Written in Golang, which means one single binary runs pretty much anywhere without much hassle.
There is even a C implementation: https://github.com/Agorise/c-ipfs
To be investigated: Just running the ipfs daemon without using it seems to significantly slow down other download traffic on the machine/in the network.
'/home/me/Downloads/go-ipfs/ipfs' init
'/home/me/Downloads/go-ipfs/ipfs' daemon
Of course, appimaged would do this automatically if it detects that ipfs is on the $PATH and/or running as a process.
/home/me/Downloads/go-ipfs/ipfs add -q '/isodevice/Applications/AppImageUpdate-8199a82-x86_64.AppImage' | tail -n 1
QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB
# Everyone who would add the exact same version of `AppImageUpdate-8199a82-x86_64.AppImage` would get the exact same hash
# TODO: Find out how the hash is calculated
http://localhost:8080/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB
https://ipfs.io/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB <-- global link
Works! But only as long as the machine is online. To change that:
http://ipfsstore.it/submit.php?hash=QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB
This will store it for 30 days and longer if someone sends BTC to the address displayed.
Now, to make this into a redundant cluster, we could set up https://github.com/ipfs/ipfs-cluster/ - since one can set up redundancy and automatic replication, we could probably use the cheapest hosting we can find...
Range requests are apparently supported: https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Byte_serving.html
_ipfs-discovery._udp is already implemented for looking up other ipfs daemons on the local network, https://github.com/ipfs/go-ipfs/issues/520. Code: https://github.com/libp2p/go-libp2p/blob/e4966ffb3e7a342aaf5574d9a5c0805454c07baa/p2p/discovery/mdns.go#L24
It is not used to announce files on the LAN, however (we would need to do this ourselves).
https://ipfs.io/blog/17-distributions/ says:
It may also make downloading new versions much faster, because different versions of large binary files often have lots of duplicated data. IPFS represents files as a Merkle DAG (a datastructure similar to a merkle tree), much like Git or BitTorrent. Unlike them, when IPFS imports files, it chunks them to deduplicate similar data within single files. So when you need to download a new version, you only download the parts that are new or different - this can make your future downloads faster!
So it looks like, while we could continue to use zsync, it may not even be needed?
Asked for opinions re. intelligent chunking for better deduplication on the IPFS forum, https://discuss.ipfs.io/t/ipfs-for-appimage-distribution-of-linux-applications/1553
On IRC #ipfs, someone pointed out:
probono > Could we have IPFS do the chunking of the Live ISO's squashfs based on the individual files that make up a Linux Live ISO? (Or AppImage)
whyrusleeping > kinda like the tar importer
probono > whyrusleeping: with the tar importer, can i get the "original tar" back out of the system?
probono > with a matching checksum?
whyrusleeping > probono: yeah, with the tar export command
Similar: https://github.com/ipfs/go-ipfs/issues/3604
- appimaged execs ipfs add -q 'Some.AppImage' if ipfs is on the $PATH, which returns the file's hash, e.g., QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB
- appimaged announces that hash on the local network with Zeroconf (probably in a JSON feed together with some metadata such as the filenames etc.)
- The download server knows the hash as well and puts it into a custom header like X-ipfs-hash
- AppImageUpdate reads X-ipfs-hash and, when ipfs is on the $PATH, tries downloading from http://localhost:8080/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB; else it downloads as usual; if that fails, it downloads from https://ipfs.io/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB (https://github.com/ipfs/faq/issues/59)
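The fallback order (local gateway, then the regular URL, then the public gateway) can be sketched as a small helper. The header name X-ipfs-hash and the gateway URLs are taken from the comment; the function itself is hypothetical:

```python
def candidate_urls(url: str, ipfs_hash, have_local_ipfs: bool) -> list:
    """Download sources in the order described above: local IPFS gateway
    first, then the regular HTTP(S) URL, then the public ipfs.io gateway.
    The caller would try each in turn until one succeeds."""
    urls = []
    if ipfs_hash and have_local_ipfs:
        urls.append("http://localhost:8080/ipfs/" + ipfs_hash)
    urls.append(url)                      # plain HTTP(S), as today
    if ipfs_hash:
        urls.append("https://ipfs.io/ipfs/" + ipfs_hash)
    return urls
```

If no X-ipfs-hash header is present, the list degenerates to the plain HTTP(S) download, so the scheme stays backward compatible.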
Viable? Pros? Cons? https://github.com/TheAssassin/zsync2/issues/15
To be written
Written in nodejs, which means npm and friends are needed to set it up.
A C library is still in a very early stage: https://github.com/mafintosh/libdat
From https://docs.datproject.org/faq:
How is Dat different than IPFS?
IPFS and Dat share a number of underlying similarities but address different problems. Both deduplicate content-addressed pieces of data and have a mechanism for searching for peers who have a specific piece of data. Both have implementations which work in modern Web browsers, as well as command line tools.
The two systems also have a number of differences. Dat keeps a secure version log of changes to a dataset over time which allows Dat to act as a version control tool. The type of Merkle tree used by Dat lets peers compare which pieces of a specific version of a dataset they each have and efficiently exchange the deltas to complete a full sync. It is not possible to synchronize or version a dataset in this way in IPFS without implementing such functionality yourself, as IPFS provides a CDN and/or filesystem interface but not a synchronization mechanism.
Dat has also prioritized efficiency and speed for the most basic use cases, especially when sharing large datasets. Dat does not make a duplicate of the data on the filesystem, unlike IPFS in which storage is duplicated upon import (Update: This can be changed for IPFS too, https://github.com/ipfs/go-ipfs/issues/3397#issuecomment-284337564). Dat's pieces can also be easily decoupled for implementing lower-level object stores. See hypercore and hyperdb for more information.
In order for IPFS to provide guarantees about interoperability, IPFS applications must use only the IPFS network stack. In contrast, Dat is only an application protocol and is agnostic to which network protocols (transports and naming systems) are used.
Wouldn't it be cool if e.g., all AppImages containing Qt could deduplicate data? Check https://github.com/ipfs/notes/issues/84 where it talks about deduplication.
Could probably be decentralized in a database as well, e.g., using https://github.com/orbitdb/orbit-db/blob/master/API.md
@probonopd the chunking we talked about in irc could be pretty useful here. As a quick hack I would be interested to see what sort of deduplication you get across different images using rabin fingerprinting: ipfs add -s=rabin. This uses content-defined chunking and should ideally produce a better layout than the default fixed-width chunking (at least for this usecase).
If you add two different files with rabin fingerprinting, you could run ipfs refs -r <hash> on each (which lists each block) and see how many hashes are the same between the two files.
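To see why content-defined chunking survives insertions where fixed-width chunking does not, here is a toy version of that comparison. The chunker below cuts after a delimiter byte as a crude stand-in for rabin fingerprinting (where the cut condition is a rolling hash, not a literal byte); comparing the chunk-hash sets mimics comparing two ipfs refs -r listings:

```python
import hashlib

def chunks(data: bytes, delim: int = 0) -> list:
    """Toy content-defined chunker: cut after every delimiter byte.
    Stand-in for rabin fingerprinting, where the cut condition is a
    rolling hash instead of a literal byte value."""
    out, start = [], 0
    for i, b in enumerate(data):
        if b == delim:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

def shared_blocks(a: bytes, b: bytes) -> int:
    """How many distinct chunk hashes two files have in common, i.e.
    what comparing two `ipfs refs -r` listings would tell you."""
    ha = {hashlib.sha256(c).hexdigest() for c in chunks(a)}
    hb = {hashlib.sha256(c).hexdigest() for c in chunks(b)}
    return len(ha & hb)
```

Because cut points depend on content rather than offsets, changing one region of a file leaves the other chunks (and their hashes) intact, which is what makes cross-image deduplication work.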
@whyrusleeping: If that indeed would work, it would be a pretty cool feat!
@KurtPfeifle do you have a list of images somewhere I could download and try this out on?
@whyrusleeping:
A list of crowd-sourced AppImages and their respective download locations is here:
...and here are LibreOffice AppImages in quite a few different combos: old releases, recent releases, daily/nightly builds -- all with various localizations enabled and combined:
Downloaded a random sampling of images:
why@whys-mbp ~/appimagetest> ls -l images/
total 4408472
-rw-rw-rw-@ 1 why staff 25189240 Dec 3 19:08 KeePassXC-2.2.2-2-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 209290936 Dec 3 19:29 LibreOffice-6.0.0.0.beta1-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 333555384 Dec 3 19:28 LibreOffice-6.0.0.0.beta1.full-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 248202936 Dec 3 19:27 LibreOffice-6.0.0.0.beta1.standard-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 372242104 Dec 3 19:29 LibreOffice-6.0.0.0.beta1.standard.help-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 206177976 Dec 3 19:26 LibreOfficeDev-6.0.0-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 206177976 Dec 3 19:27 LibreOfficeDev-6.1.0.0.alpha0_2017-11-26-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 206177976 Dec 3 19:26 LibreOfficeDev-daily-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 27469496 Dec 3 19:08 Qt_DAB-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 28583608 Dec 3 19:09 Woke-de0a968-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 11702008 Dec 3 19:09 XChat_IRC-5d0dbe1-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 63766528 Dec 3 19:08 alduin-2.0.1-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 68878336 Dec 3 19:22 draw.io-7.7.3-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 52363264 Dec 3 19:21 vessel-0.0.9-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 109641728 Dec 3 19:22 wire-3.0.2816-x86_64.AppImage
Then added them to a clean ipfs directory, and got the following results:
why@whys-mbp ~/appimagetest> ipfs add -r -s=rabin images/
... ipfs adding things ...
why@whys-mbp ~/appimagetest> du -h -d0 ipfs/
1005M ipfs/
why@whys-mbp ~/appimagetest> du -h -d0 images/
2.1G images/
Cutting the 'average block size' target in half to 128k yields slightly better results, but at the cost of slightly higher transfer times.
why@whys-mbp ~/appimagetest> ipfs add -r -s=rabin-128000 images/
... ipfs adding things ...
why@whys-mbp ~/appimagetest> du -h -d0 ipfs2
947M ipfs2
The results were probably better than average due to the number of LibreOffice images I included, but still pretty nice.
Also, for context, the default ipfs chunker:
why@whys-mbp ~/appimagetest> du -h -d0 ipfs4
1.4G ipfs4
Looks like these three LibreOffice downloads could even be exactly the same files, despite their different names:
-rw-rw-rw-@ 1 why staff 206177976 Dec 3 19:26 LibreOfficeDev-6.0.0-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 206177976 Dec 3 19:27 LibreOfficeDev-6.1.0.0.alpha0_2017-11-26-x86_64.AppImage
-rw-rw-rw-@ 1 why staff 206177976 Dec 3 19:26 LibreOfficeDev-daily-x86_64.AppImage
Which would skew the results even more. But still pretty good!
@KurtPfeifle ah, great catch. let me remove those from the sample.
Two of them were the exact same file, the other was likely very similar:
88d17b863625b08eda45723dae81a866020b7615 images/LibreOfficeDev-6.0.0-x86_64.AppImage
5d785c15ab1e989e8b2605f67768527578ace031 images/LibreOfficeDev-6.1.0.0.alpha0_2017-11-26-x86_64.AppImage
88d17b863625b08eda45723dae81a866020b7615 images/LibreOfficeDev-daily-x86_64.AppImage
In #ipfs IRC, I mentioned that you could implement a custom chunker for AppImage files that would intelligently break the file up on known internal file boundaries as a way of maximizing deduplication. The internal ipfs interface for this looks like this which basically wraps a stream of data and provides a way for the caller to read it a chunk at a time (with whatever underlying logic you want). If you don't fancy writing go, you can write your chunker in whatever language you like, and build the ipfs graph manually via the api. Can someone link me to documentation on the AppImage format?
The AppImageSpec is here: https://github.com/AppImage/AppImageSpec.
Its main content/payload is a SquashFS-compressed AppDir structure of files, which is prepended by a small binary called "AppRun".
You can invoke type 2 AppImages like any.AppImage --appimage-help, which will tell you that --appimage-extract will extract the payload into the original AppDir structure (currently this will land in a directory named squashfs-root).
@whyrusleeping:
"Two of them were the exact same file, the other was likely very similar"
I'm wondering why the chunker did not realize that the complete files were identical and report it somehow?
What are the respective numbers for the tests (after removing the copies) that you reported in the comment further up?
It did realize, I just didn't notice. See here:
why@whys-mbp ~/appimagetest> ipfs add -r images/
added QmSxi8GJRZeA5U9g4KZvJLAtyZ9iRMGtu1zjC2rPMYFrQ3 images/KeePassXC-2.2.2-2-x86_64.AppImage
added QmQBBEbx32uEAgTKLA1aXQJKFTVmkv1qdeuAEtwuDfgVbT images/LibreOffice-6.0.0.0.beta1-x86_64.AppImage
added QmNdXpac5RysqLYj51Px2ihFBSVZF7yaUop7f8vBKZZNMy images/LibreOffice-6.0.0.0.beta1.full-x86_64.AppImage
added QmaoEbTvrnzTgZCJU9c5CMvnKSUeDFGmEsV7cMciXvs4ue images/LibreOffice-6.0.0.0.beta1.standard-x86_64.AppImage
added QmXA8HarASPeEEWtn6Xq6MRzrRuqHjkqcGtvGcQhRQJMJB images/LibreOffice-6.0.0.0.beta1.standard.help-x86_64.AppImage
added QmdvcixcM86TePK8BQNQeA4Qfd9fnosaJpUwPaBPifkUmY images/LibreOfficeDev-6.0.0-x86_64.AppImage
added Qmatc8i6sqfEfSXqzowGComcbSi3rXjvAfRGP2pVxwH2rQ images/LibreOfficeDev-6.1.0.0.alpha0_2017-11-26-x86_64.AppImage
added QmdvcixcM86TePK8BQNQeA4Qfd9fnosaJpUwPaBPifkUmY images/LibreOfficeDev-daily-x86_64.AppImage
added QmaLZdCrxgtDmosad5RLVyNkhWXkETFaQBjSinyoG8x9Y2 images/Qt_DAB-x86_64.AppImage
added QmVyBQiDmozp49h3F2yVC5AHJcPTfywyeaUXFQPmhuzq4C images/Woke-de0a968-x86_64.AppImage
added QmdDD7VUdmraqrCsrJokDGLVDMD1SCRDakUinGfAmAzCim images/XChat_IRC-5d0dbe1-x86_64.AppImage
added QmWFzH1vuZpbZa8Qr38Pzt8AyakifthBByKd68UYtgbFev images/alduin-2.0.1-x86_64.AppImage
added QmbacmHuFDccQ2emSAKhYKQzwGrYeFSLXiUn6ydLuj81yi images/draw.io-7.7.3-x86_64.AppImage
added QmabkdT6QaWVEcNc2VQXGHKr2fVwsZRtppy7XGow29v5BZ images/vessel-0.0.9-x86_64.AppImage
added QmbbsV2VSCPjuS8AGRC9Po2n6sAreEQAkjBv43vEsUF69e images/wire-3.0.2816-x86_64.AppImage
added QmcWoJBRet9rvZKGHen9WCrRTRdUKZKvMSLLZfr4PnQMM5 images
The hashes of those two files are the same (ending in UmY), which means they are only stored once.
@whyrusleeping
In #ipfs IRC, I mentioned that you could implement a custom chunker for AppImage files that would intelligently break the file up on known internal file boundaries as a way of maximizing deduplication.
Indeed, this sounds like the right thing to do.
The internal ipfs interface for this looks like this which basically wraps a stream of data and provides a way for the caller to read it a chunk at a time (with whatever underlying logic you want). If you don't fancy writing go, you can write your chunker in whatever language you like, and build the ipfs graph manually via the api.
An AppImage is basically a squashfs filesystem image prepended by a small ELF executable (which mounts the AppImage using FUSE when it is executed, and runs the application contained in the squashfs filesystem).
As for the squashfs filesystem itself, we are currently using an exportable squashfs 4.0 filesystem, gzip compressed, with a data block size of 131072 bytes. That is probably not ideal either, because the blocks do not fall on the boundaries of individual files.
We are not married to that; in fact, we could use other squashfs variants with zlib, lz4, or xz, and we have been considering a switch to Zstandard compression for the squashfs filesystem.
To complicate things a bit, we are also already doing binary delta updates over HTTP(S) using zsync, which also works by chunking and checksumming the chunks. And uses different block sizes...
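To illustrate the kind of chunk-and-checksum scheme zsync relies on: the file is split into fixed-size blocks, and each block gets a weak (rolling-friendly) checksum plus a strong hash. Note this is only a sketch of the idea; zsync itself uses an rsync-style rolling checksum and a truncated MD4, whereas the example below substitutes adler32 and MD5 for brevity:

```python
import hashlib
import zlib

def block_checksums(data: bytes, blocksize: int = 2048):
    """Split data into fixed-size blocks; return a (weak, strong) pair per block.

    Illustrative only: zsync's real format uses an rsync-style rolling
    checksum and truncated MD4, not adler32/MD5 as shown here.
    """
    sums = []
    for i in range(0, len(data), blocksize):
        block = data[i:i + blocksize]
        weak = zlib.adler32(block)               # cheap pre-filter
        strong = hashlib.md5(block).hexdigest()  # guards against weak-sum collisions
        sums.append((weak, strong))
    return sums
```

A client computes the same checksums over its local copy and fetches only the blocks that differ. The rub for combining this with IPFS is visible right in the signature: the two systems use different block sizes and boundaries, so their chunk identities don't line up.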
Reminds me of this:
Handling for compressed files. rsync is ineffective on compressed files, unless they are compressed with a patched version of gzip. zsync has special handling for gzipped files, which enables update transfers of files which are distributed in compressed form.
We have never gotten around to understanding and/or using this in zsync2 so far.
So, in short, we could really use the help of someone who understands these things better than we do.
Can someone link me to documentation on the AppImage format?
Answer above updated with details @whyrusleeping.
I wish for a solution that supports both direct HTTP download and p2p download, so that when the p2p download is slow, it can automatically switch to HTTP.
Case in point:
Cross-reference: https://github.com/probonopd/uploadtool/issues/28
On IRC:
"We now have four node servers in Hongkong, Shanghai, Beijing, Singapore, to speed up the download, and it can become a IPFS node immediately."
So, we need to find a way to tell these servers which AppImages to download and pin. What is the best way to do this?
First Idea: As part of the automated quality control we do on AppImageHub, calculate the IPFS hash; the cluster could then pin these hashes. Not the best idea, because that way we would have to run this check on every version of every AppImage.
Second idea: Find a way for the person who generates an AppImage (or a new version of one) to submit a permalink (= an IPNS hash that always points to the latest version) to AppImageHub. AppImageHub would then fetch the AppImage from there and, if it passes validation, store the IPNS hash in a list that the IPFS cluster could pin.
Third idea: Can we use a p2p database for this and replace the central checking at AppImageHub with something distributed?
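The second idea boils down to a resolve → validate → pin loop. A schematic sketch, where `resolve`, `validate`, and `pinset` are hypothetical stand-ins for the IPNS resolver, AppImageHub's quality checks, and an ipfs-cluster pinset (none of these names come from a real API):

```python
def process_submission(ipns_name, resolve, validate, pinset):
    """Resolve an IPNS permalink to the current CID; pin it only if the
    AppImage behind it passes validation.

    Schematic sketch: resolve/validate/pinset are hypothetical stand-ins
    for an IPNS resolver, AppImageHub's checks, and a cluster pinset.
    """
    cid = resolve(ipns_name)   # IPNS name -> content hash of the latest version
    if not validate(cid):      # e.g. fetch the AppImage and run quality control
        return None
    pinset.add(cid)            # the cluster pins only approved CIDs
    return cid
```

Because the IPNS name stays stable across releases, publishers submit once; each new version is picked up by re-resolving the same name.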
@probonopd The difference between the First Idea and the Second Idea is that IPFS publishing is opt-in in the second, while it happens for everyone in the first; or is it more subtle? Perhaps you mean that only certain versions (say, those marked as stable) are meant to be pinned?
If I get it correctly, it would seem that AppImageHub needs to fetch the AppImage and validate in any case.
Maybe you can use ipfs pubsub (https://ipfs.io/blog/25-pubsub/) so that everyone can whisper new AppImage hashes meant to be pinned. Once validated, you can run ipfs-cluster to pin things on multiple servers. You can do it without ipfs-cluster too, but it provides a nice layer for maintaining a pinset in multiple locations.
Or you could run something similar to our IRC pin-bot like we do (https://github.com/ipfs/pinbot-irc) as the interface for people to submit new hashes.
@probonopd The difference between the First Idea and the Second Idea is that IPFS publishing is opt-in in the second, while it happens for everyone in the first; or is it more subtle?
Well, in theory we would like to have an entirely peer-produced, web-of-trust based solution for publishing "known good" AppImages. As long as such a solution does not exist, we would run AppImages through a (centralized) test and publish a list of known-good AppImages (or their hashes) from there.
Thanks for pointing to ipfs pubsub.
On a related note,
Cloudflare goes InterPlanetary - Introducing Cloudflare’s IPFS Gateway
Endless OS (a Debian-based distribution) is using a combination of OSTree, Flatpak, and Avahi to realize "peer-to-peer updates": https://github.com/endlessm/eos-updater
Played a bit with IPFS today, and it seems rather resource-intensive; quite a few steps are needed, like:
go-ipfs/ipfs add -w -n --nocopy -q -s=rabin-128000 -r -- /some/directory # High CPU usage for a long time
QmWv...
# Need to pin them
go-ipfs/ipfs pin add QmWv...
# Need to find a way that new files that get added to the directory
# get shared automatically. Unfortunately I am not an IPFS expert at all
# (need to read some docs)
This may be important if you want to share your whole Applications directory, including new files that may be added to it all the time:
https://lwn.net/Articles/763492/
With dat, to share data, you basically only need to call dat share and you're done: that creates a magic URL and no data is moved around. History can be kept using an external archiver, which means data is duplicated, but it's not the out of the box behavior.
With IPFS, to share data, you would call ipfs add, which will copy each file (or its chunks, I don't quite remember) to ~/.ipfs to be globally referenced by the ipfs daemon. There's the filestore extension to work around that, but it's not enabled by default. Furthermore, it's not clear to me whether changes in the original dataset are automatically tracked the same way they are in dat.
Galacteek does something scarily clever: it's the first self-seeding AppImage.
Hats off to you @eversum. You've beaten us to it ;-)
Now imagine if we had a mechanism for this properly built into the AppImage ecosystem... for all applications (that allow sharing due to their license terms)...
@probonopd Thanks!
The idea came up after distributing the AppImages "the standard way" and finding it inadequate. Bundling go-ipfs in the AppImage gave me other ideas including using the filename as CID to enable self-seeding by using IPFS pinning.
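The self-seeding trick can be sketched as: extract the CID from the AppImage's own filename, then hand it to the bundled IPFS node to pin. A minimal sketch of the extraction step (the actual pin call would shell out to the bundled go-ipfs binary and is only hinted at in a comment; the filename pattern is an assumption, not Galacteek's exact scheme):

```python
import re

# CIDv0 hashes are base58btc-encoded sha2-256 multihashes: "Qm" followed by
# 44 characters from the base58 alphabet (which excludes 0, O, I, and l).
CID_RE = re.compile(r"\bQm[1-9A-HJ-NP-Za-km-z]{44}\b")

def cid_from_filename(filename: str):
    """Return the first CIDv0 found in the filename, or None."""
    m = CID_RE.search(filename)
    return m.group(0) if m else None

# A self-seeding AppImage would then do something along these lines
# (hypothetical, not Galacteek's actual code):
#   subprocess.run([bundled_ipfs, "pin", "add", cid_from_filename(sys.argv[0])])
```

Because the CID is self-certifying, anyone fetching the file by that hash gets content verification for free, which also addresses the "infected AppImages from cheap IoT devices" concern raised at the top of this thread.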
I've just updated the AppImage.
Investigate https://lbry.tech/ as a means for distributing software in a decentralized way.
@antony-jr has a working proof-of-concept implementation ready :+1:
https://github.com/AppImage/zsync2/issues/24
Reference: https://twitter.com/probonopd/status/1320054292275933184
AppImage is all about easy and secure software distribution for Linux, right from the original upstream application author directly to the end user, without any intermediaries such as Linux distributions. It also supports block-based binary delta updates using AppImageUpdate, allowing for AppImages that can "update themselves" using information embedded into them (like the Sparkle Framework for macOS). Consistent with this vision, we would like to enable peer-to-peer software distribution, so that we would not need central hosting (such as GitHub Releases) while ideally maintaining some notion of a "web of trust" in which it is clear who the original author of the software is, and that the AppImage is distributed in the way the original author wants it to be distributed.
In this ticket, let's collect and discuss various peer-to-peer approaches, that could ideally be woven into the AppImageUpdate system as well.
"IPFS is the Distributed Web. A peer-to-peer hypermedia protocol to make the web faster, safer, and more open." https://ipfs.io
Should we use it to distribute AppImages?