Closed: bwin closed this issue 1 month ago
@paulcbetts I don't know if you're using asar, but since I see you on nearly every atom-shell issue, maybe you have an opinion on this too.
We're not using Asar at the moment, we probably won't end up doing it (though it would probably improve install times quite a bit on Windows).
Ok, thanks.
I'm :+1: on adding magic number and checksum.
If you're (@zcbenz) ok with the idea, how about adding a checksum too? (When you do incompatible changes do them all at once.) In normal operation the checksum can be ignored, but it could be used to check the integrity of the file when someone wants to. It would also be possible to put the checksum into the json-header, but I don't like that.
I'm more willing to put the checksum in JSON header. I want the format itself to be as simple as possible, I think if a feature can be achieved without making the format more complicated, then we should keep the format unchanged.
BTW is there a reason that the header (json-filelist) gets pickled and not just written to the file? Is this faster or better in any way?
Pickle makes the implementation simpler and safer: it takes care of compatibility between different platforms, checks whether you are reading the correct data, and has a nice API to read/write strings.
@zcbenz Ok, just a quick question: What about ditching asar and using zip files instead? I think the answer is no, because of #673 (asar for runtime-mode). Is it?
I changed my mind (after playing around with bwin/asar-util) about a few things. There should be a version field in the root of the json header; this is the file format version. How to parse? This is what it would look like:
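A hypothetical header with such a version field might look like this (the files entry shape is an assumption for illustration, not the actual format):

```json
{
  "version": 1,
  "files": {
    "some-file.txt": { "offset": 0, "size": 1000 }
  }
}
```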
@zcbenz Ok, just a quick question: What about ditching asar and using zip files instead? I think the answer is no, because of #673 (asar for runtime-mode). Is it?
Using zip would make our code much more complicated; there are many other reasons, but this is the biggest one.
I changed my mind (after playing around with bwin/asar-util) about a few things.
I really like your new design :+1:.
IMO generating this checksum is an unnecessary PITA, because you have to null the checksum entry to avoid including the checksum when generating the checksum.
You are right, I'm down to put the checksum as part of the format.
Hmm, why do we need an ARCHIVE-SIZE in the header? Is it just the file size? And did you mean the size field will be UINT64 while keeping it less than 1<<53?
Thanks for the quick reply.
The archiveSize isn't in the header. It's at the end of the file archive (as in #4).
And yes, I would serialize the size as UINT64 while keeping it below 1<<53. It's still a valid UINT64, but we error out on the higher values.
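That constraint could be sketched like this (writeUInt64LE is a hypothetical helper name, not part of asar or Node: it stores the value as two little-endian UINT32 halves and refuses anything JavaScript can't represent exactly):

```javascript
// Sketch: serialize a size as a UINT64 on disk, but error out on values
// above 2^53 - 1, which JavaScript numbers cannot represent exactly.
function writeUInt64LE(buf, value, offset) {
  if (!Number.isSafeInteger(value)) {
    throw new RangeError('size exceeds 2^53 - 1');
  }
  buf.writeUInt32LE(value >>> 0, offset);                          // low 32 bits
  buf.writeUInt32LE(Math.floor(value / 0x100000000), offset + 4);  // high 32 bits
  return offset + 8;
}
```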
The archiveSize isn't in the header. It's at the end of the file archive (as in #4).
Ah yeah, I forgot about that.
And yes, i would serialize the size as UINT64 while keeping it below 1<<53. It's still a valid UINT64, but we error out on the higher values.
:+1:
You are right, I'm down to put the checksum as part of the format.
I'm not 100% into my plan either. If we make it part of the format, we cannot change it to a longer checksum algo without breaking our format. Or we don't include it in our offset calculation (via a separate headerOffset), but then we still don't know the length and the algo used, so this info would go into the json header. It doesn't sound good to have it all over the place either.
I'm not sure about setting the checksum algo in stone. Or do we admit that it's just a basic check that the content isn't random garbage and say a MD5 (or SHA1 or whatever) will suffice forever (for what we want to provide)? What do you think?
A basic check is enough, let's just use the fixed length checksum, I don't think we need to worry too much about the future.
In asar-util I have implemented simple compression for files with gzip streams.
"some-file.txt": {
  offset: 1000,
  size: 5000,  // decompressed size
  csize: 1234  // compressed size
  // comp: "gzip" or something to support different types of compression??
}
Are you willing to allow compression of file entries (on the other side)? Regarding file size this would make a lot of sense, considering the expected content (a lot of text). As for slower loading when using compression: it would always be a choice (meaning you could always create an uncompressed asar). Or do you prefer compressing the asar for download (as a zip archive or just through gzip) and keeping the format without compression, trading file size for speed? It would be worth measuring if you're not sure whether it's worth the trouble.
Are you willing to allow compression of file entries (on the other side)?
I prefer letting the users compress the asar archives themselves, i.e. don't allow compression of file entries. As far as I can see, the asar archives can be used: bundled inside a distributed app, downloaded on their own, or standalone like tar. In the first case the whole app will be compressed when distributing, so we don't need to compress the asar archive alone. In the second case we can just compress the whole archive for download and decompress it when installing, so no need to compress individual file entries. In the third case the archive can be compressed like tar, as asar.gz or asar.bz2.
A basic check is enough, let's just use the fixed length checksum
What should it be? MD5? SHA1? Something fancy?
About compression (examples from asar-util):
In the third case the archive can be compressed like tar, as asar.gz or asar.bz2.
I don't see a use case for 3. At first I liked the idea of keeping my apps as .gz on disk, but this would (almost) always involve temporary files when running apps from them. With per-file compression we could provide random access to all files, in contrast to .gz or .bz2. I'm thinking more of "after 2": if you have the runtime and (possibly a lot of?) apps on your drive, it's still better than having to bundle the runtime with each, but dependencies can make an app quite big.
We should also think of a way to do efficient updates. In my app I used 3 different asars for app, assets, and dependencies to make updating more granular (which matters a lot if you've changed just 2 lines of code and don't want to deploy 50MB of dependencies; I still have to deploy the app code or dependencies as a whole, so that isn't perfect either). But I see that even this is not viable for the runtime version; it needs to be one file.
What should it be? MD5? SHA1? Something fancy?
I don't really have a preference; maybe just SHA1, since there is a very simple function in Chromium to compute a SHA1 checksum.
But at first I liked the idea to keep my apps as .gz on disk. But this would (almost) always involve temporary files when running apps from them. We could provide random access to all files, in contrast to .gz or *.bz2
For downloaded apps I think we should always have decompressed archives; it is 2015 now, disk space should not be a problem. Even for an app as complicated as Atom it is still only 200~300MB decompressed.
As for the dependencies problem, I think we can have the runtime install and share the common dependencies instead of shipping the dependencies in apps.
Is there a chance to switch this project to coffee-script? (I know you don't just go around asking that, but since this is a rewrite anyway and the other atom projects are also using coffee-script...)
I'm fine with both JavaScript and CoffeeScript, I started with JavaScript because things were quite minimal, just go ahead if you want to convert everything to CoffeeScript.
I would prefer to serialize the header without chromium-pickle. It prepends 4 bytes (if I remember correctly), but since it's just text I don't really see why we would need that. We're only on little-endian platforms anyway, and I don't think we need chromium-pickle at all.
We would have to write another C++ library for this if we don't use Pickle, because we have to read/write asar in both C++ and JavaScript. Pickle also has some nice features like simple type and length validation and string reading/writing; a few bytes of overhead is just fine in my mind.
I haven't really looked at this before, is atom-shell/atom/common/asar/* everything that needs to be changed "on the other side"?
I haven't really looked at this before, is atom-shell/atom/common/asar/* everything that needs to be changed "on the other side"?
It should be; some changes may also need modifications to:
atom-shell/atom/common/lib/asar.coffee
atom-shell/atom/browser/net/asar/*
I'm also in favor of replacing header size format and dropping Pickle dependency.
Isn't node's Buffer API enough to read a number, even in C++? It has writeUInt32{BE,LE}, and you can get the endianness via require('os').endianness().
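As a sketch of that idea (the offset and the 4-byte width here are assumptions for illustration, not the actual asar layout):

```javascript
// Sketch: read a little-endian UINT32 size prefix with the plain Buffer
// API instead of Pickle. Assumes (hypothetically) the size sits at offset 0.
function readSizePrefix(buf) {
  if (buf.length < 4) {
    throw new Error('truncated header');
  }
  return buf.readUInt32LE(0);
}
```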
@zcbenz, the idea is not to drop Pickle from atom-shell (although that would be cool too), but at least from the asar packager, so users don't need to compile native modules just to pack their apps.
Proof of concept: gist. All tests are passing, and I was able to read my app's package made by the original packager with Pickle. I can do a clean-up and a PR if you are ok with the general approach. /cc @zcbenz
@YuriSolovyov I'm good replacing Pickle with a pure JavaScript implementation, I think a decent solution is to implement the same interface of chromium-pickle with JavaScript, and then replace chromium-pickle with it.
@zcbenz this one was for asar cli part, I'm not sure about atom-shell part.
@YuriSolovyov The JavaScript implementation should be compatible with the C++ Pickle, so we don't have to rewrite it in atom-shell.
Yeah, it is; at least the tests were passing and I was able to list the package directory of an .asar file created with the "old" packager. Can you please take a look whether it works for you? Just replace the one file that I gist'ed.
Question for better times: if we are about to make changes to the packager, why not change the format a little? Is backwards compatibility that needed in this case? A couple of bytes is not a big deal though.
So, after some trial and error, we've got 2 things:
One question so far: can we just use the node/io.js APIs to do the C++ stuff? (This also allows us to drop some deps.) I mean hey, it is already there, why not use it?
@YuriSolovyov There is some C++ code also using asar in atom-shell (like the tray module), so it is not possible to only use the Node API.
@bwin I'm down to drop Pickle on both asar and atom-shell.
I assume tray needs it to display an icon from an asar package?
@YuriSolovyov Yeah, every API that needs an icon can read asar archives.
... and the node fs module is patched only on the JS side?
Right.
At the moment, everything that comes to mind will lead to code duplication of some sort:
C++:
JS: Most operations can basically be reused from the packager code, since it knows how to read from the archive; this means the asar packager could become a dependency of atom-shell, just like other internal modules.
Thoughts?
The Chromium part and the Node part of atom-shell are based on completely different C/C++ layers, so unfortunately we have to duplicate code here.
Crazy idea: how about making the atom executable work in packager mode if some flag is passed? Like
atom.exe --create-package --path ./res --out app.asar
That would allow writing it once in the atom-shell C++ layer and using it both from C++ and JS (via bindings).
As a user of atom-shell, if I may bring up a few points:
@joshuawarner32 isn't the asar format platform-independent? I think an asar created on Linux should work just fine on all other platforms as well, and vice versa... if it is not, then we should make it so.
@YuriSolovyov well, technically (at least from reading the chromium-pickle code), it's not currently portable to big-endian architectures, but I don't think anybody cares, and that's not the issue here.
My issue is that, when building my app, I already manage downloading/extracting, modifying, and re-bundling of atom-shell builds for Mac and Windows. I don't want to add another layer of complexity by also managing downloading/extracting an atom-shell build for Linux (my build platform). Also, while there are certainly workarounds, there actually isn't an atom-shell build that works on Debian Wheezy, which represents the majority of our build slaves.
TL;DR: having a stand-alone tool to pack/unpack asars, independent of atom-shell, is really important to me. If you want to also integrate that functionality in atom.exe itself, far be it from me to stop you.
@joshuawarner32 well, that was just "crazy idea"
In this comment https://github.com/libarchive/libarchive/issues/1259#issuecomment-541777182 @kientzle rightly points out that it is rather impractical to craft extraction of ASAR in libarchive that routes extraction based on content: there can be no magic without magic numbers ;) ... so is this really a dead ticket and proposal?
Closing out this old issue, as the ASAR format is now stable and checksum/integrity support now exists. Prepending magic bytes to the format is something that can still be discussed; if an interested party finds this issue, feel free to raise a new issue outlining what would be ideal.
Currently an asar archive consists of these parts:
I would like to add a magic number(/string) to the beginning, like 'ASAR' or something similar. This would be backwards incompatible, but for now we could read a file in the old format if it doesn't start with the magic string and show a deprecation notice. Why? Currently we would read any file, interpret the first 8 bytes as a UINT64, try to read that many bytes, and try to interpret that as json. Although this works, IMHO a magic number (or file signature) would be a nicer way.
If you're (@zcbenz) ok with the idea, how about adding a checksum too? (When you do incompatible changes do them all at once.) In normal operation the checksum can be ignored, but it could be used to check the integrity of the file when someone wants to. It would also be possible to put the checksum into the json-header, but I don't like that.
I'm also in favor of #4, putting the filesize at the end. (This one doesn't break anything.)
BTW is there a reason that the header (json-filelist) gets pickled and not just written to the file? Is this faster or better in any way?
I'm also aware that these changes need to be reflected in atom-shell to have any meaning at all.
It could look like this:
The position of HEADERSIZE and CHECKSUM could also be switched; I don't care.
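A minimal sketch of checking such a magic string (assuming 'ASAR' as the magic value, which the thread never finalized, and a layout of MAGIC | HEADERSIZE | CHECKSUM | HEADER | FILES as proposed above):

```javascript
// Sketch: detect the proposed magic bytes at the start of an archive.
// 'ASAR' is the suggested, not final, magic value.
const MAGIC = Buffer.from('ASAR');

function looksLikeAsar(buf) {
  return buf.length >= MAGIC.length && buf.slice(0, MAGIC.length).equals(MAGIC);
}
```

An old-format file would fail this check, which is where the deprecation-notice fallback described above would kick in.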