Closed hpk42 closed 4 years ago
@csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.
I would suggest using brotli, which has a nice implementation in Rust here: https://crates.io/crates/brotli. It was developed specifically for the web, which means it compresses various text-based documents like HTML considerably better than gzip (https://medium.com/oyotech/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4). This will also become useful if we decide to store full-content HTML emails, as we can reuse the dependency for compressing those.
> @csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.
This is not correct: We don't load all blobs into memory! Only the file currently being copied!
Regarding the whole import/export process: There is no change regardless of what file format you use, sqlar or zip.
Comparison gzip vs brotli and an existing dc backup
194M
162M
167M
A zip export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:
* faster and streamable: export can start immediately, there is no need to copy blob files into the db (storage problems)
I don't know why zip should be faster. Maybe you can explain?
Please see my comment here in former issue: https://github.com/deltachat/deltachat-core-rust/issues/1724#issuecomment-660062996
And maybe a general consideration:
The development around the backup issue is very welcome and I'm very happy that this discussion has started and is evolving so quickly. But please consider all aspects deeply before committing to one approach.
For comparison also added lz4
167M
Note brotli is more expensive than gzip in compression, but cheaper in decompression.
sorry, i don't understand. ZIP is an archive format and brotli is a compression format. Can you use brotli with zip? If not, then i fail to see how it relates to the issue.
Even the final compression ratio is not the key element/feature/benefit of a backup. The technical possibilities (and needs) themselves are very important IMHO.
> ZIP is an archive format
Oh, I thought you were talking about compression, and just forgot to add the g for gzip. Not sure why there is so much discussion about the archive format /me confused
Note, zip is not just an archive format; it does both compression and archival. tar is a pure archival format. Ref: https://superuser.com/a/1257441/147739
Compression does not matter much, because most of the backup size is taken by blobs and they are already compressed.
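To illustrate the point about already-compressed data, here is a minimal sketch using Python's standard `zlib`. The generated text is only a stand-in for real blobs (an assumption; actual backups mostly hold JPEGs and similar files), and the word list is made up for the demo. It compresses the data once, then compresses the result again to show how little a second pass gains.

```python
import random
import zlib

# Deterministic stand-in data: random words from a small, made-up vocabulary.
rng = random.Random(0)
vocabulary = [
    b"message", b"attachment", b"backup", b"database",
    b"export", b"import", b"archive", b"compression",
]
text = b" ".join(rng.choice(vocabulary) for _ in range(20000))

# First pass: plain text compresses well.
once = zlib.compress(text, 9)

# Second pass: the input is already compressed, so there is little left to gain.
twice = zlib.compress(once, 9)

print(len(text), len(once), len(twice))
```

The same effect is why an archive full of media files barely shrinks, whatever compressor is chosen.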
@dignifiedquire -- i changed the title and the initial issue description to clarify this is about "zip archive". Sorry for the confusion. We could also consider tar+brotli i guess, but i feel that at least on windows and phones there will not be much support for dealing with it -- people know ".zip" files and usually have an idea how to handle them if they want to.
A last comment: Maybe I'm wrong, but I'm quite sure that once all aspects, needs and traits of the file formats are on the table, the sqlite database will appear in a different light.
when implementing network-based multi-device setup (export to network & import from network), the import can directly stream the zip file into the app state, requiring only the size of the zip on the phone/device.
ZIP files store the central directory at the end, so you can't start the import until you have the whole file. ZIP is only streamable in the sense that you can start the export immediately; the import can only start once the download has finished.
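A small sketch of the layout described above, using Python's standard `zipfile` module. The member names (`dc.db`, `blobs/image.jpg`) are hypothetical. It builds an archive in memory, locates the end-of-central-directory record, and shows that this particular reader, which relies on the central directory, refuses a truncated file.

```python
import io
import zipfile

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"  # starts every member entry
EOCD_SIGNATURE = b"PK\x05\x06"          # end-of-central-directory record


def build_archive() -> bytes:
    """Create a tiny uncompressed ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("dc.db", b"fake sqlite bytes")          # hypothetical name
        zf.writestr("blobs/image.jpg", b"fake jpeg bytes")  # hypothetical name
    return buf.getvalue()


def can_list(data: bytes) -> bool:
    """Return True if zipfile can read the member list from `data`."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


archive = build_archive()
first_local_header = archive.find(LOCAL_HEADER_SIGNATURE)
eocd_offset = archive.rfind(EOCD_SIGNATURE)

# Member data starts at the front; the directory record sits at the very end.
print(first_local_header, eocd_offset, len(archive))
print(can_list(archive), can_list(archive[:-30]))
```

A writer can emit members one by one, which matches the "export can start immediately" argument, while a directory-based reader has to wait for the tail of the file.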
> Compression does not matter much, because most of the backup size is taken by blobs and they are already compressed.

They are? Compressing them all together could yield quite a benefit vs. individual compression.
@hpk42 but users are supposed to import this file, not work with it manually, so I am not seeing the benefit of having good Windows support
Streaming import sounds like a serious attack vector, you would start ingesting data without being able to verify that it is uncompromised.
zip files allow AES encrypting the contents which is not perfect but better than nothing
We can also PGP-encrypt whatever format we use, it can also work if the key is already transferred via autocrypt setup message. ZIP AES encryption is using a password, which is not that useful to us.
> Streaming import sounds like a serious attack vector, you would start ingesting data without being able to verify that it is uncompromised.
I agree, we don't have a trusted channel if we are transferring a backup over the LAN. The simplest way is to transfer a checksum/key over a second channel (QR code) and simply transmit the backup file over an unencrypted TCP connection. If we want to stream, we need to authenticate each packet, maybe by using TLS. That is more complicated, and since we can't read a ZIP file until we receive the end of the central directory, it does not matter much.
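The "checksum over a second channel" idea could look roughly like this. It is a sketch only: SHA-256 is an assumption (the comment above just says "checksum/key"), and the function names are made up for illustration. The sender shows the digest in the QR code, and the receiver accepts the file only after the whole transfer hashes to the same value.

```python
import hashlib
import hmac


def backup_digest(data: bytes) -> str:
    """Hex digest the sender would show out of band (e.g. in a QR code)."""
    return hashlib.sha256(data).hexdigest()


def verify_backup(received: bytes, expected_hex: str) -> bool:
    """Check a fully received backup against the out-of-band digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(backup_digest(received), expected_hex)


backup = b"pretend this is the backup file"
qr_payload = backup_digest(backup)

print(verify_backup(backup, qr_payload))
print(verify_backup(backup + b"tampered", qr_payload))
```

Note that this only verifies integrity after the complete file has arrived, which fits the whole-file transfer described above rather than a streaming import.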
Overall, I don't see many advantages now compared to sqlar. And there is an added dependency of zip-rs.
Also compatibility is easier with sqlar, because both formats are SQLite databases. Just open the file and then check whether it is in sqlar format or not. If we go with zip, we also need to change how we search for the file.
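The "open it and check" step might be as simple as looking for the `sqlar` table. A minimal sketch with Python's `sqlite3`; the `msgs` table is a made-up stand-in for an old-format backup, and the `sqlar` schema is the one from the SQLite archive documentation.

```python
import sqlite3


def is_sqlar(conn: sqlite3.Connection) -> bool:
    """Return True if the database contains a table named `sqlar`."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sqlar'"
    ).fetchone()
    return row is not None


# Stand-in for an old-style backup: an ordinary database without `sqlar`.
plain = sqlite3.connect(":memory:")
plain.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, txt TEXT)")

# Stand-in for an sqlar backup, using the documented sqlar schema.
sqlar_db = sqlite3.connect(":memory:")
sqlar_db.execute(
    "CREATE TABLE sqlar("
    "name TEXT PRIMARY KEY, mode INT, mtime INT, sz INT, data BLOB)"
)

print(is_sqlar(plain), is_sqlar(sqlar_db))
```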
Tar is streamable both on export and import and has an async library by @dignifiedquire: https://github.com/dignifiedquire/async-tar
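For comparison with ZIP, here is a sketch of tar being written and read strictly front to back. It uses Python's standard `tarfile` in its stream modes (`w|` and `r|`), not async-tar, and the `ForwardOnly` wrapper and member names are made up for the demo. The wrapper only offers `read()`, roughly like a socket, to show that the reader never needs to seek.

```python
import io
import tarfile


class ForwardOnly:
    """Minimal read-only source without seek(), similar to a socket."""

    def __init__(self, raw: bytes) -> None:
        self._buf = io.BytesIO(raw)

    def read(self, size: int) -> bytes:
        return self._buf.read(size)


def write_stream(files: dict[str, bytes]) -> bytes:
    """Write members one after another in tar stream mode."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


def read_stream(source: ForwardOnly) -> dict[str, bytes]:
    """Consume members in arrival order, without seeking."""
    result = {}
    with tarfile.open(fileobj=source, mode="r|") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is not None:
                result[member.name] = extracted.read()
    return result


files = {"dc.db": b"fake sqlite bytes", "blobs/image.jpg": b"fake jpeg bytes"}
restored = read_stream(ForwardOnly(write_stream(files)))
print(sorted(restored))
```

Each member's header precedes its data, so an importer can handle the first file while later ones are still in transit.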
thanks all for the discussion, i moved things to a new issue, addressing various comments here. https://github.com/deltachat/deltachat-core-rust/issues/1729
A zip archive export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:
When introducing zip files for export/import we will need to continue importing from the old format for a longer time.
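The export layout from the description (database as is, plus every file from the blob dir) could be sketched like this with Python's `zipfile`. The function name, the `blobs/` prefix inside the archive, and the choice to deflate only the database are assumptions for illustration, not something the issue specifies.

```python
import zipfile
from pathlib import Path


def export_backup(db_path: Path, blob_dir: Path, out_path: Path) -> list[str]:
    """Write the database and all blob files into one zip archive."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # The database is added as is; deflate it since it holds mostly text.
        zf.write(db_path, arcname=db_path.name,
                 compress_type=zipfile.ZIP_DEFLATED)
        for blob in sorted(blob_dir.iterdir()):
            if blob.is_file():
                # Blobs are typically already compressed, so store them as is.
                zf.write(blob, arcname=f"blobs/{blob.name}",
                         compress_type=zipfile.ZIP_STORED)
        return zf.namelist()
```

Files are read from disk one at a time, so nothing has to be copied into the database first.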