deltachat / deltachat-core-rust

Delta Chat Rust Core library, used by Android/iOS/desktop apps, bindings and bots 📧
https://delta.chat/en/contribute

Use zip archive format for exporting & importing #1727

Closed: hpk42 closed this issue 4 years ago.

hpk42 commented 4 years ago

A zip archive export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:

When introducing zip files for export/import, we will need to continue supporting imports from the old format for quite a while.

link2xt commented 4 years ago

@csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.

dignifiedquire commented 4 years ago

I would suggest using brotli, which has a nice implementation in Rust: https://crates.io/crates/brotli. It was developed specifically for the web, which means it compresses text-based documents like HTML considerably better than gzip (https://medium.com/oyotech/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4). It will also become useful if we decide to store the full HTML content of emails, as we can reuse the dependency for compressing those.

csb0730 commented 4 years ago

> @csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.

This is not correct: we don't load all blobs into memory, only the file currently being copied!

Regarding the whole import/export process: nothing changes regardless of which file format you use, sqlar or zip.
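To illustrate the point that only one file at a time needs to be in memory, here is a minimal std-only Rust sketch. `append_file` is a hypothetical helper, not a deltachat-core-rust function: it streams one blob from disk into any `Write` sink via `std::io::copy`, which moves the data through a small fixed-size buffer, so memory use does not grow with the blob size, whichever container format sits behind the sink.

```rust
use std::fs::File;
use std::io::{self, BufReader, Write};
use std::path::Path;

/// Hypothetical helper: streams one blob file into `out` and returns the
/// number of bytes copied. `io::copy` works chunk by chunk with a bounded
/// buffer, so the whole file is never loaded into memory at once.
pub fn append_file<W: Write>(out: &mut W, path: &Path) -> io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    io::copy(&mut reader, out)
}
```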

dignifiedquire commented 4 years ago

Comparison gzip vs brotli and an existing dc backup

csb0730 commented 4 years ago

> A zip export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:
>
> * faster and streamable: export can start immediately, there is no need to copy blob files into the db (storage problems)

I don't see why zip should be faster. Maybe you can explain?

Please see my comment in the former issue: https://github.com/deltachat/deltachat-core-rust/issues/1724#issuecomment-660062996

And maybe a general consideration:

The development around the backup issue is very welcome, and I'm very happy that this discussion has started and is evolving so quickly. But please consider all aspects carefully before choosing a direction.

dignifiedquire commented 4 years ago

For comparison, I also added lz4.

dignifiedquire commented 4 years ago

Note brotli is more expensive than gzip in compression, but cheaper in decompression.

hpk42 commented 4 years ago

Sorry, I don't understand. ZIP is an archive format and brotli is a compression format. Can you use brotli with zip? If not, then I fail to see how it relates to the issue.

csb0730 commented 4 years ago

Also, the final compression ratio is not the key feature or benefit of a backup. The technical possibilities (and needs) themselves are very important IMHO.

dignifiedquire commented 4 years ago

> ZIP is an archive format

Oh, I thought you were talking about compression, and just forgot to add the g for gzip. Not sure why there is so much discussion about the archive format /me confused

dignifiedquire commented 4 years ago

Note, zip is not just an archive format; it does both compression and archiving. tar is a pure archival format. Ref: https://superuser.com/a/1257441/147739

link2xt commented 4 years ago

Compression does not matter much, because most of the backup size is taken by blobs and they are already compressed.

hpk42 commented 4 years ago

@dignifiedquire -- I changed the title and the initial issue description to clarify that this is about a "zip archive". Sorry for the confusion. We could also consider tar+brotli, I guess, but I feel that at least on Windows and phones there will not be much support for dealing with it -- people know ".zip" files and usually have an idea how to handle them if they want to.

csb0730 commented 4 years ago

A last comment: maybe I'm wrong, but I'm quite sure that once all aspects, needs and traits of the file formats are on the table, the SQLite database will appear in a different light.

link2xt commented 4 years ago

> when implementing network-based multi-device setup (export to network & import from network), the import can directly stream the zip file into the app state, requiring only the size of the zip on the phone/device.

ZIP files store the central directory at the end, so you can't start importing until you have the whole file. ZIP is only streamable in the sense that you can start exporting immediately; import can only start once the download has finished.
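A minimal sketch of why ZIP import has to wait for the tail: before listing or extracting anything, a reader locates the end-of-central-directory (EOCD) record, which comes after all file data and is followed only by an optional comment of at most 65535 bytes. `find_eocd` and `Eocd` are illustrative names, not the zip-rs API, and the sketch handles only classic (non-ZIP64) archives.

```rust
/// The fields of a ZIP "end of central directory" (EOCD) record that a
/// reader needs before it can list or extract any entry.
#[derive(Debug, PartialEq)]
pub struct Eocd {
    pub total_entries: u16,
    pub cd_size: u32,
    pub cd_offset: u32,
}

const EOCD_SIG: [u8; 4] = [0x50, 0x4b, 0x05, 0x06]; // "PK\x05\x06"
const EOCD_MIN_LEN: usize = 22; // fixed part, without the trailing comment

/// Searches the tail of `data` for the EOCD record, starting from the end.
/// Only the last 22 + 65535 bytes can contain it, because the record is
/// followed by nothing except an optional comment of at most 65535 bytes.
pub fn find_eocd(data: &[u8]) -> Option<Eocd> {
    if data.len() < EOCD_MIN_LEN {
        return None;
    }
    let last_start = data.len() - EOCD_MIN_LEN;
    let first_start = last_start.saturating_sub(u16::MAX as usize);
    for start in (first_start..=last_start).rev() {
        let rec = &data[start..];
        if !rec.starts_with(&EOCD_SIG) {
            continue;
        }
        // The comment length must account exactly for the remaining bytes.
        let comment_len = u16::from_le_bytes([rec[20], rec[21]]) as usize;
        if EOCD_MIN_LEN + comment_len != rec.len() {
            continue;
        }
        return Some(Eocd {
            total_entries: u16::from_le_bytes([rec[10], rec[11]]),
            cd_size: u32::from_le_bytes([rec[12], rec[13], rec[14], rec[15]]),
            cd_offset: u32::from_le_bytes([rec[16], rec[17], rec[18], rec[19]]),
        });
    }
    None
}
```

A partially downloaded archive has no EOCD record yet, so `find_eocd` returns `None` and nothing can be indexed, which is the limitation described above.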

dignifiedquire commented 4 years ago

> Compression does not matter much, because most of the backup size is taken by blobs and they are already compressed.

Are they? Compressing them all together could yield quite a benefit compared to individual compression.

@hpk42 but users are supposed to import this file, not work with it manually, so I don't see the benefit of having good Windows support.

dignifiedquire commented 4 years ago

Streaming import sounds like a serious attack vector, you would start ingesting data without being able to verify that it is uncompromised.

link2xt commented 4 years ago

> zip files allow AES encrypting the contents which is not perfect but better than nothing

We can also PGP-encrypt whatever format we use; this would also work if the key has already been transferred via the Autocrypt Setup Message. ZIP AES encryption uses a password, which is not that useful to us.

link2xt commented 4 years ago

> Streaming import sounds like a serious attack vector, you would start ingesting data without being able to verify that it is uncompromised.

I agree; we don't have a trusted channel if we are transferring a backup over the LAN. The simplest way is to transfer a checksum/key over a second channel (a QR code) and simply transmit the backup file over an unencrypted TCP connection. If we want to stream, we would need to authenticate each packet, maybe by using TLS. That is more complicated, and since we can't read a ZIP file until we receive the end of the central directory, it does not matter much.

link2xt commented 4 years ago

Overall, I don't see many advantages compared to sqlar now. And there is an added dependency on zip-rs.

Also, compatibility is easier with sqlar, because both formats are SQLite databases: just open the file and then check whether it is in sqlar format or not. If we go with zip, we also need to change how we search for the file.
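Format detection on import could look roughly like the following sketch, which only inspects leading magic bytes: `SQLite format 3\0` for any SQLite file, `PK\x03\x04` for a ZIP local file header, and `ustar` at offset 257 for tar. `BackupKind` and `detect_backup_kind` are hypothetical names. Since an sqlar archive is itself an SQLite database, telling it apart from the current backup format would still require opening it and checking for the `sqlar` table, which this sketch does not attempt.

```rust
/// Container formats mentioned in this thread, told apart by magic bytes.
#[derive(Debug, PartialEq)]
pub enum BackupKind {
    /// Any SQLite database. The current backup format and sqlar both land
    /// here; telling them apart needs a query for the `sqlar` table.
    Sqlite,
    /// Starts with a ZIP local file header. (An empty ZIP archive, which
    /// consists only of the EOCD record, is not recognised by this check.)
    Zip,
    /// POSIX/GNU tar: "ustar" magic at offset 257 of the first header.
    Tar,
    Unknown,
}

/// Classifies a backup file from its first bytes (at least 262 bytes are
/// needed to recognise tar).
pub fn detect_backup_kind(header: &[u8]) -> BackupKind {
    if header.starts_with(b"SQLite format 3\0") {
        BackupKind::Sqlite
    } else if header.starts_with(b"PK\x03\x04") {
        BackupKind::Zip
    } else if header.get(257..262) == Some(&b"ustar"[..]) {
        BackupKind::Tar
    } else {
        BackupKind::Unknown
    }
}
```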

link2xt commented 4 years ago

Tar is streamable both on export and import and has an async library by @dignifiedquire: https://github.com/dignifiedquire/async-tar
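To illustrate why tar can be consumed while it is still arriving, here is a std-only Rust sketch (not the async-tar API): each entry is a 512-byte header holding the name and an octal size, followed by the data padded to a 512-byte boundary, so a reader can handle entries one by one from any `Read`, such as a socket. `for_each_entry` and `parse_octal` are hypothetical names, and the sketch ignores header checksums, long names (GNU/PAX extensions) and entry types. Note that it hands out data before the whole stream could be verified, which is exactly the concern raised above about streaming import.

```rust
use std::io::{self, Read};

/// Parses a NUL- or space-padded octal number field of a tar header.
fn parse_octal(field: &[u8]) -> io::Result<u64> {
    let text = std::str::from_utf8(field)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let digits = text.trim_matches(&['\0', ' '][..]);
    if digits.is_empty() {
        return Ok(0);
    }
    u64::from_str_radix(digits, 8).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Walks a tar stream entry by entry and hands each entry's data to
/// `on_entry` as a bounded reader. Only one 512-byte header is buffered,
/// so processing can begin before the rest of the stream has arrived.
pub fn for_each_entry<R: Read>(
    mut r: R,
    mut on_entry: impl FnMut(&str, &mut dyn Read) -> io::Result<()>,
) -> io::Result<()> {
    let mut header = [0u8; 512];
    loop {
        r.read_exact(&mut header)?;
        // An all-zero block marks the end of the archive.
        if header.iter().all(|&b| b == 0) {
            return Ok(());
        }
        // Name: bytes 0..100, NUL-terminated. Size: bytes 124..136, octal.
        let name_end = header[..100].iter().position(|&b| b == 0).unwrap_or(100);
        let name = String::from_utf8_lossy(&header[..name_end]).into_owned();
        let size = parse_octal(&header[124..136])?;

        let mut entry = (&mut r).take(size);
        on_entry(name.as_str(), &mut entry as &mut dyn Read)?;
        // Discard whatever the callback left unread, then the zero padding
        // that rounds each entry's data up to a multiple of 512 bytes.
        io::copy(&mut entry, &mut io::sink())?;
        let padding = (512 - size % 512) % 512;
        io::copy(&mut (&mut r).take(padding), &mut io::sink())?;
    }
}
```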

hpk42 commented 4 years ago

Thanks all for the discussion. I moved things to a new issue, addressing various comments made here: https://github.com/deltachat/deltachat-core-rust/issues/1729