Majored / rs-async-zip

An asynchronous ZIP archive reading/writing crate.
MIT License

Parallel Zip Stream support #115

Open dupeiran001 opened 11 months ago

dupeiran001 commented 11 months ago

I'm trying to zip a list of files from HTTP streams into an archive.

As far as I know, the stream.bytes().next().await operation spends some time waiting for the requested bytes to arrive from the server, so it would be faster if the files could be downloaded and zipped in parallel.

For example, with 10 files to zip, it would be great to spawn 10 tokio tasks. Each task would read its stream and zip it into an entry. Once all the tasks have finished generating their ZipEntry, the entries can be collected into a ZipFile. That collection step is CPU-bound, so parallelising it would not help much.

This would be possible if an EntryStreamWriter could be built independently of the ZipFileWriter.

For now I have to do something like this, since an EntryStreamWriter has to be built from a single ZipFileWriter:


    // SOME_TOKIO_ASYNC_FILE is a placeholder for an already-opened tokio file.
    let mut zip_file_writer = ZipFileWriter::with_tokio(SOME_TOKIO_ASYNC_FILE);

    // Each (name, stream) pair arrives over a channel and is zipped one at a time.
    while let Some((name, mut stream)) = rx.recv().await {
        let entry_builder = ZipEntryBuilder::new(ZipString::from(name), Compression::Deflate);
        let mut entry_writer = zip_file_writer
            .write_entry_stream(entry_builder)
            .await
            .unwrap();

        // Copy the body into the entry chunk by chunk.
        while let Some(chunk) = stream.bytes().next().await {
            entry_writer
                .write_all(chunk.unwrap().as_ref())
                .await
                .unwrap();
        }
        // The entry has to be closed before the next one can be started.
        entry_writer.close().await.unwrap();
    }
    zip_file_writer.close().await.unwrap();

Majored commented 10 months ago

> This would be possible if an EntryStreamWriter could be built independently of the ZipFileWriter.

Do you have any ideas on how this might be possible? The main issue is that writing is inherently sequential, as each file has to be written in its entirety before moving on to the next. So I can't see how this could be implemented beyond whacking the EntryStreamWriter behind a Mutex, which can already be done.

The only part of the writing flow we could parallelise would be the compression but that also has some trade-offs.
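
To make the constraint concrete, here is a minimal sketch of the pattern that is already possible today: do the slow per-file work concurrently, then hand the finished buffers to a single writer that appends them one after another. It uses only the standard library so it runs without network access. `fetch_and_compress`, `SequentialArchive` and `build_archive` are hypothetical stand-ins invented for this illustration; none of them are part of async_zip, and the "compression" is just a placeholder.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for "download and compress one file".
/// A real program would await an HTTP body and run a compressor here.
fn fetch_and_compress(name: &str) -> Vec<u8> {
    // Placeholder payload so the sketch runs offline.
    format!("contents of {}", name).into_bytes()
}

/// Hypothetical stand-in for the archive: entries can only be appended
/// one after another, mirroring the sequential nature of ZIP writing.
#[derive(Default)]
struct SequentialArchive {
    names: Vec<String>,
    data: Vec<u8>,
}

impl SequentialArchive {
    fn write_entry(&mut self, name: String, bytes: &[u8]) {
        self.names.push(name);
        self.data.extend_from_slice(bytes);
    }
}

fn build_archive(names: &[&str]) -> SequentialArchive {
    let (tx, rx) = mpsc::channel::<(String, Vec<u8>)>();

    // One worker per file: the slow part runs in parallel.
    let handles: Vec<_> = names
        .iter()
        .map(|name| {
            let tx = tx.clone();
            let name = name.to_string();
            thread::spawn(move || {
                // The whole entry is buffered in memory before it is sent.
                let bytes = fetch_and_compress(&name);
                tx.send((name, bytes)).expect("receiver dropped");
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once all workers finish.
    drop(tx);

    // Single writer: entries are appended in whatever order they complete.
    let mut archive = SequentialArchive::default();
    for (name, bytes) in rx {
        archive.write_entry(name, &bytes);
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    archive
}

fn main() {
    let archive = build_archive(&["a.txt", "b.txt", "c.txt"]);
    println!("{} entries, {} bytes", archive.names.len(), archive.data.len());
}
```

One trade-off is visible even in this toy version: every entry has to be held in memory in full until the writer gets to it, whereas the streaming loop above only ever holds one chunk at a time.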

dupeiran001 commented 10 months ago

> The only part of the writing flow we could parallelise would be the compression but that also has some trade-offs.

Yes, that's what I mean. Parallelising the compression seems more practical, so I'd like to know about the trade-offs in more detail. Thank you!