
bguise987 / pigz-python

The goal of this project is to create a pure Python implementation of the pigz project for parallelizing gzipping.

MIT License


Compression level #33

Open MuditMaurya opened 3 years ago

MuditMaurya commented 3 years ago

Hi, is there a way to set the compression level in this implementation? Something like compress_file('archive.tar', 9), where 9 is the slowest level with the best compression. Thanks

bguise987 commented 3 years ago

Hi @MuditMaurya, yes, this is possible. The call you posted is already correct for this implementation: the second positional argument is compresslevel.

From pigz_python.py:

def compress_file(
    source_file,
    compresslevel=_COMPRESS_LEVEL_BEST,
    blocksize=DEFAULT_BLOCK_SIZE_KB,
    workers=CPU_COUNT,
):

MuditMaurya commented 3 years ago

That helps. Thank you very much. Also, I observed that at the best compression level pigz-python does not compress as well as pigz does on the shell. I had 105GB of files to compress and used both methods (pigz-python and pigz on the shell); pigz-python created a larger compressed file than pigz on the shell. The environments were isolated and exactly the same for both tests.

Thanks

bguise987 commented 3 years ago

Hm, yeah, the original creator of pigz has noted this behavior to me previously as well. At this point I suspect it's because pigz-python isn't passing the compression dictionary around the way that pigz is set up to do. When I first released this I deemed that a "nice to have". :)
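
To illustrate the suspected cause, here is a rough sketch using only the standard-library zlib module. It is not pigz-python's code and not pigz's implementation; the block size, the sample data, and the helper name compress_blocks are made up for the demo. It compresses data block by block, once with every block independent and once with each block primed with up to 32 KB of the preceding data as a preset dictionary (the zdict argument of zlib.compressobj), which is roughly the idea behind passing the dictionary around.

```python
import random
import zlib

BLOCK_SIZE = 16 * 1024  # hypothetical block size, chosen small for the demo
DICT_SIZE = 32 * 1024   # deflate can look back at most 32 KB


def compress_blocks(data, level=9, use_dictionary=False):
    """Return the total raw-deflate size of data compressed block by block."""
    total = 0
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        if use_dictionary and start > 0:
            # Prime this block with the tail of the data that precedes it.
            zdict = data[max(0, start - DICT_SIZE):start]
            comp = zlib.compressobj(level, zlib.DEFLATED, -15, zdict=zdict)
        else:
            # Independent block: the compressor starts with no history.
            comp = zlib.compressobj(level, zlib.DEFLATED, -15)
        total += len(comp.compress(block)) + len(comp.flush())
    return total


# Made-up sample: an 8 KB pseudo-random chunk repeated 16 times, so every
# block has redundancy that is only visible by looking at earlier blocks.
rng = random.Random(0)
chunk = bytes(rng.getrandbits(8) for _ in range(8192))
data = chunk * 16

independent = compress_blocks(data, use_dictionary=False)
primed = compress_blocks(data, use_dictionary=True)
print("independent blocks:", independent, "bytes")
print("primed blocks:     ", primed, "bytes")
```

On data like this the primed run should come out noticeably smaller, because each independent block has to re-encode content the previous block already contained. How large the gap is on real data, such as a big tar archive, depends on the data and the block size.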

Would you mind sharing some more details with me about the data you observed this with?

Was it a tar archive?
Was it binary or text data?

Also thanks so much for your interest in this project!

MuditMaurya commented 3 years ago

I see. Maybe I will look into that and try to fix it.

And yes, I was working with a tar archive.

(I think we should keep this issue open until the issue is resolved.) Thanks