fancycode / pylzma

Python bindings for the LZMA library
http://www.joachim-bauch.de/projects/pylzma/
GNU Lesser General Public License v2.1

Slow performance #39

Open bkuczenski opened 8 years ago

bkuczenski commented 8 years ago

Hi, I've been using pylzma to handle large(ish) 7z files ranging from 50 MB to 1.0 GB compressed. I am trying to access individual files from the archive, one at a time, and I noticed that performance is highly variable and very slow in comparison to ZipFile.

Below I compared performance for two archives containing the same files (I created the ZIP by extracting the 7z file and recompressing it with zip):

http://nbviewer.jupyter.org/github/bkuczenski/lca-tools/blob/master/doc/7z%20profiling.ipynb
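For readers who want to run a similar comparison, here is a minimal sketch of per-member timing. It is not the code from the notebook above: the helper names `time_member_reads` and `build_demo_zip` and the in-memory demo archive are made up for illustration, and only the standard-library `zipfile` side is exercised.

```python
import io
import time
import zipfile


def time_member_reads(read_member, names, trials=3):
    """Time read_member(name) for each name; return {name: [seconds, ...]}."""
    timings = {}
    for name in names:
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            read_member(name)  # decompress one member, discard the bytes
            samples.append(time.perf_counter() - start)
        timings[name] = samples
    return timings


def build_demo_zip():
    """Build a small in-memory ZIP as a stand-in for a real archive (hypothetical data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(3):
            zf.writestr("file_%d.txt" % i, ("row %d\n" % i) * 10000)
    buf.seek(0)
    return buf


# Keep the archive open and read each member several times.
demo = zipfile.ZipFile(build_demo_zip())
zip_timings = time_member_reads(demo.read, demo.namelist())
for name, samples in zip_timings.items():
    print(name, ["%.6f" % s for s in samples])
```

The helper only needs a callable that takes a member name, so the 7z side could presumably be timed the same way by passing something like `lambda n: archive.getmember(n).read()`, where `archive` is a `py7zlib.Archive7z` opened on the 7z file. Repeating each read several times makes it easier to tell consistently slow members apart from one-off noise.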

On the one hand, the ZIP file is almost 6x as large as the 7Z file; on the other hand, 7z access seems 10x-100x slower.

My question: is there a way for me to improve the performance of py7zlib? Is there a better way to use the archive to reference single files? Or is there a technical limitation that prevents this?

N.B. The performance is no different if I keep the archive open between successive retrievals. It is consistent for the same file over multiple trials (some files are fast, others are slow; in this case all the files are about the same size, so that's not the issue).

Thanks for any feedback.

bkuczenski commented 8 years ago

This turns out to be due to high memory requirements.