Hi, I've been using pylzma to handle large(ish) 7z files ranging from 50 MB to 1.0 GB compressed. I am trying to access individual files from the archive, one at a time, and I've noticed that performance is highly variable and very slow in comparison to ZipFile.
Below I compared performance for two archives containing the same files (I created the ZIP by extracting the 7z file and recompressing it with zip):
http://nbviewer.jupyter.org/github/bkuczenski/lca-tools/blob/master/doc/7z%20profiling.ipynb
On the one hand, the ZIP file is almost 6x as large as the 7Z file; on the other hand, 7z access seems 10x-100x slower.
My question: is there a way for me to improve the performance of py7zlib? Is there a better way to use the archive to reference single files? Or is there a technical limitation that prevents this?
N.B. The performance is no different if I keep the archive open between successive retrievals. It is consistent for the same file over multiple trials (some are fast, others are slow; in this case all the files are about the same size, so that's not the issue).
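The retrieval pattern being timed can be sketched roughly as follows. This is a minimal illustration, not the exact code from the linked notebook: the helper names (`time_reads`, `make_zip_reader`, `make_7z_reader`) are hypothetical, the demo archive is a tiny in-memory ZIP, and the 7z reader assumes pylzma's `py7zlib.Archive7z(fp).getmember(name).read()` interface and is defined but not run here.

```python
import io
import time
import zipfile


def time_reads(read_member, names, trials=3):
    """Return {name: best wall-clock seconds} over `trials` calls of read_member(name)."""
    timings = {}
    for name in names:
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            read_member(name)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


def make_zip_reader(zip_bytes):
    """Build a reader that reopens the ZIP archive for every retrieval."""
    def read_member(name):
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            return zf.read(name)
    return read_member


def make_7z_reader(path):
    """Same idea for a 7z archive. Assumes pylzma is installed; not run below."""
    import py7zlib  # imported lazily so the sketch runs without pylzma

    def read_member(name):
        with open(path, "rb") as fp:
            return py7zlib.Archive7z(fp).getmember(name).read()
    return read_member


# Demo on a small in-memory ZIP (stand-in data, not the real archives).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(3):
        zf.writestr("file%d.txt" % i, ("line %d\n" % i) * 1000)
zip_bytes = buf.getvalue()

names = ["file0.txt", "file1.txt", "file2.txt"]
timings = time_reads(make_zip_reader(zip_bytes), names)
for name in names:
    print("%s: %.6f s" % (name, timings[name]))
```

Passing `make_7z_reader("archive.7z")` (a placeholder path) to `time_reads` with the same member names would give per-file timings for the 7z side, which is where the fast/slow spread shows up.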
Thanks for any feedback.