Decompressing large .7z files (> 4GiB) causes Python to raise MemoryError exception

fancycode / pylzma

Python bindings for the LZMA library

http://www.joachim-bauch.de/projects/pylzma/

GNU Lesser General Public License v2.1


Issue #32

Open ijacquez opened 9 years ago

ijacquez commented 9 years ago

Decompressing large .7z files (> 4GiB) causes Python to raise a MemoryError exception with the following extraction code:

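```python
import os

def extract_all(archive, path):
    # archive: an opened py7zlib.Archive7z (self.archive in the original)
    for name in archive.getnames():
        out_filename = os.path.join(path, name)
        out_dir = os.path.dirname(out_filename)
        if not os.path.exists(out_dir):
            os.makedirs(out_dir)
        # Write every member, not only those whose directory was just created
        with open(out_filename, 'wb') as out_file:
            # read() returns the entire decompressed member in memory
            out_file.write(archive.getmember(name).read())
```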

victor3rc commented 8 years ago

I managed to read 7z files in chunks.

@fancycode, if you're interested in this, let me know and I can wrap it up in a method and open a pull request. It could potentially solve this issue.

ijacquez commented 8 years ago

@victor3rc, what were the results with files exceeding 4GiB?

fancycode commented 8 years ago

@victor3rc sure, pull requests are always welcome!

remyroy commented 8 years ago

@victor3rc I'm highly interested in that code which reads 7z files in chunks. It makes little sense for the ArchiveFile class to have a single read method that reads the whole file into memory.

victor3rc commented 8 years ago

@remyroy I agree.

@ijacquez I've been doing some tests with a 50+ GB file and it reads it in chunks fine.

I'll try to wrap it in a method this week, guys.

victor3rc commented 8 years ago

Hey @remyroy @ijacquez, just an update: I managed to read chunks, but I was getting errors when calling pylzma.decompressobj.decompress(chunk), specifically on the final chunks at the end of the file.
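The chunked code itself was never posted, so the following is not victor3rc's implementation and does not use pylzma. It is a minimal sketch of the same chunked-decompression pattern using Python's standard-library `lzma` module, assuming an .xz or legacy .lzma stream rather than a .7z container (stdlib `lzma` cannot parse .7z archives). One place chunked loops commonly go wrong is the end of the stream: the loop below keeps draining the decompressor until it reports end-of-stream, and treats running out of input before that point as truncation. `decompress_stream` and `CHUNK_SIZE` are hypothetical names.

```python
import lzma

CHUNK_SIZE = 64 * 1024  # arbitrary; caps both input reads and output per call


def decompress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Decompress an .xz/.lzma stream from file object src into dst.

    Memory use is roughly bounded by chunk_size plus the decompressor's
    own state (mainly its dictionary), not by the decompressed size.
    """
    decomp = lzma.LZMADecompressor()  # FORMAT_AUTO: accepts .xz and legacy .lzma
    while not decomp.eof:
        if decomp.needs_input:
            data = src.read(chunk_size)
            if not data:
                # Input ran out before the end-of-stream marker was seen:
                # the data is truncated (or not in the assumed format).
                raise EOFError("compressed data ended before end-of-stream marker")
        else:
            # Buffered input can still produce output; pass an empty
            # chunk to drain it before reading more from src.
            data = b""
        dst.write(decomp.decompress(data, max_length=chunk_size))
```

Because `max_length` limits each call's output, a single highly compressible chunk cannot expand into one huge in-memory result.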

As a temporary workaround, I use subprocess to call 7z and decompress the file to local disk. I then read whatever is decompressed in chunks.
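A minimal sketch of that subprocess workaround, assuming the `7z` command-line tool is installed and on `PATH`. `build_7z_extract_command`, `extract_with_7z` and `iter_file_chunks` are illustrative names, not part of pylzma. Since the archive is extracted to disk first, the limit shifts from available RAM to free disk space.

```python
import subprocess

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def build_7z_extract_command(archive_path, out_dir):
    # "x" extracts with full paths, "-o<dir>" sets the output directory
    # (no space after -o), and "-y" answers yes to any prompts.
    return ["7z", "x", archive_path, "-o" + out_dir, "-y"]


def extract_with_7z(archive_path, out_dir):
    # Assumes the 7z executable is installed and on PATH; raises
    # subprocess.CalledProcessError if 7z exits with a non-zero status.
    subprocess.run(build_7z_extract_command(archive_path, out_dir), check=True)


def iter_file_chunks(path, chunk_size=CHUNK_SIZE):
    # Yield an extracted file in fixed-size pieces so only one chunk
    # is held in memory at a time.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk
```

For example, after `extract_with_7z("big.7z", "extracted")`, walking `extracted` with `os.walk` and passing each file path to `iter_file_chunks` processes every member one chunk at a time.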

ijacquez commented 8 years ago

Do you have an idea as to what is causing that? Is it your changes? Are the chunks too big?

victor3rc commented 8 years ago

No idea, sorry. I didn't have time to look into pylzma.decompressobj.decompress, which is where the error was happening. It wouldn't be the size of the chunks, since that method is used to read the entire file.

igkins commented 5 years ago

@victor3rc, could you post your chunk-reading code, even if it didn't fully work?