chfoo / warcat

Tool and library for handling Web ARChive (WARC) files.
GNU General Public License v3.0

Handling for "files" that are purely in memory? #16

Open spott opened 7 years ago

spott commented 7 years ago

More precisely: how am I supposed to handle a "file" that is really just a bunch of bytes in memory?

Ideally, I would like to use a BinaryIO object (here an io.BytesIO). However, these don't have a name attribute, so I get this error:

  File "/usr/local/lib/python3.5/site-packages/warcat/model/block.py", line 83, in load
    binary_block.set_file(file_obj.name or file_obj, file_obj.tell(), length)
AttributeError: '_io.BytesIO' object has no attribute 'name'

I'm not sure how to get around this.

chfoo commented 7 years ago

Oops, another case that wasn't tested. As a workaround, maybe try something like your_file_object.name = None?
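A quick way to check that this workaround gets past the `file_obj.name or file_obj` expression from the traceback, without involving warcat itself. This is a minimal sketch; the sample bytes are made up, and it only exercises the standard library.

```python
import io

# Hypothetical in-memory WARC data; any bytes work for this check.
buf = io.BytesIO(b"WARC/1.0\r\n")

# io.BytesIO has no `name` attribute by default...
assert not hasattr(buf, "name")

# ...but its instances accept new attributes, so one can be added.
buf.name = None

# This mirrors the expression in warcat/model/block.py, line 83:
# a falsy name makes it fall back to the file object itself.
source = buf.name or buf
print(source is buf)  # True
```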

spott commented 7 years ago

This actually isn't enough...

You use file_obj.peek() in the code, and io.BytesIO objects don't have that method.
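One standard-library way to get a `peek()` on in-memory bytes is to wrap the `BytesIO` in `io.BufferedReader`. This is a sketch, not something the thread confirms works end to end with warcat; the sample data is hypothetical. Note that `BufferedReader.name` forwards to the wrapped object, so chfoo's `name = None` workaround is still needed.

```python
import io

data = b"WARC/1.0\r\nWARC-Type: warcinfo\r\n"  # hypothetical sample bytes
raw = io.BytesIO(data)

# io.BytesIO has no peek(); io.BufferedReader does.
assert not hasattr(raw, "peek")

# BufferedReader.name reads the wrapped object's `name`, so set one first.
raw.name = None
reader = io.BufferedReader(raw)

# peek() returns buffered bytes without advancing the position;
# it may return more bytes than requested, so slice if needed.
head = reader.peek(8)
print(head[:8])        # b'WARC/1.0'
print(reader.tell())   # 0
print(reader.read(8))  # b'WARC/1.0'
```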

Interestingly, my workaround is to wrap the bytes in a GzipFile, which does have a peek method and accepts BinaryIO objects as its fileobj.
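The GzipFile workaround can be sketched as follows, assuming the in-memory bytes are gzip-compressed (e.g. a `.warc.gz`); in read mode `GzipFile` decompresses, so it does not apply to plain WARC bytes. It also sidesteps the `name` error: when the wrapped object has no string name, `GzipFile` sets its own `name` to an empty string, so `file_obj.name or file_obj` falls back to the `GzipFile` itself.

```python
import gzip
import io

# Hypothetical: the in-memory data is a gzip-compressed WARC record.
compressed = gzip.compress(b"WARC/1.0\r\n")

gz = gzip.GzipFile(fileobj=io.BytesIO(compressed))

# GzipFile provides peek() on the decompressed stream.
print(gz.peek(8)[:8])  # b'WARC/1.0'

# Its name is an empty (falsy) string here, because BytesIO has no name.
print(gz.name or gz)
```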

If you are interested, relying only on the BufferedIOBase interface from the io module (read(), read1(), readinto(); it defines neither peek() nor name) should allow you to take most file-like objects.
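A sketch of what that could look like on the library side. The helper names `source_of` and `peek_bytes` are hypothetical and not part of warcat; the fallback path assumes the stream is seekable, which holds for `io.BytesIO` but not for every file-like object.

```python
import io
from typing import BinaryIO, Union


def source_of(file_obj: BinaryIO) -> Union[str, BinaryIO]:
    """Return the file's name if it has a usable one, else the object itself.

    getattr() with a default means objects without a `name` attribute
    (e.g. io.BytesIO) no longer raise AttributeError.
    """
    return getattr(file_obj, "name", None) or file_obj


def peek_bytes(file_obj: BinaryIO, size: int) -> bytes:
    """Return up to `size` upcoming bytes without consuming them.

    Uses the object's own peek() when present (io.BufferedReader,
    gzip.GzipFile); otherwise reads and seeks back, which assumes
    the stream is seekable.
    """
    peek = getattr(file_obj, "peek", None)
    if peek is not None:
        # peek() may return more than `size` bytes, so trim the result.
        return peek(size)[:size]
    position = file_obj.tell()
    data = file_obj.read(size)
    file_obj.seek(position)
    return data


buf = io.BytesIO(b"WARC/1.0\r\n")
print(source_of(buf) is buf)  # True
print(peek_bytes(buf, 8))     # b'WARC/1.0'
print(buf.tell())             # 0
```

Preferring a native `peek()` keeps buffered and compressed streams efficient, while the read-and-seek fallback covers plain in-memory buffers.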