Closed: fanchyna closed this issue 7 years ago
I think I'm having the same issue. This is a WARC file that was built using the Internet Archive's warc library.
[jeff warc]$ warcat split my.warc.gz
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.5/site-packages/warcat/__main__.py", line 154, in <module>
    main()
  File "/usr/lib/python3.5/site-packages/warcat/__main__.py", line 70, in main
    command_info[1](args)
  File "/usr/lib/python3.5/site-packages/warcat/__main__.py", line 126, in split_command
    tool.process()
  File "/usr/lib/python3.5/site-packages/warcat/tool.py", line 95, in process
    check_block_length=self.check_block_length)
  File "/usr/lib/python3.5/site-packages/warcat/model/warc.py", line 75, in read_record
    check_block_length=check_block_length)
  File "/usr/lib/python3.5/site-packages/warcat/model/record.py", line 68, in load
    content_type)
  File "/usr/lib/python3.5/site-packages/warcat/model/block.py", line 21, in load
    field_cls=HTTPHeader)
  File "/usr/lib/python3.5/site-packages/warcat/model/block.py", line 92, in load
    fields = field_cls.parse(file_obj.read(field_length).decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 712: invalid start byte
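The last frame of the traceback calls .decode() with no arguments on the raw HTTP header bytes, which means strict UTF-8, so a single 0xff byte in a header is enough to raise. The snippet below is only a sketch of that failure and of one possible tolerant fallback; decode_header_bytes is a hypothetical helper, not part of warcat, and falling back to Latin-1 is an assumption about what the header bytes might be, not warcat's actual fix.

```python
def decode_header_bytes(raw: bytes) -> str:
    """Hypothetical helper: decode header bytes, tolerating non-UTF-8 input."""
    try:
        # Same strict behaviour as the bare .decode() in the traceback.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this cannot fail.
        return raw.decode("latin-1")


# Made-up header block containing a byte that is not valid UTF-8.
raw = b"HTTP/1.1 200 OK\r\nX-Demo: caf\xff\r\n\r\n"

try:
    raw.decode()  # strict UTF-8, as in block.py line 92
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc)

# The fallback returns a str instead of raising.
print(repr(decode_header_bytes(raw)))
```

The trade-off is that bytes in some other encoding may come out as the wrong characters, but the record can still be read and split.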
I've installed warcat on my server under Python 3.4. Calling warc.load() on a WARC file gives me the following error message: