alard / warc-proxy

Serving content from a WARC

AttributeError: 'WarcIndexer' object has no attribute 'records' #8

Open: martinvahi opened this issue 7 years ago

martinvahi commented 7 years ago
warc_librarian@acstorage3334:/media/pi/Sinine230GiBUSB/warc_librarian $
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "./warcproxy.py", line 112, in run
    http_response = parse_http_response(record)
  File "./warcproxy.py", line 24, in parse_http_response
    remainder = message.feed(record.content[1])
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 576, in feed
    text = HTTPMessage.feed(self, text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 97, in feed
    text = self.feed_headers(text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 191, in feed_headers
    line, text = self.feed_line(text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 159, in feed_line
    text = str(self.buffer[pos:])
MemoryError
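
The traceback above shows the whole record body (`record.content[1]`) being passed to `message.feed()` in a single call. `feed_line` then copies the unconsumed remainder (`str(self.buffer[pos:])`) each time it splits off a header line. For a large record, holding several copies at once could plausibly exhaust memory. A minimal sketch of an alternative, assuming the record body is available as a binary file-like stream, reads the status line and headers one line at a time and leaves the body unread. `read_http_head` and `max_header_bytes` are hypothetical names, not part of warc-proxy or hanzo.

```python
import io


def read_http_head(stream, max_header_bytes=64 * 1024):
    """Parse an HTTP status line and headers from a binary stream.

    Hypothetical helper (not warc-proxy or hanzo API): it reads one line
    at a time and stops at the blank line that ends the header block, so
    the body is never loaded into memory.
    """
    raw_status = stream.readline(max_header_bytes)
    consumed = len(raw_status)
    status_line = raw_status.rstrip(b"\r\n").decode("latin-1")
    headers = []
    while True:
        raw_line = stream.readline(max_header_bytes)
        consumed += len(raw_line)
        if consumed > max_header_bytes:
            # Refuse pathological header blocks instead of buffering them.
            raise ValueError("HTTP header block larger than max_header_bytes")
        line = raw_line.rstrip(b"\r\n")
        if not line:
            # Blank line (or EOF) marks the end of the headers.
            break
        # Split on the first colon only, so values like URLs stay intact.
        name, _, value = line.partition(b":")
        headers.append((name.strip().decode("latin-1"),
                        value.strip().decode("latin-1")))
    # The stream is now positioned at the first byte of the body.
    return status_line, headers
```

After the call, the body can be read or copied in chunks only when a response is actually served.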

ERROR:tornado.application:Uncaught exception POST /load-warc (::1)
HTTPRequest(protocol='http', host='warc', method='POST', uri='/load-warc', version='HTTP/1.1', remote_ip='::1', headers={'Origin': 'http://warc', 'Content-Length': '102', 'Accept-Language': 'en-us;q=0.750', 'Accept-Encoding': 'gzip, deflate', 'Host': 'warc', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'User-Agent': 'Mozilla/5.0 (X11; Linux) AppleWebKit/538.15 (KHTML, like Gecko) Chrome/18.0.1025.133 Safari/538.15 Midori/0.5', 'Connection': 'Keep-Alive', 'X-Requested-With': 'XMLHttpRequest', 'Referer': 'http://warc/static/list.html', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1346, in _when_complete
    callback()
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1367, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "./warcproxy.py", line 344, in post
    index_status = self.warc_proxy.load_warc_file(path)
  File "./warcproxy.py", line 142, in load_warc_file
    self.indices[path] = indexer.records
AttributeError: 'WarcIndexer' object has no attribute 'records'
ERROR:tornado.access:500 POST /load-warc (::1) 30.11ms
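
The second error may be a consequence of the first, although the `WarcIndexer` source is not shown here. If `records` is only assigned once `run()` finishes, a thread that dies with `MemoryError` never creates the attribute, and `load_warc_file` then fails with an unrelated-looking `AttributeError` and a 500. A sketch of a more defensive pattern, assuming the indexing work can be wrapped in a callable, defines the result attributes in `__init__`, records any exception raised in `run()`, and checks it after `join()`. `SafeIndexer`, `index_fn` and `load_index` are hypothetical names, not the project's code.

```python
import threading


class SafeIndexer(threading.Thread):
    """Hypothetical indexer thread that always defines `records` and `error`.

    Not warc-proxy's WarcIndexer; it only illustrates initializing the
    result attribute up front and keeping any exception raised in run().
    """

    def __init__(self, index_fn):
        super().__init__()
        self.index_fn = index_fn  # callable returning the list of records
        self.records = None       # set only if indexing succeeds
        self.error = None         # exception raised by index_fn, if any

    def run(self):
        try:
            self.records = self.index_fn()
        except Exception as exc:  # MemoryError is a subclass of Exception
            self.error = exc


def load_index(index_fn):
    """Run the indexer thread and return (records, error_message)."""
    indexer = SafeIndexer(index_fn)
    indexer.start()
    indexer.join()
    if indexer.error is not None:
        return None, "indexing failed: %r" % (indexer.error,)
    return indexer.records, None
```

With this pattern, a failed load can be reported to the client as an error message rather than surfacing as an uncaught exception in the request handler.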

The ~560 MiB WARC file that was probably being loaded when this happened MIGHT be available from http://temporary.softf1.com/2017/bugs/www.clausewitz.com-2017-02-09-8df72096-00000.warc.gz. It may have happened with a different WARC file; I'm not entirely sure. However, the referenced file also fails to load, for whatever reason.