devsnd / tarindexer

python module for indexing tar files for fast access
GNU General Public License v3.0

Issue making index #7

Closed by relh 1 year ago

relh commented 1 year ago
segment_semantic__taskonomy__stockwell.tar
Traceback (most recent call last):
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: '?'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 2336, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 1122, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 1064, in frombuf
    chksum = nti(buf[148:156])
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/turbo/fouheyTemp/relh/latent-diffusion/ldm/data/tarindexer.py", line 129, in <module>
    indextar(dbtarfile,indexfile)
  File "/nfs/turbo/fouheyTemp/relh/latent-diffusion/ldm/data/tarindexer.py", line 55, in indextar
    with tarfile.open(dbtarfile, 'r|') as db:
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 1650, in open
    t = cls(name, filemode, stream, **kwargs)
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 1531, in __init__
    self.firstmember = self.next()
  File "/nfs/turbo/fouheyTemp/relh/mambaforge/lib/python3.9/tarfile.py", line 2348, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

Ran into this while indexing the tar file above. Looks like the tar file might be bad?

devsnd commented 1 year ago

Yeah, it looks like the tar file is corrupted. The error is raised inside Python's built-in tarfile parser, so there's nothing I can do in this library to prevent it.

Are you able to extract it using other tools? If so, please open a ticket with the CPython maintainers of the tarfile module.
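
For anyone hitting the same error, here is a minimal sketch of a check that can be run before indexing. It is not part of tarindexer; `check_tar` is a hypothetical helper name. It opens the archive in the same `'r|'` stream mode shown in the traceback and walks every header using only the standard `tarfile` module, so it should fail in the same way if a header cannot be parsed.

```python
import tarfile


def check_tar(path):
    """Hypothetical helper: walk all headers of a tar file in stream mode.

    Returns (True, member_count) if every header parses,
    otherwise (False, "<ExceptionName>: <message>").
    """
    count = 0
    try:
        # 'r|' reads the archive as an uncompressed stream,
        # the same mode used in the traceback above.
        with tarfile.open(path, "r|") as tar:
            for _ in tar:
                count += 1
    except (tarfile.TarError, OSError) as exc:
        # ReadError and InvalidHeaderError are both subclasses of TarError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, count
```

Calling `check_tar("segment_semantic__taskonomy__stockwell.tar")` would return `(False, ...)` with the parser's message if the file is unreadable. If this check fails but another tool extracts the archive cleanly, that points to a `tarfile` problem, not to a problem with the file.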