NCBI GTF files sometimes contain a transcript_id attribute with an empty value. This causes an IndexError when creating a database: gffutils sees that a transcript_id attribute is present and tries to read its first value, even though the value list is empty. This commit adds a second check that the value is non-empty before accessing it, which avoids the error.
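A minimal sketch of the two-part guard, not the actual gffutils code. It assumes the attributes are a plain dict mapping each key to a list of values, mirroring how f.attributes behaves in the stack trace below, where the empty transcript_id shows up as an empty list. The function name get_parent_transcript is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for one feature's parsed attributes: each key maps
# to a list of values, so an empty transcript_id becomes an empty list.
Attributes = Dict[str, List[str]]


def get_parent_transcript(attributes: Attributes,
                          transcript_key: str = "transcript_id") -> Optional[str]:
    """Return the first transcript_id value, or None if it is missing or empty."""
    # First check: the key is present.
    # Second check: its value list is non-empty, so [0] cannot raise IndexError.
    if transcript_key in attributes and attributes[transcript_key]:
        return attributes[transcript_key][0]
    return None


print(get_parent_transcript({"gene_id": ["g1"], "transcript_id": ["t1"]}))
print(get_parent_transcript({"gene_id": ["g1"], "transcript_id": []}))
```

With only the first check, the second call would index into an empty list and raise the same IndexError shown below; with both checks it returns None and the feature is simply not linked to a transcript.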
Stack trace of the resulting error in gffutils:
  File "/storage/hpc/group/warrenlab/users/esrbhb/mambaforge/envs/bio/lib/python3.9/site-packages/gffutils/create.py", line 1292, in create_db
    c.create()
  File "/storage/hpc/group/warrenlab/users/esrbhb/mambaforge/envs/bio/lib/python3.9/site-packages/gffutils/create.py", line 507, in create
    self._populate_from_lines(self.iterator)
  File "/storage/hpc/group/warrenlab/users/esrbhb/mambaforge/envs/bio/lib/python3.9/site-packages/gffutils/create.py", line 788, in _populate_from_lines
    parent = f.attributes[self.transcript_key][0]
IndexError: list index out of range
Example of bad line in NCBI gtf: