daler / gffutils

GFF and GTF file manipulation and interconversion
http://daler.github.io/gffutils
MIT License

create_db sqlite3.InterfaceError (small genome) #167

Closed. soungalo closed this issue 2 years ago.

soungalo commented 3 years ago

Hello, I am using Python 3.8.6 and gffutils 0.10.1. When loading a specific GFF3 file, I get the error message "sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type." I've seen in past issues that this used to happen with very large chromosomes, but I understand that this has been fixed, and in any case my genome doesn't have any large chromosomes. Interestingly, I have another very similar GFF file that loads without any problem. Here's what I did:

>>> gff_db1 = gffutils.create_db('gff_1.gff3', 'tmp1.sqlite3', force=True, merge_strategy="create_unique", verbose=True)
2021-01-04 10:12:18,843 - INFO - Populating features
2021-01-04 10:13:05,791 - INFO - Populating features table and first-order relations: 402249 features
2021-01-04 10:13:05,791 - INFO - Updating relations
2021-01-04 10:13:20,408 - INFO - Creating relations(parent) index
2021-01-04 10:13:22,960 - INFO - Creating relations(child) index
2021-01-04 10:13:24,699 - INFO - Creating features(featuretype) index
2021-01-04 10:13:26,875 - INFO - Creating features (seqid, start, end) index
2021-01-04 10:13:27,167 - INFO - Creating features (seqid, start, end, strand) index
2021-01-04 10:13:27,481 - INFO - Running ANALYZE features

>>> gff_db2 = gffutils.create_db('gff_2.gff3', 'tmp2.sqlite3', force=True, merge_strategy="create_unique", verbose=True)
2021-01-04 10:14:03,027 - INFO - Populating features
Populating features table and first-order relations: 398000 features
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/davidb/liorglic/Projects/Panoramic/output/A_thaliana_pan_genome/map_to_pan/.snakemake/conda/029ba149/lib/python3.8/site-packages/gffutils/create.py", line 1292, in create_db
    c.create()
  File "/davidb/liorglic/Projects/Panoramic/output/A_thaliana_pan_genome/map_to_pan/.snakemake/conda/029ba149/lib/python3.8/site-packages/gffutils/create.py", line 507, in create
    self._populate_from_lines(self.iterator)
  File "/davidb/liorglic/Projects/Panoramic/output/A_thaliana_pan_genome/map_to_pan/.snakemake/conda/029ba149/lib/python3.8/site-packages/gffutils/create.py", line 589, in _populate_from_lines
    self._insert(f, c)
  File "/davidb/liorglic/Projects/Panoramic/output/A_thaliana_pan_genome/map_to_pan/.snakemake/conda/029ba149/lib/python3.8/site-packages/gffutils/create.py", line 530, in _insert
    cursor.execute(constants._INSERT, feature.astuple())
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.
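The error text comes from Python's standard sqlite3 module rather than from gffutils itself: it appears when one value in the tuple passed to cursor.execute has a type SQLite cannot store. Below is a minimal reproduction, assuming (this thread does not confirm it) that the offending value was a non-scalar such as a set. The table, its columns, and the helper name bind_error are illustrative only.

```python
import sqlite3


def bind_error(value):
    """Try to insert `value` into an INTEGER column of a throwaway table.

    Returns the exception sqlite3 raises, or None if the insert succeeds.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE features (start INTEGER, bin INTEGER)")
        conn.execute("INSERT INTO features VALUES (?, ?)", (1, value))
        return None
    except sqlite3.Error as exc:
        # Python 3.8 raises InterfaceError here; later Python versions may
        # raise ProgrammingError with different wording. Both derive from
        # sqlite3.Error, so catching the base class covers either case.
        return exc
    finally:
        conn.close()


print(bind_error(4681))    # an int binds fine
print(bind_error({1, 2}))  # a set cannot be bound
```

Because the failure happens at the SQLite binding step, the traceback identifies only a parameter position, not the GFF line that produced it.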

And here are the two GFF files: gff_1.gff3.gz, gff_2.gff3.gz

Any idea what's wrong with gff_2.gff3?

Thanks!

soungalo commented 3 years ago

Found the bad record. It's this one, starting at line 398592:

An-1__chr2_12990458-12991147    EVM     gene    0       463     .       +       .       ID=ATAN-2G49800;Note=protein_coding_gene
An-1__chr2_12990458-12991147    EVM     mRNA    0       463     .       +       .       ID=ATAN-2G49800.1;Parent=ATAN-2G49800
An-1__chr2_12990458-12991147    EVM     CDS     0       272     .       +       0       ID=ATAN-2G49800.1.cds1;Parent=ATAN-2G49800.1
An-1__chr2_12990458-12991147    EVM     CDS     356     463     .       +       0       ID=ATAN-2G49800.1.cds2;Parent=ATAN-2G49800.1
An-1__chr2_12990458-12991147    EVM     exon    0       272     .       +       .       ID=ATAN-2G49800.1.exon1;Parent=ATAN-2G49800.1
An-1__chr2_12990458-12991147    EVM     exon    356     463     .       +       .       ID=ATAN-2G49800.1.exon2;Parent=ATAN-2G49800.1

For some reason, it has a start coordinate of 0, which is illegal in GFF, where coordinates are 1-based. I wrote the script that created this GFF, so this is totally my fault. Still, it would be nice to get a more meaningful error message that points to the problematic record.
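Until the library reports this itself, a pre-flight check can point straight at the offending line. The sketch below is a hypothetical helper, find_bad_coords, and not part of gffutils. It uses only the standard library to scan GFF3 text and reports feature lines whose start is below 1, whose end precedes the start, or that do not have 9 tab-separated columns.

```python
import io
from typing import Iterable, List, Tuple


def find_bad_coords(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for GFF3 feature lines whose
    start/end columns break the 1-based, start <= end rule."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Everything after a ##FASTA directive is sequence, not features.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and other directives.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append((lineno, "expected 9 tab-separated columns"))
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            problems.append((lineno, "non-integer start or end"))
            continue
        if start < 1:
            problems.append((lineno, f"start {start} < 1 (GFF is 1-based)"))
        elif end < start:
            problems.append((lineno, f"end {end} < start {start}"))
    return problems


# Usage: a tiny in-memory GFF3 with one record that starts at 0.
SAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["chr2", "EVM", "gene", "0", "463", ".", "+", ".", "ID=g1"]),
    "\t".join(["chr2", "EVM", "gene", "1", "463", ".", "+", ".", "ID=g2"]),
])

for lineno, reason in find_bad_coords(io.StringIO(SAMPLE)):
    print(lineno, reason)  # prints: 2 start 0 < 1 (GFF is 1-based)
```

For a gzipped file such as gff_2.gff3.gz, the same function can be given gzip.open(path, "rt") instead of the in-memory sample. The reported numbers are 1-based file line numbers, so they can be compared directly with a line number found by hand, like the one quoted above.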

daler commented 2 years ago

I think this is addressed in v0.11, but I added a regression test in https://github.com/daler/gffutils/pull/191 to make sure this doesn't show up again in the future.

daler commented 2 years ago

Now addressed in v0.11.