comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

IndexError while using generateEvents #177

Closed ArnavBharti closed 10 months ago

ArnavBharti commented 11 months ago

I am getting an error while running generateEvents

INFO:eventGenerator:Reading input data.
ERROR:__main__:Unknown error: (<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f87de7dd820>)

The command I ran is

suppa.py generateEvents -i ../genomic.gtf -o AlternativeSplicing/localAS/local -f ioe -e {SE,SS,MX,RI,FL}

I don't think there is any error with the gtf file input. What may be the possible reasons and fixes for this error?

EduEyras commented 11 months ago

Hi Arnav,

Does it produce any output? How did you build or obtain the GTF file?

Thanks

Eduardo


ArnavBharti commented 11 months ago

It does not produce any output. The program is stopped by this error midway. The GTF file was obtained from NCBI database.

EduEyras commented 11 months ago

Could you please send me a snippet of the GTF file? I am not familiar with the GTF produced by NCBI. SUPPA expects a 9th column with at least two fields, gene_id and transcript_id, which give the hierarchy of genes containing multiple transcripts.

E.
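A quick way to test that expectation against a large file is to scan it line by line. The following is a minimal sketch, not part of SUPPA: the function name `find_suspect_lines` and the exact checks are assumptions made for illustration. It reports lines that do not have 9 tab-separated columns, or whose 9th column lacks a `gene_id` or `transcript_id` entry or carries an empty value for one.

```python
import re

REQUIRED_KEYS = ("gene_id", "transcript_id")


def find_suspect_lines(lines):
    """Return (line_number, reason) pairs for lines that lack the expected GTF layout.

    Hypothetical helper for illustration; it does not reproduce SUPPA's own parser.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(columns)}")
            )
            continue
        for key in REQUIRED_KEYS:
            # Look for `key "value"` at the start of the column or after a semicolon.
            match = re.search(rf'(?:^|;\s*){key} "([^"]*)"', columns[8])
            if match is None:
                problems.append((number, f"missing {key}"))
            elif match.group(1) == "":
                problems.append((number, f"empty {key}"))
    return problems
```

Passing an open file handle (for example the handle for `../genomic.gtf`) as `lines` would list the line numbers worth inspecting by hand, which is quicker than cutting the file into snippets.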


ArnavBharti commented 11 months ago

This is the format of the GTF file I got from NCBI. It has only 8 columns.

Not all lines give the error. To debug it, I took a snippet of roughly the first 1300 lines, and that produced the ioe file. Somewhere further along in the file, however, it raises the IndexError.

Also, the README mentions GTF files; can GFF be used as input as well?

CM000442.1  Genbank exon    756799  757686  .   +   .   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; locus_tag "PVX_093675"; note "transcript PVX_093675A"; partial "true"; product "von Willebrand factor A-domain-related protein, putative"; transcript_biotype "mRNA"; exon_number "1"; 
CM000442.1  Genbank CDS 756799  757683  .   +   0   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; gbkey "CDS"; locus_tag "PVX_093675"; note "encoded by transcript PVX_093675A"; product "von Willebrand factor A-domain-related protein, putative"; protein_id "EDL42943.1"; exon_number "1"; 
CM000442.1  Genbank start_codon 756799  756801  .   +   0   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; gbkey "CDS"; locus_tag "PVX_093675"; note "encoded by transcript PVX_093675A"; product "von Willebrand factor A-domain-related protein, putative"; protein_id "EDL42943.1"; exon_number "1"; 
CM000442.1  Genbank stop_codon  757684  757686  .   +   0   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; gbkey "CDS"; locus_tag "PVX_093675"; note "encoded by transcript PVX_093675A"; product "von Willebrand factor A-domain-related protein, putative"; protein_id "EDL42943.1"; exon_number "1"; 
CM000442.1  Genbank gene    760883  760948  .   +   .   gene_id "PVX_093677"; transcript_id ""; gbkey "Gene"; gene_biotype "rRNA"; locus_tag "PVX_093677"; 
CM000442.1  Genbank transcript  760883  760948  .   +   .   gene_id "PVX_093677"; transcript_id "unassigned_transcript_164"; gbkey "rRNA"; locus_tag "PVX_093677"; note "LSU 5.8S rRNA; truncated"; product "5.8S ribosomal RNA"; transcript_biotype "rRNA"; 
CM000442.1  Genbank exon    760883  760948  .   +   .   gene_id "PVX_093677"; transcript_id "unassigned_transcript_164"; locus_tag "PVX_093677"; note "LSU 5.8S rRNA; truncated"; product "5.8S ribosomal RNA"; transcript_biotype "rRNA"; exon_number "1";
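
Two details stand out in the lines above: the `gene` line carries an empty `transcript_id ""`, and the rRNA `note` value contains a semicolon inside the quotes. Whether either of these triggers the IndexError in SUPPA is not established in this thread. The sketch below only illustrates how a quote-aware parse of the attribute column differs from a naive split on `;`. The names `parse_attributes` and `ATTRIBUTE_PATTERN` are hypothetical and not part of SUPPA.

```python
import re

# Matches one `key "value"` pair; the quoted value may itself contain semicolons.
ATTRIBUTE_PATTERN = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attribute_column):
    """Parse the GTF attribute column into a dict, respecting quoted values."""
    return dict(ATTRIBUTE_PATTERN.findall(attribute_column))


# Attribute columns copied from the transcript and gene lines of PVX_093677.
rrna_attributes = (
    'gene_id "PVX_093677"; transcript_id "unassigned_transcript_164"; '
    'gbkey "rRNA"; locus_tag "PVX_093677"; note "LSU 5.8S rRNA; truncated"; '
    'product "5.8S ribosomal RNA"; transcript_biotype "rRNA";'
)
gene_attributes = (
    'gene_id "PVX_093677"; transcript_id ""; gbkey "Gene"; '
    'gene_biotype "rRNA"; locus_tag "PVX_093677";'
)

parsed = parse_attributes(rrna_attributes)

# A naive split on ";" cuts the note value in two and yields one extra field.
naive_fields = [field for field in rrna_attributes.split(";") if field.strip()]

print(len(parsed), len(naive_fields))
print(repr(parsed["note"]))
print(repr(parse_attributes(gene_attributes)["transcript_id"]))
```

If a parser relies on splitting by `;`, fragments such as ` truncated"` no longer have the `key "value"` shape, so lines like these are reasonable candidates to inspect first.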