comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

IndexError while using generateEvents #177

Closed ArnavBharti closed 5 months ago

ArnavBharti commented 5 months ago

I am getting an error while running generateEvents:

INFO:eventGenerator:Reading input data.
ERROR:__main__:Unknown error: (<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f87de7dd820>)

The command I ran is:

suppa.py generateEvents -i ../genomic.gtf -o AlternativeSplicing/localAS/local -f ioe -e {SE,SS,MX,RI,FL}

I don't think there is any error with the gtf file input. What may be the possible reasons and fixes for this error?

EduEyras commented 5 months ago

Hi Arnav,

Does it produce any output? How did you build or obtain the GTF file?

Thanks

Eduardo

ArnavBharti commented 5 months ago

It does not produce any output. The program is stopped by this error midway. The GTF file was obtained from the NCBI database.

EduEyras commented 5 months ago

Could you please send me a snippet of the GTF file? I am not familiar with the GTF produced by NCBI. SUPPA expects a 9th column with at least two fields, gene_id and transcript_id, which give the hierarchy of genes containing multiple transcripts.

E.
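
As a rough way to check a file against that expectation, the sketch below scans GTF lines and reports records that do not have 9 tab-separated columns, or whose 9th column lacks a non-empty gene_id or transcript_id. It is not SUPPA's own parser; the function name `find_suspect_lines` and the sample records (shortened from the NCBI records posted in this thread, with tabs assumed as separators) are illustrative. Whether such records are what triggers the IndexError is not confirmed in this thread.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches key "value" pairs in the attribute column, e.g. gene_id "PVX_093675";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

# The two attributes named in this thread as required by SUPPA.
REQUIRED = ("gene_id", "transcript_id")


def find_suspect_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for records a strict GTF reader might reject."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) != 9:
            yield lineno, f"expected 9 tab-separated columns, found {len(fields)}"
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        for key in REQUIRED:
            if key not in attrs:
                yield lineno, f"missing {key}"
            elif attrs[key] == "":
                yield lineno, f"empty {key}"


# Hypothetical sample: two records shortened from the NCBI GTF in this thread.
sample = [
    'CM000442.1\tGenbank\tgene\t760883\t760948\t.\t+\t.\t'
    'gene_id "PVX_093677"; transcript_id ""; gbkey "Gene";',
    'CM000442.1\tGenbank\texon\t760883\t760948\t.\t+\t.\t'
    'gene_id "PVX_093677"; transcript_id "unassigned_transcript_164"; exon_number "1";',
]

for lineno, reason in find_suspect_lines(sample):
    print(f"line {lineno}: {reason}")  # prints: line 1: empty transcript_id
```

To run it on a real file, pass an open file handle instead of `sample`; the reported line numbers can then be inspected by hand.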

ArnavBharti commented 5 months ago

This is the format of the GTF file I got from NCBI (snippet below). It has only 8 columns.

Actually, not all lines cause the error. To debug it, I took a snippet of the first 1300 lines or so, and that produced the ioe file. But somewhere further on in the file, the IndexError is raised.
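
The prefix experiment above can be automated with a binary search over prefix lengths. The following is a minimal sketch under two assumptions: the empty prefix passes, and once a prefix fails every longer prefix fails too. The helper `first_failing_prefix` and the predicate `fails` are hypothetical names. In practice `fails` would write the prefix to a temporary GTF and run the generateEvents command on it, which is left out here; a stub predicate stands in for it.

```python
from typing import Callable, Sequence


def first_failing_prefix(
    lines: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> int:
    """Return the smallest n such that fails(lines[:n]) is True.

    Returns 0 if the whole input passes. Assumes the empty prefix passes
    and that failure is monotonic in the prefix length.
    """
    if not fails(lines):
        return 0
    lo, hi = 0, len(lines)  # invariant: lines[:lo] passes, lines[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[:mid]):
            hi = mid
        else:
            lo = mid
    return hi  # 1-based number of the line whose inclusion first causes failure


# Stub data and predicate, standing in for a real GTF and a real SUPPA run.
lines = ["ok"] * 1300 + ["bad"] + ["ok"] * 200
n = first_failing_prefix(lines, lambda chunk: "bad" in chunk)
print(n)  # 1301
```

With roughly log2(N) runs instead of N, this narrows the problem to a single line even in a large annotation file. Cutting a GTF at an arbitrary line can split a gene's records, so the reported line is a starting point for inspection, not a definitive diagnosis.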

Also, the README mentions GTF files; can we use a GFF file as input as well?

CM000442.1  Genbank exon    756799  757686  .   +   .   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; locus_tag "PVX_093675"; note "transcript PVX_093675A"; partial "true"; product "von Willebrand factor A-domain-related protein, putative"; transcript_biotype "mRNA"; exon_number "1"; 
CM000442.1  Genbank CDS 756799  757683  .   +   0   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; gbkey "CDS"; locus_tag "PVX_093675"; note "encoded by transcript PVX_093675A"; product "von Willebrand factor A-domain-related protein, putative"; protein_id "EDL42943.1"; exon_number "1"; 
CM000442.1  Genbank start_codon 756799  756801  .   +   0   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; gbkey "CDS"; locus_tag "PVX_093675"; note "encoded by transcript PVX_093675A"; product "von Willebrand factor A-domain-related protein, putative"; protein_id "EDL42943.1"; exon_number "1"; 
CM000442.1  Genbank stop_codon  757684  757686  .   +   0   gene_id "PVX_093675"; transcript_id "unassigned_transcript_163"; gbkey "CDS"; locus_tag "PVX_093675"; note "encoded by transcript PVX_093675A"; product "von Willebrand factor A-domain-related protein, putative"; protein_id "EDL42943.1"; exon_number "1"; 
CM000442.1  Genbank gene    760883  760948  .   +   .   gene_id "PVX_093677"; transcript_id ""; gbkey "Gene"; gene_biotype "rRNA"; locus_tag "PVX_093677"; 
CM000442.1  Genbank transcript  760883  760948  .   +   .   gene_id "PVX_093677"; transcript_id "unassigned_transcript_164"; gbkey "rRNA"; locus_tag "PVX_093677"; note "LSU 5.8S rRNA; truncated"; product "5.8S ribosomal RNA"; transcript_biotype "rRNA"; 
CM000442.1  Genbank exon    760883  760948  .   +   .   gene_id "PVX_093677"; transcript_id "unassigned_transcript_164"; locus_tag "PVX_093677"; note "LSU 5.8S rRNA; truncated"; product "5.8S ribosomal RNA"; transcript_biotype "rRNA"; exon_number "1";