Error during Isoform Classification

aarzalluz commented 4 years ago

Hi Liz,

I have been trying to run SQANTI2, but I have run into an error and I'm having a hard time identifying what exactly is causing it. This is the log output I get from the sqanti_qc2.py run:

```
R scripting front-end version 3.4.3 (2017-11-30)
Cleaning up isoform IDs...
Cleaned up isoform fasta file written to: cupcake_processing/isoforms.polished.hq.collapsed.rep.renamed.fasta
Write arguments to isoforms.collapsed.sqanti2qc.params.txt...
**** Running SQANTI...
**** Parsing provided files....
Reading genome fasta data/hg38/GRCh38.p13_refseq_genomic.fna....
Error corrected FASTA isoforms.polished.hq.collapsed.rep.renamed_corrected.fasta already exists. Using it...
**** Predicting ORF sequences...
ORF file isoforms.polished.hq.collapsed.rep.renamed_corrected.faa already exists. Using it....
**** Parsing Reference Transcriptome....
refAnnotation_isoforms.collapsed.sqanti2qc.genePred already exists. Using it.
**** Parsing Isoforms....
Splice Junction Coverage files not provided.
**** Performing Classification of Isoforms....
Traceback (most recent call last):
  File "SQANTI2/sqanti_qc2.py", line 1996, in <module>
    main()
  File "SQANTI2/sqanti_qc2.py", line 1991, in main
    run(args)
  File "SQANTI2/sqanti_qc2.py", line 1613, in run
    isoforms_info = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict)
  File "SQANTI2/sqanti_qc2.py", line 1459, in isoformClassification
    orfDict[rec.id].cds_genomic_end = m[orfDict[rec.id].cds_end-1] + 1 # make it 1-based
KeyError: 1321
```
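For reference, the final `KeyError: 1321` is an integer rather than an isoform ID (a string key would be printed in quotes), so the lookup that fails on line 1459 is most likely `m[orfDict[rec.id].cds_end-1]` rather than `orfDict[rec.id]`. The toy sketch below reproduces that failure mode, assuming `m` maps 0-based transcript positions to 0-based genomic positions (the `# make it 1-based` comment hints at this). `build_position_map`, the exon coordinates, and the ORF end are hypothetical and not taken from SQANTI2.

```python
def build_position_map(exons):
    """Map each 0-based transcript position to a 0-based genomic position.

    Hypothetical helper: `exons` is a list of (start, end) genomic intervals,
    0-based half-open, on the + strand, listed in transcript order.
    """
    pos_map = {}
    tx_pos = 0
    for start, end in exons:
        for g in range(start, end):
            pos_map[tx_pos] = g
            tx_pos += 1
    return pos_map


# Hypothetical 1321-nt transcript made of two exons (1000 nt + 321 nt),
# so valid transcript positions are 0..1320.
exons = [(10_000, 11_000), (12_000, 12_321)]
m = build_position_map(exons)

# Hypothetical ORF whose 1-based end lies one base past the transcript end.
cds_end = 1322

failed_key = None
try:
    cds_genomic_end = m[cds_end - 1] + 1  # same expression shape as line 1459
except KeyError as err:
    failed_key = err.args[0]
    print(f"KeyError: {failed_key}")  # prints "KeyError: 1321"
```

In this toy case the ORF end falls outside the positions the map knows about, so indexing at 1321 fails exactly as in the log. In real data that would point to a mismatch between the sequence the ORF was predicted on and the sequence whose alignment built the map.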

As you can see, everything works fine until the Isoform Classification step. Do you have any idea whether this could be a bug, or whether it is related to how the ORF prediction output is formatted for my data? I ask because the error seems to be raised while filling in the dictionary that stores the ORF information (`orfDict`).
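One way to narrow this down before looking at the ORF parsing itself is to check every predicted ORF against the length of the sequence it is supposed to sit on. A minimal sketch, assuming you already have 1-based CDS end coordinates keyed by isoform ID; `fasta_lengths`, `find_out_of_range_orfs`, and the toy IDs and sequences are hypothetical, not SQANTI2 functions or data.

```python
import io


def fasta_lengths(lines):
    """Return {record_id: sequence_length} for FASTA text.

    The record ID is taken as the first whitespace-separated token after '>'.
    """
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def find_out_of_range_orfs(orf_ends, tx_lengths):
    """Return sorted IDs whose 1-based CDS end does not fit on the transcript.

    IDs absent from tx_lengths are reported as well, since any position
    lookup for them would also fail.
    """
    bad = []
    for iso_id, cds_end in orf_ends.items():
        length = tx_lengths.get(iso_id)
        if length is None or not 1 <= cds_end <= length:
            bad.append(iso_id)
    return sorted(bad)


# Toy inputs; IDs and coordinates are made up for illustration.
fasta_text = ">PB.1.1 isoform=1\nACGT\nAC\n>PB.2.1\nAAAA\n"
tx_lengths = fasta_lengths(io.StringIO(fasta_text))  # {'PB.1.1': 6, 'PB.2.1': 4}
orf_ends = {"PB.1.1": 6, "PB.2.1": 5, "PB.3.1": 3}
print(find_out_of_range_orfs(orf_ends, tx_lengths))  # ['PB.2.1', 'PB.3.1']
```

With real files, pass an open handle instead, e.g. `with open("isoforms.polished.hq.collapsed.rep.renamed_corrected.fasta") as fh: tx_lengths = fasta_lengths(fh)`. How the CDS end coordinates are pulled out of the ORF output is left open here, since that depends on the GeneMark output format.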

Thanks!

Ángeles
PhD Student, Conesa Lab

Magdoll commented 4 years ago

Hi @aarzalluz, this error comes from parsing the ORF information produced by GeneMark. It could be an edge case that I did not handle well (or maybe an issue with the ORF FASTA file).
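If it does turn out to be an unhandled edge case, one defensive pattern is to check the coordinate before indexing and report the offending isoform instead of aborting the whole run. The sketch below is hypothetical and is not SQANTI2's code or fix: `safe_genomic_end` is a made-up name, and it assumes, as the traceback's `m[...cds_end-1] + 1 # make it 1-based` line suggests, a dict from 0-based transcript positions to 0-based genomic positions.

```python
import warnings


def safe_genomic_end(pos_map, cds_end, iso_id):
    """Return the 1-based genomic CDS end, or None if cds_end is unmapped.

    pos_map: dict of 0-based transcript position -> 0-based genomic position.
    cds_end: 1-based CDS end on the transcript.
    """
    genomic = pos_map.get(cds_end - 1)
    if genomic is None:
        warnings.warn(
            f"{iso_id}: CDS end {cds_end} is outside the mapped transcript "
            f"({len(pos_map)} positions); skipping genomic CDS coordinates"
        )
        return None
    return genomic + 1  # convert 0-based genomic position to 1-based


pos_map = {i: 5_000 + i for i in range(10)}  # toy 10-nt transcript, hypothetical
print(safe_genomic_end(pos_map, 10, "PB.1.1"))  # 5010
print(safe_genomic_end(pos_map, 11, "PB.1.1"))  # emits a warning, then prints None
```

Returning `None` keeps one bad ORF from stopping classification of every other isoform, at the cost of handling missing CDS coordinates downstream. A warning is preferable to silently dropping the record, because the root cause may lie upstream of the ORF step.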

Is it possible to share this data confidentially for debugging?

-Liz

aarzalluz commented 4 years ago

Sure, I can share the input FASTA and FL counts, as well as the reference genome and annotation versions I'm using. Do you think that would be enough for you to replicate the error?

Ángeles

Magdoll commented 4 years ago

Hi @aarzalluz, yes, that will work. Please coordinate with me by email at etseng@pacb.com. I can give you a secure place to upload the data.

aarzalluz commented 4 years ago

Hi Liz,

I realized I had made an error in the mapping step, sorry about that! SQANTI2 now works for me, including the ORF prediction, so I'll close the issue.

Magdoll/SQANTI2, issue #51