Magdoll / SQANTI2

SQANTI2 is now replaced by SQANTI3. Please go to: https://github.com/ConesaLab/SQANTI3

KeyError in sqanti_qc2.py version 3.5 #24

Closed defendant602 closed 5 years ago

defendant602 commented 5 years ago

Hi, Liz:

Thanks for your reply on issue #23; it solved my problem. But I ran into another error when running version 3.5. My command was:

python sqanti_qc2.py --aligner_choice minimap2 -t 20 --geneid all.collapsed.rep.fa genome.gtf genome.fa -o test

The error message was:

    output written to all.collapsed.rep.renamed_corrected.fasta
    Skipping PB.9452.1 because unmapped.
    **** Parsing Isoforms....
    Traceback (most recent call last):
      File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1753, in <module>
        main()
      File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1749, in main
        run(args)
      File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1392, in run
        isoforms_info = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict)
      File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1278, in isoformClassification
        orfDict[rec.id].cds_genomic_end = m[orfDict[rec.id].cds_end-1] + 1 # make it 1-based
    KeyError: 946

But when I changed the input isoforms from FASTA format to GTF format, the run completed successfully. The command was:

python sqanti_qc2.py -g --geneid all.collapsed.gff genome.gtf genome.fa -o test

I wonder whether the error comes from the code or from the input isoform FASTA file. Do you have any advice?

Thanks.

Magdoll commented 5 years ago

Hi @defendant602 ,

I've fixed a few ORF-related bugs in the last few versions. If the issue still exists in the latest v3.8, please let me know. In that case, could you share the input file with me privately?

--Liz

defendant602 commented 5 years ago

Hi Liz,

I have downloaded version 3.8 and tested it on the dataset that failed in version 3.5. Unfortunately, the problem remains the same: a KeyError at line 1278 of sqanti_qc2.py.

However, when I tested three other datasets with version 3.8, all of them completed without an error. So this KeyError may be caused by something specific to this particular input dataset that I am not aware of.

Sorry, I can't figure out a convenient way to share my input data with you, since the genome.fa file is quite big (2Gb).

Magdoll commented 5 years ago

Hi @defendant602 ,

Interesting. It may be an edge-case bug.

It is possible for you to share large files with me if you give me your email address. I will request a confidential upload.

--Liz

defendant602 commented 5 years ago

Thanks for the effort you've put into improving this software. My email address is defendant602@gmail.com, and I will share my input dataset with you.

Thanks again.

Magdoll commented 5 years ago

@defendant602 ,

Thank you. Request for file upload sent to your email.

--Liz

Magdoll commented 5 years ago

Hi @defendant602 ,

The issue arose because the default minimap2 run in SQANTI2 produced multiple mappings for the query PB.24244.8, and none of them aligned the query in its entirety. That causes a problem when mapping the predicted ORF back to its genomic location. You can confirm this by looking for PB.24244.8 in the file all.collapsed.rep.renamed_corrected.sam.
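For anyone who wants to check their own SAM file for this pattern, here is a minimal sketch in plain Python. It is not SQANTI2's code: the helper names (`parse_cigar`, `query_to_ref_map`, `alignments_for`, `is_partial`), the toy ID `PB.1.1`, and the two SAM records are all made up for illustration, and strand is ignored. It lists every alignment for one query, flags clipped (partial) ones, and shows how a lookup in a query-to-genome position map can fail with a KeyError when the ORF end falls in a clipped region.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string such as '900M100S' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def query_to_ref_map(ref_start, cigar):
    """Map 0-based query positions to 0-based reference positions.

    Only aligned bases (M, =, X) get an entry. Clipped (S, H) and inserted (I)
    bases advance the query position but have no reference coordinate.
    Strand is ignored to keep the sketch short.
    """
    m = {}
    q, r = 0, ref_start
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            for i in range(length):
                m[q + i] = r + i
            q += length
            r += length
        elif op in "ISH":
            q += length
        elif op in "DN":
            r += length
        # P (padding) advances neither coordinate
    return m

def alignments_for(sam_text, query_id):
    """Return (flag, ref_name, pos, cigar) for every SAM record of query_id."""
    hits = []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.split("\t")
        if f[0] == query_id:
            hits.append((int(f[1]), f[2], int(f[3]), f[5]))
    return hits

def is_partial(cigar):
    """True if the alignment soft- or hard-clips part of the query."""
    return any(op in "SH" for _, op in parse_cigar(cigar))

# Made-up SAM content: one 1000-base query split across two partial alignments.
sam = "\n".join([
    "@SQ\tSN:chr1\tLN:100000",
    "PB.1.1\t0\tchr1\t5001\t60\t900M100S\t*\t0\t0\t*\t*",
    "PB.1.1\t2048\tchr1\t9001\t60\t900H100M\t*\t0\t0\t*\t*",
])

hits = alignments_for(sam, "PB.1.1")
for flag, ref, pos, cigar in hits:
    print(flag, ref, pos, cigar, "partial" if is_partial(cigar) else "full")

# Build the position map from the first alignment only (SAM POS is 1-based).
flag, ref, pos, cigar = hits[0]
m = query_to_ref_map(pos - 1, cigar)

cds_end = 947  # hypothetical ORF end, inside the clipped tail
try:
    cds_genomic_end = m[cds_end - 1] + 1  # make it 1-based
except KeyError as err:
    print("KeyError:", err)
```

To run it on a real file, read the SAM text from disk and pass your own query ID to `alignments_for`; more than one record for a query, each with `S` or `H` in the CIGAR, matches the situation described above.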

One solution is to provide the input as a GTF with the --gtf option. This skips the minimap2 run and ensures you have the full mapping (supposedly) for whatever aligner + parameter you used originally.

-Liz

Magdoll commented 5 years ago

Closing unless otherwise noted.