Closed defendant602 closed 5 years ago
Hi @defendant602,
I've fixed a few ORF-related bugs in recent versions. If the issue still exists in the latest v3.8, please let me know. Could you also share the input file with me privately?
--Liz
Hi Liz,
I downloaded version 3.8 and tested it on the dataset that failed in version 3.5. Unfortunately, the problem remains the same: a KeyError at line 1278 of sqanti_qc2.py.
However, when I tested three other datasets with version 3.8, all of them completed without an error. So this KeyError may be caused by something I'm not aware of in this particular input dataset.
Sorry, I can't figure out a convenient way to share my input data with you, since genome.fa is quite big (2Gb).
Hi @defendant602,
Interesting. It may be an edge-case bug.
You can share large files with me if you give me your email address. I will request a confidential upload.
--Liz
Thanks for the effort you've put into improving this software. My email address is defendant602@gmail.com, and I will share my input dataset with you.
Thanks again.
@defendant602 ,
Thank you. Request for file upload sent to your email.
--Liz
Hi @defendant602,
The issue arose because the default SQANTI2 run of minimap2 produced multiple mappings for the query PB.24244.8, and none of them aligned the query in full, which caused a problem when mapping the predicted ORFs back to their genomic locations. You can tell by looking for PB.24244.8 in the file all.collapsed.rep.renamed_corrected.sam.
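To make that check concrete, here is a minimal sketch for inspecting a query's records in the SAM file. It is not part of SQANTI2: the function names `query_coverage` and `summarize_alignments` are hypothetical, and it assumes a standard tab-separated SAM file with the CIGAR string in the sixth column. For each mapped record of the query it reports how many query bases were actually aligned versus the full query length (clipped bases included).

```python
import re

# CIGAR operations as defined in the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def query_coverage(cigar):
    """Return (aligned_query_bases, total_query_bases) for one CIGAR string."""
    aligned = total = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "MI=X":   # query bases that are part of the alignment
            aligned += n
            total += n
        elif op in "SH":   # soft/hard-clipped query bases: present but not aligned
            total += n
        # D, N and P consume no query bases, so they are ignored here
    return aligned, total

def summarize_alignments(sam_lines, query_id):
    """Collect (reference, 1-based position, aligned, total) for each mapped record."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):           # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0] != query_id:
            continue
        if int(fields[1]) & 0x4:           # FLAG bit 0x4: query is unmapped
            continue
        aligned, total = query_coverage(fields[5])
        records.append((fields[2], int(fields[3]), aligned, total))
    return records

# Possible usage (file name taken from this thread):
# with open("all.collapsed.rep.renamed_corrected.sam") as handle:
#     for rec in summarize_alignments(handle, "PB.24244.8"):
#         print(rec)
```

If this prints more than one record for the query and `aligned` is smaller than `total` in every one of them, that matches the situation described above: multiple mappings, none covering the whole query.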
One solution is to provide the input as a GTF with the --gtf option. This skips the minimap2 run and ensures you have the full mapping (supposedly) for whatever aligner and parameters you used originally.
-Liz
Closing unless otherwise noted.
Hi Liz,
Thanks for your reply on issue #23; it solved my problem. But I ran into another error when running version 3.5. My command was:
python sqanti_qc2.py --aligner_choice minimap2 -t 20 --geneid all.collapsed.rep.fa genome.gtf genome.fa -o test
The error message was:
output written to all.collapsed.rep.renamed_corrected.fasta
Skipping PB.9452.1 because unmapped.
**** Parsing Isoforms....
Traceback (most recent call last):
  File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1753, in <module>
    main()
  File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1749, in main
    run(args)
  File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1392, in run
    isoforms_info = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict)
  File "/export/pipeline/RNASeq/Software/SQANTI2/v3.5/SQANTI2-master/sqanti_qc2.py", line 1278, in isoformClassification
    orfDict[rec.id].cds_genomic_end = m[orfDict[rec.id].cds_end-1] + 1 # make it 1-based
KeyError: 946
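For readers trying to interpret this traceback, the sketch below shows one way a lookup like `m[cds_end-1]` can fail. It is an illustration under assumptions, not the SQANTI2 code: it assumes `m` is a dictionary from 0-based transcript positions to genomic positions that only contains aligned bases, and the helper names and all coordinates are made up. If the alignment stops before the end of the predicted CDS, the key for the last CDS base is simply absent.

```python
def build_tx_to_genome_map(blocks):
    """Build {transcript_pos: genome_pos} for aligned bases only.

    blocks: list of (tx_start, genome_start, length), all 0-based.
    Hypothetical helper; the real mapping in sqanti_qc2.py may differ.
    """
    m = {}
    for tx_start, genome_start, length in blocks:
        for offset in range(length):
            m[tx_start + offset] = genome_start + offset
    return m

def cds_genomic_end(m, cds_end):
    """Same arithmetic as the traceback line, m[cds_end - 1] + 1 (1-based),
    but returning None instead of raising when the base was never aligned."""
    pos = m.get(cds_end - 1)
    return None if pos is None else pos + 1

# Made-up example: only transcript bases 0..899 are aligned, in two exon blocks.
m = build_tx_to_genome_map([(0, 5000, 500), (500, 7000, 400)])

inside = cds_genomic_end(m, 300)    # base 299 is aligned, so a coordinate is returned
outside = cds_genomic_end(m, 947)   # base 946 is not aligned, so the result is None

try:
    m[947 - 1]                      # the unguarded lookup, as in the traceback
except KeyError as err:
    missing_key = err.args[0]       # the missing transcript position
```

In this toy example the unguarded lookup raises a KeyError for position 946 because that transcript base falls outside every aligned block.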
But when I changed the input isoforms from FASTA format to GTF format, the run completed successfully. The command was:
python sqanti_qc2.py -g --geneid all.collapsed.gff genome.gtf genome.fa -o test
I wonder whether the error is caused by the code or by the input isoform FASTA file. Could you give me any advice?
Thanks.