Closed: manuelsmendoza closed this issue 2 years ago.
I'm also getting a KeyError at the same point running the same options (--force_id_ignore and --fasta) with SQANTI3 v.4.2.
Hi!
That's a weird error... Could it be that some of your coding exons ended up only 1 bp long after mapping your reads with SQANTI3? To check this, please run:
awk '{if ($4==$5) print $0}' <obtained_corrected.gtf>
If you get some exons with the same start and end position, check whether they are coding by grepping their transcript IDs in the *_corrected.faa file that is also produced. If that is the case, this is what is causing the problem, and you might want to try a different mapper...
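For reference, that check could look something like the following (the my_sample prefix and the PB.1.1 ID are placeholders for your own output prefix and transcript IDs):
# list the transcript IDs of any 1 bp exons in the corrected GTF
awk '$4 == $5' my_sample_corrected.gtf | grep -o 'transcript_id "[^"]*"' | sort -u
# then look for one of those IDs among the predicted ORFs; a hit means the 1 bp exon belongs to a coding transcript
grep "PB.1.1" my_sample_corrected.faa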
Besides that, I realized that you are using *polished.hq.fasta. Even though these reads are polished, they are still reads, not transcript models. To collapse your reads into transcript models, I suggest using cDNA Cupcake or TAMA first and then running SQANTI3.
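As a rough sketch of the cDNA Cupcake route (the file names below are placeholders, and the exact options should be double-checked against the Cupcake documentation):
# map the polished reads to the genome and sort the alignments
minimap2 -ax splice -uf --secondary=no -C5 -t 8 genome.fasta polished.hq.fasta > hq_isoforms.sam
sort -k 3,3 -k 4,4n hq_isoforms.sam > hq_isoforms.sorted.sam
# collapse redundant reads into transcript models
collapse_isoforms_by_sam.py --input polished.hq.fasta -s hq_isoforms.sorted.sam --dun-merge-5-shorter -o my_sample
# then feed the collapsed GFF / representative FASTA that Cupcake writes out to SQANTI3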
Fran
Hi Francisco, thanks for the reply.
I had the same error, so I am replying here (but let me know if I should open a new issue for this). You are correct that there are coding exons with the same start and end position.
I tried uLTRA instead of minimap2 and got this error:
****Aligning reads with uLTRA...
creating /home/lab/Desktop/Sid//SQANTI3_test/SQANTI_out/uLTRA_out/
/home/lab/miniconda2/envs/SQANTI3.env/lib/python3.9/site-packages/gffutils/create.py:724: UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts=True` option to speed up database creation
warnings.warn(
Traceback (most recent call last):
File "/home/lab/miniconda2/envs/SQANTI3.env/bin/uLTRA", line 722, in <module>
prep_splicing(args, refs_lengths)
File "/home/lab/miniconda2/envs/SQANTI3.env/bin/uLTRA", line 63, in prep_splicing
db = gffutils.create_db(fn, dbfn=database, force=True, keep_order=True, merge_strategy='merge',
File "/home/lab/miniconda2/envs/SQANTI3.env/lib/python3.9/site-packages/gffutils/create.py", line 1292, in create_db
c.create()
File "/home/lab/miniconda2/envs/SQANTI3.env/lib/python3.9/site-packages/gffutils/create.py", line 508, in create
self._update_relations()
File "/home/lab/miniconda2/envs/SQANTI3.env/lib/python3.9/site-packages/gffutils/create.py", line 950, in _update_relations
gene_bin = bins.bins(gene_start, gene_end, one=True)
File "/home/lab/miniconda2/envs/SQANTI3.env/lib/python3.9/site-packages/gffutils/bins.py", line 79, in bins
if start >= MAX_CHROM_SIZE or stop >= MAX_CHROM_SIZE:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
File "/home/lab/Desktop/Sid/SQANTI3_test/../../apps/SQANTI3-4.2/sqanti3_qc.py", line 2509, in <module>
main()
File "/home/lab/Desktop/Sid/SQANTI3_test/../../apps/SQANTI3-4.2/sqanti3_qc.py", line 2492, in main
run(args)
File "/home/lab/Desktop/Sid/SQANTI3_test/../../apps/SQANTI3-4.2/sqanti3_qc.py", line 1884, in run
orfDict = correctionPlusORFpred(args, genome_dict)
File "/home/lab/Desktop/Sid//SQANTI3_test/../../apps/SQANTI3-4.2/sqanti3_qc.py", line 505, in correctionPlusORFpred
if subprocess.check_call(cmd, shell=True)!=0:
File "/home/lab/miniconda2/envs/SQANTI3.env/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'uLTRA pipeline /home/lab/Desktop/Sid//SQANTI3_test/test.final_rename_mod.fasta /home/lab/Desktop/Sid/SQANTI3_test/test.all.maker.final.homologIDs.gtf /home/lab/Desktop/Sid//SQANTI3_test/CLR_hq_transcripts.renamed.fasta /home/lab/Desktop/Sid/SQANTI3_test/SQANTI_out/uLTRA_out/ --t 10 --prefix ../genome_corrected --isoseq' returned non-zero exit status 1.
I also tried gmap and deSALT, but got a different error; I only changed --aligner_choice (it works with minimap2). (It seems like this is just an error with an expected argument not being provided?)
Traceback (most recent call last):
File "/home/castoelab/Desktop/Sid/testin/SQANTI3_test/../../apps/SQANTI3-4.2/sqanti3_qc.py", line 2509, in <module>
main()
File "/home/castoelab/Desktop/Sid/testin/SQANTI3_test/../../apps/SQANTI3-4.2/sqanti3_qc.py", line 2440, in main
if not os.path.isdir(os.path.abspath(args.gmap_index)):
File "/home/castoelab/miniconda2/envs/SQANTI3.env/lib/python3.9/posixpath.py", line 375, in abspath
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
If I removed the offending lines from the GTF and reran the code, do you think that would solve the issue, instead of trying a different aligner?
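For example, something along these lines (assuming the offending features are exon lines whose end coordinate is not greater than their start; my_sample_corrected.gtf is a placeholder name):
# keep every line except exon features that are 1 bp long or have end < start
awk '!($3 == "exon" && $5 <= $4)' my_sample_corrected.gtf > my_sample_corrected.filtered.gtf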
Thanks very much for your help in making this code better!
I also got the same error using the same options with version 4.2.
My input is a FASTA of a transcriptome assembly. I confirm that my _corrected.gtf also has the 1-bp exon issue mentioned previously.
I think the issue stems from cDNA Cupcake not generating the coordinates correctly from the CIGAR string (rather than an alignment issue in minimap2). For example, my log also reports some invalid coordinates:
Error: invalid feature coordinates (end<start!) at line:
5 Homo_sapiens exon 88620775 88620774 . - . ID=rb_166714.exon2;Name=rb_166714.exon2;Parent=rb_166714
which doesn't make any sense at all.
Hi @kmnip, we just released version 4.3. Could you update SQANTI3 and see whether the error persists?
Hi Angeles, I am still getting the same error with version 4.3:
**** Performing Classification of Isoforms....
Traceback (most recent call last):
File "/projects/kmnip_prj/tools/SQANTI3-4.3/sqanti3_qc.py", line 2461, in <module>
main()
File "/projects/kmnip_prj/tools/SQANTI3-4.3/sqanti3_qc.py", line 2444, in main
run(args)
File "/projects/kmnip_prj/tools/SQANTI3-4.3/sqanti3_qc.py", line 1854, in run
isoforms_info, ratio_TSS_dict = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict, corrGTF)
File "/projects/kmnip_prj/tools/SQANTI3-4.3/sqanti3_qc.py", line 1686, in isoformClassification
isoform_hit.CDS_genomic_start = m[isoform_hit.CDS_start-1] + 1 # make it 1-based
KeyError: 6389
OK, we currently don't have any internal handling of this sort of error implemented. However, the anomaly does seem to be generated by the collapse step (Cupcake, in your case).
My suggestion would be to remove the offending lines from your GTF and/or to look for potential pitfalls in your upstream processing. Perhaps you could make sure that you have run the cluster/polish step correctly, or try TAMA for the collapse instead, as @FJPardoPalacios suggested. However, if you do wish to stick with Cupcake, you could open an issue in the cDNA_cupcake repo and see if they can help out!
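If you do go the TAMA route, its collapse step looks roughly like the following (I am sketching from memory, so please double-check the option names against the TAMA wiki; the file names are placeholders):
# collapse genome-mapped, sorted reads into transcript models with TAMA
python tama_collapse.py -s mapped_reads.sorted.sam -f genome.fasta -p my_sample -x no_cap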
Since this is not a SQANTI3 problem per se, I will close the issue now. Feel free to open a new one if you run into any other problems using SQ3.
Hi,
I'm trying to use SQANTI3 to classify the transcripts sequenced from a non-model marine organism. I clustered the PacBio reads using the isoseq3 pipeline described in PacBio's GitHub repo.
I only have the organism's genome (GCA_905397895.1_MEDL1_genomic.fna) and its annotation (GCA_905397895.1_MEDL1_genomic.gtf) as prior information, and the headers of my PacBio reads differ from the default, so I used the --force_id_ignore and --fasta options. I ran it as follows:
It started running fine, but after a while it crashed, reporting the following error: