HegedusB closed this issue 4 years ago.
Hi @HegedusB
This looks like a GFF format problem. Can you show some examples of the GFF input?
--Liz
Hi Liz, Thanks for the quick response! These are the first three entries from the GFF file.
"""
scaffold_1 pinfish mRNA 7715 8160 6 + . gene_id "4f707bf7-2f2d-44c8-9531-f5164b574890"; transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1 pinfish exon 7715 7823 6 + . transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1 pinfish exon 7880 8160 6 + . transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1 pinfish mRNA 11145 14063 115 + . gene_id "351e447b-f1ef-4085-9fbb-79f286d80b3a"; transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 11145 11919 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12020 12108 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12165 12207 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12284 12295 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12385 12907 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12956 13011 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13064 13202 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13260 13567 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13650 13844 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13906 14063 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish mRNA 14451 18205 9 - . gene_id "4c13da85-f237-4077-b94a-3f34ec520853"; transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 14451 15326 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 15385 15880 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 15937 16815 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 16882 17451 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 17519 18054 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 18113 18205 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
"""
Botond
Hi Liz, I have struggled a lot to figure out what the problem is with my GFF files. I tried to clean them up as much as possible, and I tried using the GFF output of cDNA_Cupcake's collapse_isoforms_by_sam.py as a template, without any result. SQANTI2 seems to work fine with the example files (the collapsed (collapse_isoforms_by_sam) FASTQ files from cDNA_Cupcake, plus gencode.v29.annotation.gtf and GRCh38.p12.genome.fa), but gives an error when I try to use anything else. If you don't mind, I am attaching some examples from my GFF files. Maybe you can figure out what the problem is, because at this point it is a mystery to me. Best regards, Botond
The error message is:
**** Running SQANTI2...
Traceback (most recent call last):
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2139, in
Hi @HegedusB,
What your GFF looks like when read:
In [14]: h.readline()
Out[14]: 'scaffold_1 genePredFile exon 7715 7823 . + . gene_id "G1"; transcript_id "G1.1"; exon_number "1"; exon_id "G1.1.1"\n'
What it should look like:
In [15]: f.readline()
Out[15]: 'scaffold_1\tgenePredFile\texon\t7715\t7823\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n'
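A quick way to catch this before running SQANTI2 is to check that every record splits into exactly nine tab-separated columns, as GFF/GTF requires. The helper below is a minimal sketch; `check_gff_line` is a hypothetical name and is not part of Cupcake or SQANTI2. It is applied to the two lines from the IPython session above.

```python
from typing import List


def check_gff_line(line: str) -> List[str]:
    """Split one GFF/GTF record into its nine columns.

    Raises ValueError if the line is not tab-delimited into exactly nine
    fields (e.g. when tabs were turned into spaces by copy/paste).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(
            f"expected 9 tab-separated columns, got {len(fields)}: {line!r}"
        )
    return fields


# Space-separated line, as read from the problematic file
bad = ('scaffold_1 genePredFile exon 7715 7823 . + . gene_id "G1"; '
       'transcript_id "G1.1"; exon_number "1"; exon_id "G1.1.1"\n')
# Tab-separated line, as it should look
good = ('scaffold_1\tgenePredFile\texon\t7715\t7823\t.\t+\t.\t'
        'gene_id "G1"; transcript_id "G1.1";\n')

print(check_gff_line(good)[2])  # third column: the feature type
try:
    check_gff_line(bad)
except ValueError as err:
    print("rejected:", err)
```

Because the space-separated line contains no tabs at all, it splits into a single field and is rejected, while the tab-separated line yields the expected nine columns.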
PB.X.Y format. I've put in a fix on the dev branch of Cupcake that you can check out. Use the dev branch for now; I'll merge this back into master soon. -Liz
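In the usual Cupcake convention, a PB.X.Y ID numbers the gene (locus) as X and the isoform within that gene as Y. Below is a minimal sketch of relabeling pinfish UUIDs into that shape, assuming each transcript's gene_id is known from its mRNA line. `to_pb_ids` is a hypothetical helper; the numbering rules Cupcake itself applies may differ.

```python
from typing import Dict, List, Tuple


def to_pb_ids(pairs: List[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map (gene_id, transcript_id) pairs to PB-style IDs.

    Genes are numbered PB.1, PB.2, ... in order of first appearance, and
    the isoforms of each gene become PB.X.1, PB.X.2, ...
    Returns {original transcript_id: (new gene_id, new transcript_id)}.
    """
    gene_num: Dict[str, int] = {}   # original gene_id -> X
    iso_count: Dict[str, int] = {}  # original gene_id -> isoforms seen so far
    mapping: Dict[str, Tuple[str, str]] = {}
    for gene_id, tx_id in pairs:
        if tx_id in mapping:
            continue  # transcript already renamed
        if gene_id not in gene_num:
            gene_num[gene_id] = len(gene_num) + 1
            iso_count[gene_id] = 0
        iso_count[gene_id] += 1
        x = gene_num[gene_id]
        mapping[tx_id] = (f"PB.{x}", f"PB.{x}.{iso_count[gene_id]}")
    return mapping


# gene_id/transcript_id pairs from the three pinfish mRNA records in Botond's excerpt
pairs = [
    ("4f707bf7-2f2d-44c8-9531-f5164b574890", "b10101fa-be46-4f77-aef6-a65b612cbe44"),
    ("351e447b-f1ef-4085-9fbb-79f286d80b3a", "6322d307-6ca1-4d49-ab86-064b6f74287f"),
    ("4c13da85-f237-4077-b94a-3f34ec520853", "c1211fac-9a03-49af-b232-312e494c1114"),
]
for old, (gid, tid) in to_pb_ids(pairs).items():
    print(old, "->", gid, tid)
```

Each of the three excerpt transcripts belongs to a different gene, so they become PB.1.1, PB.2.1 and PB.3.1. The same mapping would then be applied to the gene_id and transcript_id attributes of every mRNA and exon line.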
Hi Liz, thanks for the answer! The missing tabs were just an annoying mistake on my part: I copied the lines from the console. I will try correcting the gene_id, and I will definitely check out the dev branch. Thanks again! Botond
Hi Liz, now that I am following the expected PB.X.Y format, everything works fine. Thanks for your help. Botond
Wonderful! Issue closed.
Hi Magdoll, I am trying to use SQANTI2 to analyze my ONT data. As the GTF input I am using the pinfish output, which contains transcript isoforms produced by merging clustered, overlapping nanopore reads. When I run the script, I get this message:
"""
R scripting front-end version 3.6.2 (2019-12-12)
Write arguments to /SSD2/bhegedus/pj8_3genseq/Transcript_sequencing/Copci_ONT/all_good_pass_reads/desalt_out/sqanti2_out/copci_desalt_pinfish_test1.params.txt...
**** Running SQANTI2...
Traceback (most recent call last):
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2139, in
    main()
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2134, in main
    split_dirs = split_input_run(args)
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1914, in split_input_run
    recs = [r for r in collapseGFFReader(args.isoforms)]
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1914, in
    recs = [r for r in collapseGFFReader(args.isoforms)]
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 405, in next
    return self.read()
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 562, in read
    assert raw[2] == 'transcript'
AssertionError
"""
I have no idea what could be going wrong. Program versions: SQANTI2 7.3.2, Python 3.7.6. Thanks for any help!
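The last frame of the traceback shows collapseGFFReader asserting `raw[2] == 'transcript'`, i.e. it expects the third column of each transcript-level record to read `transcript`, whereas pinfish labels those records `mRNA`. Below is a minimal preprocessing sketch under that reading. It assumes tab-delimited input in which every exon line carries a transcript_id and follows its parent mRNA line. It relabels `mRNA` as `transcript` and copies the gene_id onto exon lines, mirroring the example line Liz shows in this thread. `normalize_pinfish_gff` is a hypothetical helper, and whether it alone satisfies SQANTI2 is untested.

```python
import re
from typing import Dict, Iterable, List

GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def normalize_pinfish_gff(lines: Iterable[str]) -> List[str]:
    """Relabel pinfish 'mRNA' records as 'transcript' and add gene_id to exons.

    Assumes tab-delimited GTF-style records where each exon line carries a
    transcript_id and appears after its parent mRNA line.
    """
    gene_of: Dict[str, str] = {}  # transcript_id -> gene_id
    out: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        tx = TX_RE.search(cols[8]).group(1)
        if cols[2] == "mRNA":
            cols[2] = "transcript"  # the label GFF.py asserts on
            gene_of[tx] = GENE_RE.search(cols[8]).group(1)
        elif cols[2] == "exon" and "gene_id" not in cols[8]:
            cols[8] = f'gene_id "{gene_of[tx]}"; ' + cols[8]
        out.append("\t".join(cols))
    return out


# First two records of the pinfish excerpt, with tabs restored
records = [
    "scaffold_1\tpinfish\tmRNA\t7715\t8160\t6\t+\t.\t"
    'gene_id "4f707bf7-2f2d-44c8-9531-f5164b574890"; '
    'transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";',
    "scaffold_1\tpinfish\texon\t7715\t7823\t6\t+\t.\t"
    'transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";',
]
for rec in normalize_pinfish_gff(records):
    cols = rec.split("\t")
    print(cols[2], cols[8])
```

Relabeling the feature type may not be sufficient on its own: elsewhere in this thread, switching the IDs to the PB.X.Y format is what finally resolved the error.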