Assertion Error

HegedusB commented 4 years ago

Hi Magdoll, I am trying to use SQANTI2 to analyze my ONT data. For the GTF input I am using the pinfish output, which contains transcript isoforms produced by merging clustered, overlapping nanopore reads. When I try to run the script, I get this message:

"""
R scripting front-end version 3.6.2 (2019-12-12)
Write arguments to /SSD2/bhegedus/pj8_3genseq/Transcript_sequencing/Copci_ONT/all_good_pass_reads/desalt_out/sqanti2_out/copci_desalt_pinfish_test1.params.txt...
**** Running SQANTI2...
Traceback (most recent call last):
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2139, in <module>
    main()
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2134, in main
    split_dirs = split_input_run(args)
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1914, in split_input_run
    recs = [r for r in collapseGFFReader(args.isoforms)]
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1914, in <listcomp>
    recs = [r for r in collapseGFFReader(args.isoforms)]
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 405, in __next__
    return self.read()
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 562, in read
    assert raw[2] == 'transcript'
AssertionError
"""

I have no clue what could be going wrong. Program versions: SQANTI2 7.3.2, Python 3.7.6.
Thanks for any help!

Magdoll commented 4 years ago

Hi @HegedusB

This looks like a GFF format problem. Can you show some examples of the GFF input?

--Liz

HegedusB commented 4 years ago

Hi Liz, Thanks for the quick response! These are the first three entries from the GFF file.

"""
##gff-version 2
scaffold_1	pinfish	mRNA	7715	8160	6	+	.	gene_id "4f707bf7-2f2d-44c8-9531-f5164b574890"; transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1	pinfish	exon	7715	7823	6	+	.	transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1	pinfish	exon	7880	8160	6	+	.	transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1	pinfish	mRNA	11145	14063	115	+	.	gene_id "351e447b-f1ef-4085-9fbb-79f286d80b3a"; transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	11145	11919	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	12020	12108	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	12165	12207	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	12284	12295	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	12385	12907	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	12956	13011	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	13064	13202	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	13260	13567	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	13650	13844	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	exon	13906	14063	115	+	.	transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1	pinfish	mRNA	14451	18205	9	-	.	gene_id "4c13da85-f237-4077-b94a-3f34ec520853"; transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1	pinfish	exon	14451	15326	9	-	.	transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1	pinfish	exon	15385	15880	9	-	.	transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1	pinfish	exon	15937	16815	9	-	.	transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1	pinfish	exon	16882	17451	9	-	.	transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1	pinfish	exon	17519	18054	9	-	.	transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1	pinfish	exon	18113	18205	9	-	.	transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
"""

Botond
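The first traceback fails on `assert raw[2] == 'transcript'` in Cupcake's GFF reader, while every record in the pinfish excerpt starts with an `mRNA` line. A quick way to spot this kind of mismatch before running SQANTI2 is a small line checker. This is a minimal sketch: `check_gff_lines` is a hypothetical helper, not part of SQANTI2 or Cupcake, and the set of accepted feature types (`transcript`, `exon`) is an assumption inferred from the assertion.

```python
from typing import Iterable, List, Tuple

# Feature types assumed acceptable to cupcake's collapseGFFReader, inferred
# from `assert raw[2] == 'transcript'` in the traceback (exon lines assumed).
EXPECTED_FEATURES = ("transcript", "exon")


def check_gff_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that look malformed."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and headers such as ##gff-version 2
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (lineno, f"expected 9 tab-separated columns, found {len(fields)}"))
            continue
        if fields[2] not in EXPECTED_FEATURES:
            problems.append((lineno, f"unexpected feature type {fields[2]!r}"))
    return problems


# Hypothetical sample: one pinfish-style mRNA line, one space-delimited line.
sample = [
    "##gff-version 2\n",
    'scaffold_1\tpinfish\tmRNA\t7715\t8160\t6\t+\t.\t'
    'gene_id "g1"; transcript_id "t1";\n',
    'scaffold_1 genePredFile exon 7715 7823 . + . gene_id "G1";\n',
]
for lineno, problem in check_gff_lines(sample):
    print(lineno, problem)
```

On this sample it reports line 2 for the `mRNA` feature type and line 3 for having a single column, since a line without tabs does not split at all.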

HegedusB commented 4 years ago

Hi Liz, I have struggled a lot to figure out what the problem with my GFF files is. I tried to clean them up as much as possible, and I tried using the GFF output of cDNA_cupcake's collapse_isoforms_by_sam.py as a template, without any success. SQANTI2 seems to work fine with the example files (the fastq files collapsed with collapse_isoforms_by_sam from cDNA_cupcake, plus gencode.v29.annotation.gtf with GRCh38.p12.genome.fa), but it gives an error whenever I try to use anything else. If you don't mind, I have attached some examples of my GFF files. Maybe you can figure out what the problem is, because at this point it is a mystery to me. Best regards, Botond

error_message.zip

The error message is:

"""
**** Running SQANTI2...
Traceback (most recent call last):
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2139, in <module>
    main()
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2134, in main
    split_dirs = split_input_run(args)
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1926, in split_input_run
    write_collapseGFF_format(f, recs[j])
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 530, in write_collapseGFF_format
    f.write("{chr}\tPacBio\ttranscript\t{s}\t{e}\t.\t{strand}\t.\tgene_id \"{gid}\"; transcript_id \"{tid}\";\n".format(chr=r.chr, s=r.start+1, e=r.end, strand=r.strand,gid=r.geneid, tid=r.seqid))
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 363, in __getattr__
    raise AttributeError(key)
AttributeError: geneid
"""

Magdoll commented 4 years ago

Hi @HegedusB,

Your GFF format is all wrong. GFF formats are supposed to be tab-delimited, but all of your column separators are spaces.

What your GFF looks like when read:

"""
In [14]: h.readline()
Out[14]: 'scaffold_1      genePredFile    exon    7715    7823    .       +       .       gene_id "G1"; transcript_id "G1.1"; exon_number "1"; exon_id "G1.1.1"\n'
"""

What it should look like:

"""
In [15]: f.readline()
Out[15]: 'scaffold_1\tgenePredFile\texon\t7715\t7823\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n'
"""
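If a file really has had its tabs replaced by spaces, the two lines above suggest a simple repair: split the first eight columns on whitespace and leave the attribute column alone. This is a minimal sketch; `respace_to_tabs` is a hypothetical helper, and it assumes the first eight columns never contain spaces themselves, which holds for the lines shown in this thread.

```python
def respace_to_tabs(line: str) -> str:
    """Rebuild a GFF/GTF line whose column separators were turned into spaces.

    Assumes the first eight columns contain no spaces. str.split(None, 8)
    splits on runs of whitespace at most eight times, so the ninth
    (attribute) column keeps its internal spaces intact.
    """
    if line.startswith("#"):
        return line  # leave header/comment lines such as ##gff-version 2 as-is
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    return "\t".join(fields) + "\n"


broken = ('scaffold_1      genePredFile    exon    7715    7823    .       +'
          '       .       gene_id "G1"; transcript_id "G1.1";\n')
print(repr(respace_to_tabs(broken)))
```

For the sample line this prints the same tab-separated form as the "should look like" example. The strict nine-column check is deliberate: a line that does not split into exactly nine fields is reported rather than silently rewritten.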

Even with the spaces changed back to tabs, you will still run into an issue: the gene ID cannot be parsed correctly because it is not in the expected PB.X.Y format. I've put a fix into the dev branch of Cupcake that you can check out. Use the dev branch for now; I'll merge this back into master soon.
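One way to bring the pinfish output closer to what Cupcake expects is to relabel it. This is a minimal sketch built on several assumptions: column 3 must read `transcript` instead of `mRNA` (from the assertion in the first traceback); every line should carry both `gene_id` and `transcript_id`, modelled on the transcript line written by `write_collapseGFF_format` in the second traceback (the `PacBio` source column is copied from there, and the exon line layout is assumed); and genes are numbered `PB.X` with isoforms `PB.X.Y`. `pinfish_to_collapse_gtf` is hypothetical, not part of Cupcake or SQANTI2, and it expects each `mRNA` line to precede its exons, as in the pinfish excerpt.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches GTF-style attributes such as: gene_id "abc"; transcript_id "xyz";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def pinfish_to_collapse_gtf(lines: Iterable[str]) -> List[str]:
    """Hypothetical converter: pinfish GFF lines -> PB.X / PB.X.Y GTF lines."""
    gene_num: Dict[str, int] = {}    # pinfish gene_id -> X
    iso_count: Dict[str, int] = {}   # pinfish gene_id -> isoforms numbered so far
    tx_ids: Dict[str, Tuple[str, str]] = {}  # pinfish transcript_id -> (PB.X, PB.X.Y)
    out: List[str] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the ##gff-version header
        chrom, _src, feature, start, end, _score, strand, _phase, attrs = (
            line.rstrip("\n").split("\t"))
        attr = dict(ATTR_RE.findall(attrs))
        tid = attr["transcript_id"]
        if feature == "mRNA":
            gid = attr["gene_id"]
            if gid not in gene_num:
                gene_num[gid] = len(gene_num) + 1
                iso_count[gid] = 0
            iso_count[gid] += 1
            pb_gene = f"PB.{gene_num[gid]}"
            tx_ids[tid] = (pb_gene, f"{pb_gene}.{iso_count[gid]}")
            feature = "transcript"  # the reader asserts raw[2] == 'transcript'
        elif feature != "exon":
            continue  # ignore any other feature types
        pb_gene, pb_tx = tx_ids[tid]  # KeyError if an exon precedes its mRNA
        out.append(f"{chrom}\tPacBio\t{feature}\t{start}\t{end}\t.\t{strand}\t.\t"
                   f'gene_id "{pb_gene}"; transcript_id "{pb_tx}";\n')
    return out


sample = [
    "##gff-version 2\n",
    'scaffold_1\tpinfish\tmRNA\t7715\t8160\t6\t+\t.\tgene_id "g-a"; transcript_id "t-a";\n',
    'scaffold_1\tpinfish\texon\t7715\t7823\t6\t+\t.\ttranscript_id "t-a";\n',
    'scaffold_1\tpinfish\texon\t7880\t8160\t6\t+\t.\ttranscript_id "t-a";\n',
    'scaffold_1\tpinfish\tmRNA\t11145\t14063\t115\t+\t.\tgene_id "g-b"; transcript_id "t-b";\n',
]
for out_line in pinfish_to_collapse_gtf(sample):
    print(out_line, end="")
```

Since pinfish gave each mRNA in the excerpt its own gene_id, each transcript there would become its own `PB.X.1`; isoforms sharing a gene_id would be numbered `PB.X.1`, `PB.X.2`, and so on in file order.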

-Liz

HegedusB commented 4 years ago

Hi Liz, thanks for the answer! The missing tabs are just an annoying mistake I made when copying the lines from the console. I will try to correct the gene_id, and I will definitely check out the dev branch. Thanks again! Botond

HegedusB commented 4 years ago

Hi Liz, now that I am following the expected PB.X.Y format, everything works fine. Thanks for your help. Botond

Magdoll commented 4 years ago

Wonderful! Issue closed.

Magdoll / SQANTI2

Assertion Error #54
