Magdoll / SQANTI2

SQANTI2 is now replaced by SQANTI3. Please go to: https://github.com/ConesaLab/SQANTI3

Assertion Error #54

Closed HegedusB closed 4 years ago

HegedusB commented 4 years ago

Hi Magdoll, I am trying to use SQANTI2 to analyze my ONT data. For the GTF input I am using the pinfish output, which contains transcript isoforms produced by merging clustered, overlapping nanopore reads. When I run the script I get this message:

"""
R scripting front-end version 3.6.2 (2019-12-12)
Write arguments to /SSD2/bhegedus/pj8_3genseq/Transcript_sequencing/Copci_ONT/all_good_pass_reads/desalt_out/sqanti2_out/copci_desalt_pinfish_test1.params.txt...
**** Running SQANTI2...
Traceback (most recent call last):
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2139, in <module>
    main()
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2134, in main
    split_dirs = split_input_run(args)
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1914, in split_input_run
    recs = [r for r in collapseGFFReader(args.isoforms)]
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1914, in <listcomp>
    recs = [r for r in collapseGFFReader(args.isoforms)]
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 405, in __next__
    return self.read()
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 562, in read
    assert raw[2] == 'transcript'
AssertionError
"""

I do not have any clue what could be going wrong.

Program versions: SQANTI2 7.3.2, Python 3.7.6

Thanks for any help!
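The traceback stops at `assert raw[2] == 'transcript'`, which suggests the Cupcake reader expects the third column of each record's first line to be the feature type `transcript`. A minimal sketch for tallying what a file actually has in that column follows. It assumes `raw` is the line split on tabs (not confirmed by the thread), and `feature_type_counts` is a hypothetical helper, not part of SQANTI2 or Cupcake.

```python
from collections import Counter


def feature_type_counts(lines):
    """Tally column 3 (feature type) of GFF/GTF lines, skipping comments."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Fewer than 9 tab-separated columns usually means the line
        # is not tab-delimited at all.
        key = cols[2] if len(cols) >= 9 else "<malformed>"
        counts[key] += 1
    return counts


# Illustrative lines with shortened placeholder IDs (not real data).
sample = [
    "##gff-version 2\n",
    'scaffold_1\tpinfish\tmRNA\t7715\t8160\t6\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'scaffold_1\tpinfish\texon\t7715\t7823\t6\t+\t.\ttranscript_id "t1";\n',
    'scaffold_1 pinfish exon 7880 8160 6 + . transcript_id "t1";\n',
]
counts = feature_type_counts(sample)
print(counts)
```

If `transcript` never appears in the tally, or many lines land in `<malformed>`, the input probably differs from what the reader expects.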

Magdoll commented 4 years ago

Hi @HegedusB

This looks like a GFF format problem. Can you show some examples of the GFF input?

--Liz

HegedusB commented 4 years ago

Hi Liz, Thanks for the quick response! These are the first three entries from the GFF file.

"""
gff-version 2
scaffold_1 pinfish mRNA 7715 8160 6 + . gene_id "4f707bf7-2f2d-44c8-9531-f5164b574890"; transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1 pinfish exon 7715 7823 6 + . transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1 pinfish exon 7880 8160 6 + . transcript_id "b10101fa-be46-4f77-aef6-a65b612cbe44";
scaffold_1 pinfish mRNA 11145 14063 115 + . gene_id "351e447b-f1ef-4085-9fbb-79f286d80b3a"; transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 11145 11919 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12020 12108 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12165 12207 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12284 12295 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12385 12907 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 12956 13011 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13064 13202 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13260 13567 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13650 13844 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish exon 13906 14063 115 + . transcript_id "6322d307-6ca1-4d49-ab86-064b6f74287f";
scaffold_1 pinfish mRNA 14451 18205 9 - . gene_id "4c13da85-f237-4077-b94a-3f34ec520853"; transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 14451 15326 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 15385 15880 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 15937 16815 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 16882 17451 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 17519 18054 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
scaffold_1 pinfish exon 18113 18205 9 - . transcript_id "c1211fac-9a03-49af-b232-312e494c1114";
"""

Botond

HegedusB commented 4 years ago

Hi Liz, I have struggled a lot to find out what the problem with my GFF files is. I tried to clean them up as much as possible, and I tried using the GFF output of cDNA_cupcake's collapse_isoforms_by_sam.py as a template, without success. SQANTI2 seems to work fine with the example files (the collapsed fastq files from cDNA_cupcake's collapse_isoforms_by_sam, together with gencode.v29.annotation.gtf and GRCh38.p12.genome.fa), but it gives an error when I use anything else. If you don't mind, I am attaching some examples from my GFF files. Maybe you can figure out what the problem is, because at this point it is a mystery to me. Best regards, Botond

error_message.zip

The error message is:

**** Running SQANTI2...
Traceback (most recent call last):
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2139, in <module>
    main()
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 2134, in main
    split_dirs = split_input_run(args)
  File "/installs/additional_bins/SQANTI2/sqanti_qc2.py", line 1926, in split_input_run
    write_collapseGFF_format(f, recs[j])
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 530, in write_collapseGFF_format
    f.write("{chr}\tPacBio\ttranscript\t{s}\t{e}\t.\t{strand}\t.\tgene_id \"{gid}\"; transcript_id \"{tid}\";\n".format(chr=r.chr, s=r.start+1, e=r.end, strand=r.strand,gid=r.geneid, tid=r.seqid))
  File "/installs/miniconda3/envs/anaCogent3/lib/python3.7/site-packages/cupcake-10.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 363, in __getattr__
    raise AttributeError(key)
AttributeError: geneid

Magdoll commented 4 years ago

Hi @HegedusB ,

  1. Your GFF format is all wrong. GFF3 files are supposed to be tab-delimited, but all of your fields are separated by spaces.

What your GFF looks like when read:

In [14]: h.readline()                                                                                                                               
Out[14]: 'scaffold_1      genePredFile    exon    7715    7823    .       +       .       gene_id "G1"; transcript_id "G1.1"; exon_number "1"; exon_id "G1.1.1"\n'

What it should look like:

In [15]: f.readline()                                                                                                                               
Out[15]: 'scaffold_1\tgenePredFile\texon\t7715\t7823\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n'

  2. Even with the spaces changed back to tabs, you will still run into an issue: the gene ID cannot be parsed correctly because it is not in the expected PB.X.Y format. I've put in a fix on the dev branch of Cupcake that you can check out. Use the dev branch for now and I'll integrate this back into master soon.

-Liz
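For the first point, a space-separated line can be rebuilt with tabs between its nine columns. The sketch below is one way to do that, not something posted in this thread: `to_tab_delimited` is a hypothetical helper, and it assumes none of the first eight columns contains whitespace of its own.

```python
def to_tab_delimited(line):
    """Rebuild a GFF/GTF feature line with tabs between its nine columns."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped  # leave comments and blank lines alone
    # Split off the first eight columns on any run of whitespace; the
    # remainder is the attribute column, whose internal spaces are kept.
    parts = stripped.split(None, 8)
    if len(parts) != 9:
        raise ValueError(f"expected 9 columns, got {len(parts)}: {line!r}")
    return "\t".join(parts)


broken = ('scaffold_1      genePredFile    exon    7715    7823    .       '
          '+       .       gene_id "G1"; transcript_id "G1.1";\n')
fixed = to_tab_delimited(broken)
print(repr(fixed))
```

Printing with `repr`, as in the IPython session above, makes `\t` visible, so it is easy to confirm that the separators really are tabs.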

HegedusB commented 4 years ago

Hi Liz, Thanks for the answer! The missing tabs are just an annoying mistake I made: I copied the lines from the console. I will try to correct the gene_id, and I will certainly check the dev branch. Thanks again! Botond

HegedusB commented 4 years ago

Hi Liz, Once I followed the expected PB.X.Y format, everything ran fine. Thanks for your help. Botond
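The thread does not show how the IDs were rewritten. One possible approach is sketched below, under the assumption that genes are named `PB.X` and their transcripts `PB.X.Y`, with numbers assigned in order of first appearance. `rename_ids` is a hypothetical helper, and the sample IDs are shortened placeholders.

```python
import re

GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def rename_ids(lines):
    """Rewrite gene_id/transcript_id values as PB.X / PB.X.Y (assumed scheme)."""
    gene_index = {}   # original gene_id -> X
    tx_names = {}     # original transcript_id -> "PB.X.Y"
    tx_per_gene = {}  # X -> number of transcripts seen so far
    out = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        gene = GENE_RE.search(line)
        tx = TX_RE.search(line)
        if tx is None:
            raise ValueError(f"no transcript_id in line: {line!r}")
        if tx.group(1) not in tx_names:
            # A transcript must first appear on a line that names its gene.
            if gene is None:
                raise ValueError(f"transcript seen before its gene_id: {line!r}")
            x = gene_index.setdefault(gene.group(1), len(gene_index) + 1)
            tx_per_gene[x] = tx_per_gene.get(x, 0) + 1
            tx_names[tx.group(1)] = f"PB.{x}.{tx_per_gene[x]}"
        new_tx = tx_names[tx.group(1)]
        new_gene = new_tx.rsplit(".", 1)[0]
        line = TX_RE.sub(f'transcript_id "{new_tx}"', line)
        line = GENE_RE.sub(f'gene_id "{new_gene}"', line)
        out.append(line)
    return out


lines = [
    'scaffold_1\tpinfish\tmRNA\t7715\t8160\t6\t+\t.\tgene_id "4f707bf7"; transcript_id "b10101fa";\n',
    'scaffold_1\tpinfish\texon\t7715\t7823\t6\t+\t.\ttranscript_id "b10101fa";\n',
    'scaffold_1\tpinfish\tmRNA\t11145\t14063\t115\t+\t.\tgene_id "351e447b"; transcript_id "6322d307";\n',
]
renamed = rename_ids(lines)
print("".join(renamed), end="")
```

The sketch only touches the attribute column. Whether the feature type in column 3 also needs changing is a separate question that this thread does not settle.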

Magdoll commented 4 years ago

Wonderful! Issue closed.