Magdoll / SQANTI2

SQANTI2 is now replaced by SQANTI3. Please go to: https://github.com/ConesaLab/SQANTI3
Other
38 stars 15 forks source link

Error in sqanti_qc2.py #35

Open psur9757 opened 5 years ago

psur9757 commented 5 years ago

Hello, I am not entirely sure why I am getting the error. Any help will be appreciated.

Sqanti.pbs

#!/bin/bash
#PBS -P BLRseq
#PBS -N sqanti
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=20:00:00
#PBS -e /scratch/BLRseq/BLR/results/annotation/structural/BLR560/isoseq/sqanti/sqanti.err
#PBS -o /scratch/BLRseq/BLR/results/annotation/structural/BLR560/isoseq/sqanti/sqanti.out

wdir=/scratch/BLRseq/BLR
outdir=$wdir/results/annotation/structural/BLR560/isoseq/sqanti
ref=$wdir/results/annotation/structural/BLR560/ref.fa
cds=$wdir/data/BLR560/isoseq/SMRT/cluster/hq_isoforms.fasta
gtf=$outdir/augustus.ab_initio.gtf

sqanti=/scratch/BLRseq/bin/SQANTI2
cupcake=/scratch/BLRseq/bin/cDNA_Cupcake
module load python
module load cufflinks/2.2.1
module load ucsc-userapps/348
module load R/3.3.2
module load minimap2/2.3
module load samtools/1.9
export PYTHONPATH=$PYTHONPATH:$cupcake/sequence

minimap2 -ax splice -uf $ref $cds > $outdir/aln.sam
samtools sort -O sam -o $outdir/aln.sorted.sam -@ 8 $outdir/aln.sam
python $cupcake/cupcake/tofu/collapse_isoforms_by_sam.py --input $cds -s $outdir/aln.sorted.sam -o $outdir/hq --dun-merge-5-shorter
python $sqanti/sqanti_qc2.py --aligner_choice minimap2 -t 8 --output BLR560 --dir $outdir $outdir/hq.collapsed.rep.fa $gtf $ref

$ tail -20 sqanti.err

[M::worker_pipeline::8.938*3.74] mapped 8843 sequences
[M::main] Version: 2.3-r545-dirty
[M::main] CMD: minimap2 -ax splice --secondary=no -C5 -uf -t 8 /scratch/BLRseq/BLR/results/annotation/structural/BLR560/ref.fa /scratch/BLRseq/BLR/results/annotation/structural/BLR560/isoseq/sqanti/hq.collapsed.rep.renamed.fasta
[M::main] Real time: 8.965 sec; CPU: 33.463 sec
output written to /scratch/BLRseq/BLR/results/annotation/structural/BLR560/isoseq/sqanti/hq.collapsed.rep.renamed_corrected.fasta
**** Parsing Isoforms....
Traceback (most recent call last):
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 1762, in <module>
    main()
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 1758, in main
    run(args)
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 1405, in run
    write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff')
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 399, in write_collapsed_GFF_with_CDS
    for r in reader:
  File "/home/psur9757/.local/lib/python2.7/site-packages/cupcake/io/GFF.py", line 393, in next
    return self.read()            
  File "/home/psur9757/.local/lib/python2.7/site-packages/cupcake/io/GFF.py", line 550, in read
    assert raw[2] == 'transcript'
AssertionError
huyuem commented 5 years ago

I got the same mistake, and I want to know why.

Magdoll commented 5 years ago

Can you please upgrade to Cupcake v8.6 and SQANTI2 v4.1 and let me know if error persists? I've made some GFF changes and this could be the issue. Apologies. -Liz

psur9757 commented 5 years ago

Hi Liz,

I installed the latest version of Cupcake using

git clone https://github.com/Magdoll/cDNA_Cupcake.git
cd cDNA_Cupcake/
python setup.py build
python setup.py install --user

but I get the error,

<open file '<stderr>', mode 'w' at 0x2b2c214f81e0> Cupcake version must be 8.6 or higher! Got 8.1 instead.

Am I doing something wrong?

Best Regards, Priyanka

Magdoll commented 5 years ago

Hi @psur9757 ,

You need to update Cupcake first to v8.6 or above. This is intended to let you know that! -Liz

psur9757 commented 5 years ago

Hi @Magdoll

I did update Cupcake. I re-installed Cupcake a few hours ago. I did git clone when that version gave me the error:

<open file '<stderr>', mode 'w' at 0x2b2c214f81e0> Cupcake version must be 8.6 or higher! Got 8.1 instead.

I downloaded https://github.com/Magdoll/cDNA_Cupcake/archive/v8.7.0.tar.gz and tried again but I got the same error.

psur9757 commented 5 years ago

Hi @Magdoll

Cupcake works fine for me now. The test runs successfully and everything. But I still get error with SQANTI2.

The python and Conda versions are good.

% conda -V
conda 4.7.12
% python -V
Python 3.7.4

cDNA_Cupcake test command runs successfully. I have version 9.0.1 installed correctly. collapse_isoforms_by_sam.py --input cDNA_Cupcake/cupcake/test_data/hq_isoforms.fastq --fq -s cDNA_Cupcake/cupcake/test_data/hq_isoforms.fastq.sorted.sam --dun-merge-5-shorter -o test

I have SQANTI2 version 4.1 correctly and the cDNA_Cupcake is version 9.0.1 but I still get error.

% python SQANTI2-4.1/sqanti_qc2.py 
  File "SQANTI2-4.1/sqanti_qc2.py", line 68
    print sys.stderr, "Cupcake version must be 8.6 or higher! Got {0} instead.".format(cupcake.__version__)
            ^
SyntaxError: invalid syntax

This computer has no other version of either software installed so this cannot be leftover from previous attempt. Any help to get SQANTI2 running will be much appreciated, we have a deadline end of the month to meet.

Magdoll commented 5 years ago

Hi @psur9757 ,

Apologies on the SQANTI2 part - I did not check in the Python 3.7 version of SQANTI2 until just now. PLease head to SQANTI2 and check out the latest version (5.0.0).

Let me know if you encounter more issues and I will try to respond promptly so you can meet your deadline.

--Liz

psur9757 commented 5 years ago

Hi @Magdoll

I updated to the latest version and here is the last few lines of the stderr file. Let me know what to modify to fix this.

Cleaning up isoform IDs...
Cleaned up isoform fasta file written to: /scratch/BLRseq/BLR/results/annotation/structural/BLR560/sqanti5/hq.collapsed.rep.renamed.fasta
output written to 
**** Parsing Isoforms....
Traceback (most recent call last):
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 1794, in <module>
    main()
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 1790, in main
    run(args)
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 1437, in run
    write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff')
  File "/scratch/BLRseq/bin/SQANTI2/sqanti_qc2.py", line 400, in write_collapsed_GFF_with_CDS
    for r in reader:
  File "/home/psur9757/.local/lib/python3.7/site-packages/cupcake-9.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 395, in __next__
    return self.read()            
  File "/home/psur9757/.local/lib/python3.7/site-packages/cupcake-9.0.1-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 552, in read
    assert raw[2] == 'transcript'
AssertionError


PBS script

#!/bin/bash
#PBS -P BLRseq
#PBS -N sqanti
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=20:00:00
#PBS -e /scratch/BLRseq/BLR/results/annotation/structural/BLR560/sqanti5/sqanti.err
#PBS -o /scratch/BLRseq/BLR/results/annotation/structural/BLR560/sqanti5/sqanti.out

wdir=/scratch/BLRseq/BLR
outdir=$wdir/results/annotation/structural/BLR560/sqanti5
ref=$wdir/results/annotation/structural/BLR560/ref.fa
isoforms=$wdir/data/BLR560/isoseq/SMRT/cluster/hq_isoforms.fasta
ann=$wdir/results/annotation/structural/BLR560/.old/isoseq/braker/augustus.ab_initio.gff3

sqanti=/scratch/BLRseq/bin/SQANTI2
cupcake=/scratch/BLRseq/bin/cDNA_Cupcake
module load python/3.7.2
module load cufflinks/2.2.1
module load ucsc-userapps/348
module load R/3.6.0
module load minimap2/2.3
module load samtools/1.9
export PYTHONPATH=$PYTHONPATH:$cupcake/sequence/

minimap2 -ax splice -uf $ref $isoforms > $outdir/aln.sam
samtools sort -O sam -o $outdir/aln.sorted.sam -@ 8 $outdir/aln.sam
gffread -E -O -T $ann -o $outdir/augustus.ab_initio.gtf
python3.7 $cupcake/cupcake/tofu/collapse_isoforms_by_sam.py --input $isoforms -s $outdir/aln.sorted.sam -o $outdir/hq --dun-merge-5-shorter
python3.7 $sqanti/sqanti_qc2.py -t 8 --aligner_choice minimap2 --output BLR560 --dir $outdir $outdir/hq.collapsed.rep.fa $outdir/augustus.ab_initio.gtf $ref
AminMahpour commented 4 years ago

Hi @Magdoll

I am experiencing the same issue with SQANTI2 with complete new installation of SQANTI and Cupcake.

Traceback (most recent call last):
  File "/home/userA/bin/SQANTI2/sqanti_qc2.py", line 2198, in <module>
    main()
  File "/home/userA/bin/SQANTI2/sqanti_qc2.py", line 2191, in main
    run(args)
  File "/home/userA/bin/SQANTI2/sqanti_qc2.py", line 1676, in run
    write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff')
  File "/home/userA/bin/SQANTI2/sqanti_qc2.py", line 422, in write_collapsed_GFF_with_CDS
    for r in reader:
  File "/home/userA/anaconda31/envs/anaCogent3/lib/python3.7/site-packages/cupcake-11.0.0-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 408, in __next__
    return self.read()
  File "/home/userA/anaconda31/envs/anaCogent3/lib/python3.7/site-packages/cupcake-11.0.0-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 579, in read
    assert raw[2] == 'transcript'
AssertionError

raw[2] can be "exon" too. It seems it fails to assert raw[2]=="exon" entries.

Any idea how I can fix this issue? Thanks!

Magdoll commented 4 years ago

@AminMahpour , This error is most seen when your reference annotation or input GTF format is unexpected. Can you post the first few entries from your annotation GTF and input GTF (if your input is GTF)

--Liz

AminMahpour commented 4 years ago

Thanks Liz. My input is a fastq file (minimap2 aligner). Genomic annotation GTF is from Gencode v32. I am attaching a few entries from the SQANTI2-corrected GTF file. As can be seen, "Transcript" lines are missing. When I manually fix the GTF file to add the missing lines, SQUANTI works fine.

Thanks! Amin

chr1    GRCh38  exon    167121  168165  .       -       .       transcript_id "PB.1.1"; gene_id "PB.1.1"; gene_name "PB.1.1";
chr1    GRCh38  exon    169049  169264  .       -       .       transcript_id "PB.1.1"; gene_id "PB.1.1"; gene_name "PB.1.1";
chr1    GRCh38  exon    172557  172688  .       -       .       transcript_id "PB.1.1"; gene_id "PB.1.1"; gene_name "PB.1.1";
chr1    GRCh38  exon    173753  173862  .       -       .       transcript_id "PB.1.1"; gene_id "PB.1.1"; gene_name "PB.1.1";
chr1    GRCh38  exon    181037  181128  .       -       .       transcript_id "PB.1.1"; gene_id "PB.1.1"; gene_name "PB.1.1";
chr1    GRCh38  exon    184877  185350  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    185491  185559  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    186317  186469  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    187129  187287  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    187380  187577  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    187755  187890  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    188130  188266  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    188439  188584  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    188791  188889  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    195263  195416  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    199837  199885  .       -       .       transcript_id "PB.2.1"; gene_id "PB.2.1"; gene_name "PB.2.1";
chr1    GRCh38  exon    184924  185350  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    185491  185559  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    186317  186469  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    187129  187287  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    187376  187577  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    187755  187890  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    188130  188266  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    188439  188584  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    188791  188902  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    195263  195416  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    199837  199862  .       -       .       transcript_id "PB.2.2"; gene_id "PB.2.2"; gene_name "PB.2.2";
chr1    GRCh38  exon    629640  629774  .       +       .       transcript_id "PB.7657.24"; gene_id "PB.7657.24"; gene_name "PB.7657.24";
B10inform commented 4 years ago

I am getting similar error with example file:

sqanti_qc2.py example/test_mini.fasta gencode.v28.annotation.gtf hg38.fa

Error i get: Number of classified isoforms: 11 Traceback (most recent call last): File "/home/SQANTI2/sqanti_qc2.py", line 2198, in main() File "/home/SQANTI2/sqanti_qc2.py", line 2191, in main run(args) File "/home/SQANTI2/sqanti_qc2.py", line 1676, in run write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff') File "/home/spthapa/software/SQUANTI2/SQANTI2/sqanti_qc2.py", line 422, in write_collapsed_GFF_with_CDS for r in reader: File "/home/.conda/envs/anaCogent5.2/lib/python3.6/site-packages/cupcake-12.0.0-py3.6-linux-x86_64.egg/cupcake/io/GFF.py", line 408, in next return self.read() File "/home/.conda/envs/anaCogent5.2/lib/python3.6/site-packages/cupcake-12.0.0-py3.6-linux-x86_64.egg/cupcake/io/GFF.py", line 579, in read assert raw[2] == 'transcript' AssertionError

Any help would be great. Thanks