Magdoll / SQANTI2

SQANTI2 is now replaced by SQANTI3. Please go to: https://github.com/ConesaLab/SQANTI3
Other
38 stars 15 forks source link

SQANTI2 fails at ORF finding #42

Closed wolfgangrumpf closed 4 years ago

wolfgangrumpf commented 5 years ago

Posting the original issue here first....

I've mapped some isoseq data and then collapsed it with TAMA and am now trying to do the final processing in SQANTI2 - I can't get past this error:

(/opt/sqanti2/4.1/anaCogent3) [rwr002@node35 ISOSEQ_gmap_tama]$ python /opt/sqanti2/4.1/SQANTI2-4.1/sqanti_qc2.py --aligner_choice gmap -x genomes/GRCh38 -t 18 -o FAM43 -d TESTOUT fam43_isoseq_out.fastq.split.tama.renamed.fasta genomes/gencode.v32.annotation.gtf genomes/hg38.fa

R scripting front-end version 3.2.3 (2015-12-10)

Cleaning up isoform IDs...

Cleaned up isoform fasta file written to: /gpfs0/home/gdlauberlab/rwr002/TESTING/ISOSEQ_gmap_tama/fam43_isoseq_out.fastq.split.tama.renamed.renamed.fasta

**** Running SQANTI...

**** Parsing provided files....

Reading genome fasta /gpfs0/home/gdlauberlab/rwr002/TESTING/ISOSEQ_gmap_tama/genomes/hg38.fa....

Aligned SAM /gpfs0/home/gdlauberlab/rwr002/TESTING/ISOSEQ_gmap_tama/TESTOUT/fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.sam already exists. Using it...

output written to /gpfs0/home/gdlauberlab/rwr002/TESTING/ISOSEQ_gmap_tama/TESTOUT/fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.fasta

**** Predicting ORF sequences...

terminate called after throwing an instance of 'std::length_error'

  what():  basic_string::_S_create

GeneMarkS: error on last system call, error code 134

Abort program!!!

Traceback (most recent call last):

  File "/opt/sqanti2/4.1/SQANTI2-4.1/sqanti_qc2.py", line 1793, in <module>

    main()

  File "/opt/sqanti2/4.1/SQANTI2-4.1/sqanti_qc2.py", line 1789, in main

    run(args)

  File "/opt/sqanti2/4.1/SQANTI2-4.1/sqanti_qc2.py", line 1414, in run

    orfDict = correctionPlusORFpred(args, genome_dict)

  File "/opt/sqanti2/4.1/SQANTI2-4.1/sqanti_qc2.py", line 555, in correctionPlusORFpred

    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:

  File "/opt/sqanti2/4.1/anaCogent3/lib/python2.7/subprocess.py", line 190, in check_call

    raise CalledProcessError(retcode, cmd)

subprocess.CalledProcessError: Command 'perl /gpfs0/export/opt/sqanti2/4.1/SQANTI2-4.1/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /gpfs0/home/gdlauberlab/rwr002/TESTING/ISOSEQ_gmap_tama/TESTOUT/GMST/GMST_tmp /gpfs0/home/gdlauberlab/rwr002/TESTING/ISOSEQ_gmap_tama/TESTOUT/fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.fasta' returned non-zero exit status 1
wolfgangrumpf commented 5 years ago

I check the files and here's the result:

fam43_isoseq_out.fastq.split.tama.renamed.renamed.fasta looks fine; it has a series of fasta sequences along the lines of:

>PB.1.35
AAAAATCAGATGGGGACTTTATTGTGATGGTGGCAGGTCCACCAGCA etc. etc. etc.

These files are empty: fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.fasta fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.gtf fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.sam

fam43_isoseq_out.fastq.split.tama.renamed.renamed_corrected.gtf.tmp only has the header line.

If the .sam file is empty doesn't that suggest that mapping failed? I didn't see anything in the console to suggest that.....

mysecretbackyard commented 4 years ago

Did you manage to solve the problem I have the same issue?

zrpk commented 4 years ago

Same for me.

Magdoll commented 4 years ago

@wolfgangrumpf , @mysecretbackyard , @kgiperez ,

The issue comes from GMAP and I probably have some bug in there. The reason "*corrected.fasta" is empty is an indication of SQANTI2 failing to run GMAP to get an aligned SAM and convert it to a genome-corrected fasta.

I hate to do this - but can you guys switch to minimap2 for the time being?

I'll put on my ToDo to get GMAP fixed again.

Alternatively, supply directly the GMAP output in GFF3/GTF format, by running with --gtf input.gtf instead of input.fasta. When SQANTI2 sees --gtf it will not run the aligner step and directly go to ORF prediction.

--Liz

zrpk commented 4 years ago

I'm actually using minimap2. Tried with or without --aligner_choice=minimap2 but I still get the same error.

Magdoll commented 4 years ago

If you supply GTF would it works? Is this human? Would you be willing to share the input fasta confidentially for testing ?

On Mon, Nov 18, 2019 at 6:28 PM Kevin Perez notifications@github.com wrote:

I'm actually using minimap2. Tried with or without --aligner_choice=minimap2 but I still get the same error.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Magdoll/SQANTI2/issues/42?email_source=notifications&email_token=AAEQE3ZERCF52FT3Q7PDSZLQUNFMXA5CNFSM4JBBEGXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEMUB2Q#issuecomment-555303146, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEQE37U2SS6G7X6ILXSCXTQUNFMXANCNFSM4JBBEGXA .

-- Sent from Gmail Mobile. Excuse any possible typos.

Magdoll commented 4 years ago

Hi,

I just tested GMAP and I'm unable to produce your errors.

The command I used is below. If any of you could share confidentially your input fasta or gtf for me to reproduce your errors I'd appreciate it. I'd just need your email (which you can email me at etseng@pacb.com)

python ~/GitHub/SQANTI2/sqanti_qc2.py \
       -o TEST -d test_out \
       -t 30 --aligner_choice gmap \
       -x ~/share/gmap_db_new/hg38  \
        test.fasta \
        gencode.v31.chr_patch_hapl_scaff.annotation.gtf \
        hg38.fa 
mysecretbackyard commented 4 years ago

Hi,

@Magdoll unfortunately, my input is confidential patient data so I can not share it. I used --aligner_choice=minimap2 and input fa file from cupcake.collapsed.rep.fa\ Homo_sapiens.GRCh38.97.gtf\ Homo_sapiens.GRCh38.dna.primary_assembly.fa I got the same error.

Magdoll commented 4 years ago

Hi @mysecretbackyard ,

Can you please run the test data and see if it works? I'd like to discern whether the issue is your input or the software.

There is a test.gtf and test_mini.fasta:

python sqanti_qc2.py --gtf test.gtf \
   gencode.v29.annotation.gtf hg38.fa

python sqanti_qc2.py test_mini.fasta \
   gencode.v29.annotation.gtf hg38.fa
mysecretbackyard commented 4 years ago

Hi @Magdoll ,

Thank you for your response. I tried both test_mini.fasta and my input without --aligner_choice and it works!

Magdoll commented 4 years ago

Hi @mysecretbackyard ,

Good to hear.

Closing this issue unless others continue to experience this problem. -Liz