alekseyzimin / EviAnn_release

This is the standalone version of the EviAnn pipeline
GNU General Public License v3.0

Failed run: Error: cannot open input file genome.fasta.masked.u.cds.gff! #4

Closed KatharinaHoff closed 8 months ago

KatharinaHoff commented 8 months ago

Dear Aleksey & Team,

I wanted to test EviAnn on Mus musculus. I used Mus caroli data as reference input. My job died. Here are the details:

prepare fasta files:

gffread -W -y mus_caroli_prot.faa -w mus_caroli_transc.fa -g ../tx_ncbi/GCF_900094665.1_CAROLI_EIJ_v1.1_genomic.fna ../tx_ncbi/GCF_900094665.1_CAROLI_EIJ_v1.1_genomic.gff

run eviann.sh:

eviann.sh -t 72 -g ../data/genome.fasta.masked -e ${PWD}/mus_caroli_transc.fa -r ${PWD}/mus_caroli_prot.faa

The input files exist:

ls -alht ../data/genome.fasta.masked
-rw-r--r-- 1 foo users 2.6G Mar 21  2023 ../data/genome.fasta.masked

ls -alht ${PWD}/mus_caroli_transc.fa
-rw-r--r-- 1 foo users 190M Dec 18 12:43 /home/nas-hs/users/katharina/galba/Mus_musculus/eviann/mus_caroli_transc.fa

ls -alht ${PWD}/mus_caroli_prot.faa
-rw-r--r-- 1 foo users 40M Dec 18 12:43 /home/nas-hs/users/katharina/galba/Mus_musculus/eviann/mus_caroli_prot.faa

Here is the STDERR output:

[samopen] SAM header is present: 21 sequences.
ls: cannot access 'tblastn.mus_caroli_prot.faa.*.batch.out': No such file or directory
[bam_header_read] EOF marker is absent. The input is probably truncated.
[samopen] SAM header is present: 21 sequences.
[samopen] SAM header is present: 21 sequences.
/home/foo/bin/EviAnn-1.0.7/bin/eviann.sh: line 346: genome.fasta.masked.mus_caroli_prot.faa.palign.gff: No such file or directory
  0 reference transcripts loaded.
  51504 query transfrags loaded.
mv: cannot stat 'genome.fasta.masked.protref.annotated.gtf': No such file or directory
Error: cannot open input file genome.fasta.masked.u.cds.gff!
mv: cannot stat 'genome.fasta.masked.k.gff': No such file or directory

Here is STDOUT:

Checking for ufasta on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/ufasta
Checking for hisat2 on the PATH... /home/foo/bin/hisat2/hisat2
Checking for minimap2 on the PATH... /home/foo/bin/minimap2/minimap2
Checking for stringtie on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/stringtie
Checking for gffread on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/gffread
Checking for blastp on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/blastp
Checking for tblastn on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/tblastn
Checking for makeblastdb on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/makeblastdb
Checking for gffcompare on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/gffcompare
Checking for TransDecoder.Predict on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/TransDecoder.Predict
Checking for TransDecoder.LongOrfs on the PATH... /home/foo/bin/EviAnn-1.0.7/bin/TransDecoder.LongOrfs
[Mon Dec 18 12:48:15 PM UTC 2023] Unpacking Uniprot database
[Mon Dec 18 12:48:21 PM UTC 2023] Building HISAT2 index
[Mon Dec 18 01:38:39 PM UTC 2023] Aligning RNAseq reads
[Mon Dec 18 01:40:15 PM UTC 2023] Aligning proteins
[Mon Dec 18 01:40:16 PM UTC 2023] Aligning proteins to the genome
[Mon Dec 18 01:40:16 PM UTC 2023] Filtering protein alignment file
[Mon Dec 18 01:40:34 PM UTC 2023] Running exonerate on the filtered sequences
[Mon Dec 18 01:40:34 PM UTC 2023] Sorting alignment files
[Mon Dec 18 01:41:02 PM UTC 2023] Assembling transcripts from related species with Stringtie
[Mon Dec 18 01:41:08 PM UTC 2023] Deriving gene models from protein and transcript alignments

Here are all produced files:

-rw-r--r--  1 foo users  848 Dec 18 13:41 eviann.2430077.node374.err
-rw-r--r--  1 foo users   46 Dec 18 13:41 genome.fasta.masked.k.gff.tmp
-rw-r--r--  1 foo users    0 Dec 18 13:41 genome.fasta.masked.u.gff.tmp
-rw-r--r--  1 foo users    0 Dec 18 13:41 genome.fasta.masked.unused_proteins.gff.tmp
-rw-r--r--  1 foo users    0 Dec 18 13:41 combine.out
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:41 genome.fasta.masked.palign.all.gff
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:41 genome.fasta.masked.transcripts_to_keep.txt
-rw-r--r--  1 hoffk83 users  66M Dec 18 13:41 genome.fasta.masked.protref.combined.gtf
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:41 genome.fasta.masked.palign.fixed.gff
-rw-r--r--  1 hoffk83 users 1.6K Dec 18 13:41 eviann.2430077.node374.out
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:41 stringtie.success
-rw-r--r--  1 hoffk83 users  84M Dec 18 13:41 genome.fasta.masked.gtf
-rw-r--r--  1 hoffk83 users  76M Dec 18 13:41 tissue0.bam.sorted.bam.gtf
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:41 sort.success
-rw-r--r--  1 hoffk83 users  96M Dec 18 13:41 tissue0.bam.sorted.bam
-rw-r--r--  1 hoffk83 users 185M Dec 18 13:40 tissue0.filter
-rw-r--r--  1 hoffk83 users  694 Dec 18 13:40 tissue0.header
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:40 genome.fasta.masked.mus_caroli_prot.faa.palign.gff.tmp
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:40 protein2genome.protein_align.success
-rw-r--r--  1 hoffk83 users  27M Dec 18 13:40 mus_caroli_prot.faa.uniq.faa
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:40 align.success
-rw-r--r--  1 hoffk83 users  40M Dec 18 13:40 tissue0.bam
-rw-r--r--  1 hoffk83 users  767 Dec 18 13:40 tissue0.err
-rw-r--r--  1 hoffk83 users  329 Dec 18 13:38 hisat2.sh
-rw-r--r--  1 hoffk83 users    0 Dec 18 13:38 align-build.success
-rw-r--r--  1 hoffk83 users 6.3K Dec 18 13:38 hisat2-build.out
-rw-r--r--  1 hoffk83 users 1.1G Dec 18 13:38 genome.fasta.masked.hst.5.ht2
-rw-r--r--  1 hoffk83 users 644M Dec 18 13:38 genome.fasta.masked.hst.6.ht2
-rw-r--r--  1 hoffk83 users 847M Dec 18 13:34 genome.fasta.masked.hst.1.ht2
-rw-r--r--  1 hoffk83 users 632M Dec 18 13:34 genome.fasta.masked.hst.2.ht2
-rw-r--r--  1 hoffk83 users   12 Dec 18 12:48 genome.fasta.masked.hst.7.ht2
-rw-r--r--  1 hoffk83 users    8 Dec 18 12:48 genome.fasta.masked.hst.8.ht2
-rw-r--r--  1 hoffk83 users 2.4K Dec 18 12:48 genome.fasta.masked.hst.3.ht2
-rw-r--r--  1 hoffk83 users 632M Dec 18 12:48 genome.fasta.masked.hst.4.ht2
-rw-r--r--  1 hoffk83 users  536 Dec 18 12:48 uniprot.pjs
-rw-r--r--  1 hoffk83 users  375 Dec 18 12:48 makeblastdb.out
-rw-r--r--  1 hoffk83 users  16K Dec 18 12:48 uniprot.ptf
-rw-r--r--  1 hoffk83 users 1.2M Dec 18 12:48 uniprot.pto
-rw-r--r--  1 hoffk83 users 3.4M Dec 18 12:48 uniprot.pot
-rw-r--r--  1 hoffk83 users  20K Dec 18 12:48 uniprot.pdb
-rw-r--r--  1 hoffk83 users  54M Dec 18 12:48 uniprot.phr
-rw-r--r--  1 hoffk83 users 2.3M Dec 18 12:48 uniprot.pin
-rw-r--r--  1 hoffk83 users 108M Dec 18 12:48 uniprot.psq
-rw-r--r--  1 hoffk83 users 146M Dec 18 12:48 uniprot_sprot.nonred.85.fasta
-rw-r--r--  1 hoffk83 users  40M Dec 18 12:43 mus_caroli_prot.faa
-rw-r--r--  1 hoffk83 users 190M Dec 18 12:43 mus_caroli_transc.fa

Please let me know if you need further details.

alekseyzimin commented 8 months ago

Hi Katharina,

Thank you for testing EviAnn!!!

I will be happy to help resolve your crash. The problem is in how the path to the genome sequence is specified: it has to be absolute as well. I know it is a pain to specify all absolute paths, so I am releasing a new version today (1.0.8) to fix that and also introduce speed improvements.
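For anyone staying on 1.0.7, the absolute-path workaround could be scripted roughly as below. This is only a sketch: `abs_path` and `eviann_cmd` are hypothetical helper names, not part of EviAnn, and the `-t`, `-g`, `-e` and `-r` flags are simply the ones from the command in the report. The second helper only prints the command line, so it can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of EviAnn): turn every input path into an
# absolute path before handing it to eviann.sh.

# Resolve a possibly relative path to an absolute one using only cd/pwd,
# so the sketch does not depend on realpath being installed.
abs_path() {
  local dir base
  dir=$(cd "$(dirname -- "$1")" && pwd -P) || return 1
  base=$(basename -- "$1")
  # Avoid printing a double slash for files that sit directly in /
  if [ "$dir" = "/" ]; then
    dir=""
  fi
  printf '%s/%s\n' "$dir" "$base"
}

# Print the eviann.sh command line with all three inputs made absolute.
# Arguments: genome, transcripts, proteins.
eviann_cmd() {
  local genome transcripts proteins
  genome=$(abs_path "$1") || return 1
  transcripts=$(abs_path "$2") || return 1
  proteins=$(abs_path "$3") || return 1
  printf 'eviann.sh -t 72 -g %s -e %s -r %s\n' \
    "$genome" "$transcripts" "$proteins"
}

# Example (prints the command instead of running it):
#   eviann_cmd ../data/genome.fasta.masked mus_caroli_transc.fa mus_caroli_prot.faa
```

Only the directory part of each path is resolved, so the parent directory has to exist; the helper does not check that the file itself is present.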

Here is the link to release 1.0.8; it is set as pre-release for now. I have been testing it for the last week on several genomes, and all tests completed successfully. Please let me know if this release (or adding the full path to the genome file) works for you.

Best, Aleksey


KatharinaHoff commented 8 months ago

Giving 1.0.8 a try :-)

KatharinaHoff commented 8 months ago

The run finished (with relative path, v 1.0.8). Thanks for your quick help!

alekseyzimin commented 8 months ago

I am glad it worked! I would be interested in your evaluation of the results; please let me know by email.
