allind / EukDetect

MIT License

eukdetect test fails at aln stage #13

Open emilyvansyoc opened 3 years ago

emilyvansyoc commented 3 years ago

Hello, I followed the conda install instructions and edited the YAML config file. The test fails at the aln stage, and the aln folder in the output directory is empty.

Here is the yml file:

# Config file for testing eukdetect. Edit your paths where specified

eukdetect_dir: "/storage/home/epb5360/scratch/EukDetect"
output_dir: "/storage/home/epb5360/scratch/EukDetect/results" #directory where output should be written

paired_end: true #true or false

fwd_suffix: "_R1.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_R2.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
se_suffix: ".fastq.gz" #file name excluding sample name. no need to edit if paired_end = true
readlen: 125 #targeted length of your reads. pre-trimming reads not recommended

fq_dir: "/storage/home/epb5360/scratch/EukDetect/tests" #full path to directory with raw fastq files
database_dir: "/storage/home/epb5360/scratch/EukDetect/eukdb" #full path to folder with all eukdetect_db files and taxa.sqlite files
database_prefix: "ncbi_eukprot_met_arch_markers.fna" #database prefix

samples: #list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
  test:

Here is the snakemake error log:

$ cat snakemake_1616682088.517935.log
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    bam2fastq
    1    countreads
    1    find_low_complexity
    1    fixmate
    1    index
    1    markdup
    1    remove_low_complexity
    1    rmsort
    1    runall
    1    runaln
    1    taxonomize
    11

[Thu Mar 25 10:21:28 2021]
rule runaln:
    input: /storage/home/epb5360/scratch/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna, /storage/home/epb5360/scratch/EukDetect/tests/test_R1.fastq.gz, /storage/home/epb5360/scratch/EukDetect/tests/test_R2.fastq.gz
    output: /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam
    jobid: 1
    wildcards: output_dir=/storage/home/epb5360/scratch/EukDetect/results, sample=test

Job counts:
    count    jobs
    1    runaln
    1
open: No such file or directory
[bam_sort_core] fail to open file /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam
[samopen] SAM header is present: 521824 sequences.
[Thu Mar 25 10:21:35 2021]
Error in rule runaln:
    jobid: 0
    output: /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam

RuleException:
CalledProcessError in line 78 of /storage/home/epb5360/scratch/EukDetect/rules/eukdetect.rules:
Command ' set -euo pipefail; bowtie2 --quiet --omit-sec-seq --no-discordant --no-unal -x /storage/home/epb5360/scratch/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna -1 /storage/home/epb5360/scratch/EukDetect/tests/test_R1.fastq.gz -2 /storage/home/epb5360/scratch/EukDetect/tests/test_R2.fastq.gz | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 100.0 or /^@/' | samtools view -q 30 -bS - | samtools sort -o /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam - ' returned non-zero exit status 141.
  File "/storage/home/epb5360/scratch/EukDetect/rules/eukdetect.rules", line 78, in __rule_runaln
  File "/storage/work/epb5360/miniconda3/envs/eukdetect/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/epb5360/EukDetect/.snakemake/log/2021-03-25T102128.731736.snakemake.log
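For readers decoding the error above: a shell reports 128 + N when a process is killed by signal N, so exit status 141 is 128 + 13, and signal 13 is SIGPIPE. Snakemake runs the rule's command with `set -euo pipefail` prepended (visible in the Command line of the log), so the whole pipeline fails if any stage is killed this way, which happens when a stage is still writing after the stage reading from it has exited. The `[bam_sort_core] fail to open file` message suggests samtools sort was the stage that stopped early. The helper below is hypothetical (not part of EukDetect) and only decodes such statuses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain a shell exit status such as the 141 reported above.
# Statuses above 128 mean the process was killed by signal (status - 128).
set -euo pipefail

decode_status() {
  local status=$1
  if (( status > 128 )); then
    # bash's built-in `kill -l` maps an exit status above 128 to its signal name.
    echo "killed by signal $((status - 128)) (SIG$(kill -l "$status"))"
  else
    echo "exited with code $status"
  fi
}

decode_status 141   # prints: killed by signal 13 (SIGPIPE)
```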

allind commented 3 years ago

Thanks for reaching out. Can you try running just the command itself, that is:

bowtie2 --quiet --omit-sec-seq --no-discordant --no-unal -x /storage/home/epb5360/scratch/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna -1 /storage/home/epb5360/scratch/EukDetect/tests/test_R1.fastq.gz -2 /storage/home/epb5360/scratch/EukDetect/tests/test_R2.fastq.gz | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 100.0 or /^@/' | samtools view -q 30 -bS - | samtools sort -o /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam -

and see whether it produces any errors, then post them here? Thank you.
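When the command is run by hand, it helps to know which of its four stages (bowtie2, perl, samtools view, samtools sort) failed, not only the status of the last one. Bash records every stage's exit status in the `PIPESTATUS` array, which has to be copied immediately after the pipeline. The sketch below uses stand-in commands (`true | false | cat`) in place of the real stages; `show_pipestatus` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report the exit status of every stage of a pipeline using bash's PIPESTATUS.
# Stand-in pipeline; substitute the bowtie2 | perl | samtools view | samtools sort command.

show_pipestatus() {
  # Arguments: the saved PIPESTATUS values, in pipeline order (hypothetical helper).
  local i=1 s
  for s in "$@"; do
    echo "stage $i: exit $s"
    i=$((i + 1))
  done
}

true | false | cat
st=("${PIPESTATUS[@]}")   # copy right away; the next command overwrites PIPESTATUS
show_pipestatus "${st[@]}"
# stage 1: exit 0
# stage 2: exit 1
# stage 3: exit 0
```

A stage showing 141 was killed by SIGPIPE; the first nonzero stage after it is usually the one to investigate.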

emilyvansyoc commented 3 years ago

Thanks for the quick reply! That ran with no errors, and in the 'results' folder there is this file: test_aln_q30_lenfilter.sorted.bam

allind commented 3 years ago

Great, thanks. Could you share your config file?

emilyvansyoc commented 3 years ago

I should mention that I'm working on an HPC cluster. I'm not sure whether that is part of the install problem, but I've had issues with bowtie2 when installing metaphlan and humann.

$ cat configfile_for_tests.yml

# Config file for testing eukdetect. Edit your paths where specified

eukdetect_dir: "/storage/home/epb5360/scratch/EukDetect"
output_dir: "/storage/home/epb5360/scratch/EukDetect/results" #directory where output should be written

paired_end: true #true or false

fwd_suffix: "_R1.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_R2.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
se_suffix: ".fastq.gz" #file name excluding sample name. no need to edit if paired_end = true
readlen: 125 #targeted length of your reads. pre-trimming reads not recommended

fq_dir: "/storage/home/epb5360/scratch/EukDetect/tests" #full path to directory with raw fastq files
database_dir: "/storage/home/epb5360/scratch/EukDetect/eukdb" #full path to folder with all eukdetect_db files and taxa.sqlite files
database_prefix: "ncbi_eukprot_met_arch_markers.fna" #database prefix

samples: #list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
  test:

allind commented 3 years ago

I see, okay. It sounds like this problem might not be coming from eukdetect itself. Are you running this as a snakemake pipeline and submitting that as a job to the cluster? When you run your job on the cluster, do you have a step where you run "conda activate eukdetect" (or whatever you've named the environment)?

emilyvansyoc commented 3 years ago

Sorry for the delayed reply. I revisited this after fixing the bowtie issue in other software, but I'm still having issues here. I'm running it locally right now in the eukdetect conda environment.

emilyvansyoc commented 3 years ago

In other news, I tried it on my local machine and am running into issues with the conda environment:

$ conda env update --name eukdetect -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

allind commented 3 years ago

It's possible this is coming from an issue with versioning between operating system channels on conda. Is your local machine running an operating system other than Linux? Eukdetect has not been tested on OSX or Windows.

emilyvansyoc commented 2 years ago

I've done more troubleshooting on this: I ran each command in the eukdetect.rules file by hand and they all ran successfully, so is the problem somewhere in the snakemake process? Here is the snakemake log; it fails on the first step:

$ snakemake --snakefile rules/eukdetect.rules --configfile config.yml --cores 4 aln
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    aln
    1    runaln
    2

[Tue Sep 14 15:20:56 2021]
rule runaln:
    input: /gpfs/group/evk5387/default/emily/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna, /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R1.fastq.gz, /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R2.fastq.gz
    output: /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam
    jobid: 1
    wildcards: output_dir=/gpfs/group/evk5387/default/emily/EukDetect/retest, sample=test

Job counts:
    count    jobs
    1    runaln
    1
open: No such file or directory
[bam_sort_core] fail to open file /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam
[samopen] SAM header is present: 521824 sequences.
[Tue Sep 14 15:21:02 2021]
Error in rule runaln:
    jobid: 0
    output: /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam

RuleException:
CalledProcessError in line 78 of /gpfs/group/evk5387/default/emily/EukDetect/rules/eukdetect.rules:
Command ' set -euo pipefail; bowtie2 --quiet --omit-sec-seq --no-discordant --no-unal -x /gpfs/group/evk5387/default/emily/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna -1 /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R1.fastq.gz -2 /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R2.fastq.gz | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 101.0 or /^@/' | samtools view -q 30 -bS - | samtools sort -o /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam - ' returned non-zero exit status 141.
  File "/gpfs/group/evk5387/default/emily/EukDetect/rules/eukdetect.rules", line 78, in __rule_runaln
  File "/storage/work/epb5360/miniconda3/envs/eukdetect/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/group/evk5387/default/emily/EukDetect/.snakemake/log/2021-09-14T152056.475674.snakemake.log

allind commented 2 years ago

Could you post the most recent version of the config file you're using?

emilyvansyoc commented 2 years ago

Here it is below (I'm using the test samples)

$ cat config.yml

# Default config file for eukdetect. Copy and edit for analysis

# Directory where EukDetect output should be written
output_dir: "/gpfs/group/evk5387/default/emily/EukDetect/retest"

# Indicate whether reads are paired (true) or single (false)
paired_end: true

# filename excluding sample name. no need to edit if paired_end = false
fwd_suffix: "_R1.fastq.gz"

# filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_R2.fastq.gz"

# file name excluding sample name. no need to edit if paired_end = true
se_suffix: ".fastq.gz"

# length of your reads. pre-trimming reads not recommended
readlen: 126

# full path to directory with raw fastq files
fq_dir: "/gpfs/group/evk5387/default/emily/EukDetect/tests"

# full path to folder with eukdetect database files
database_dir: "/gpfs/group/evk5387/default/emily/EukDetect/eukdb"

# name of database. Default is original genomes only database name
database_prefix: "ncbi_eukprot_met_arch_markers.fna"

# full path to eukdetect installation folder
eukdetect_dir: "/gpfs/group/evk5387/default/emily/EukDetect"

# list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
# each sample name should be preceded by 2 spaces and followed by a colon character
samples:
  test:
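Following the naming rule in the config comments ({samplename}{fwd_suffix} and {samplename}{rev_suffix} for paired-end reads, {samplename}{se_suffix} for single-end), a quick pre-flight check can confirm that fq_dir contains the files each sample needs before snakemake is started. This is a hypothetical helper, not part of EukDetect: it does not parse config.yml, so the suffixes are hard-coded to the values in this config and the remaining values are passed as arguments.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check that fq_dir holds the fastq files each sample needs.
# Hypothetical helper; it does not read config.yml. Suffixes are hard-coded to
# the values in this config: _R1.fastq.gz, _R2.fastq.gz and .fastq.gz.
set -euo pipefail

check_fastqs() {
  # Arguments: fq_dir paired_end(true|false) sample [sample ...]
  local fq_dir=$1 paired=$2
  shift 2
  local fwd_suffix="_R1.fastq.gz" rev_suffix="_R2.fastq.gz" se_suffix=".fastq.gz"
  local sample f missing=0
  local -a expected
  for sample in "$@"; do
    if [[ "$paired" == true ]]; then
      expected=("$fq_dir/$sample$fwd_suffix" "$fq_dir/$sample$rev_suffix")
    else
      expected=("$fq_dir/$sample$se_suffix")
    fi
    for f in "${expected[@]}"; do
      if [[ -f "$f" ]]; then
        echo "found    $f"
      else
        echo "MISSING  $f"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Usage with the values from this config:
#   check_fastqs /gpfs/group/evk5387/default/emily/EukDetect/tests true test
```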

emilyvansyoc commented 2 years ago

Good news: I've found a workaround and have a better idea of what's happening on the cluster. Snakemake is not inheriting the conda environment of the shell I'm executing it from. I was able to get everything running by adding "conda activate eukdetect" to my .bashrc file. Is there perhaps a snakemake option I could add to the config YAML to fix this?
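To confirm, from inside the shell or job that launches snakemake, that the tools come from the activated environment rather than from elsewhere on PATH, one can compare each tool's resolved path with `$CONDA_PREFIX`, the variable `conda activate` sets to the environment's directory. The function name `check_tools_in_env` is hypothetical; it only inspects PATH and does not read any EukDetect files.

```shell
#!/usr/bin/env bash
# Sketch: check that each tool on PATH resolves inside the active conda environment.
# Relies on CONDA_PREFIX, which `conda activate` sets to the environment's directory.
set -euo pipefail

check_tools_in_env() {
  local prefix="${CONDA_PREFIX:-}" tool path status=0
  if [[ -z "$prefix" ]]; then
    echo "no conda environment is active (CONDA_PREFIX is unset)" >&2
    return 1
  fi
  for tool in "$@"; do
    path="$(command -v "$tool" || true)"
    if [[ -z "$path" ]]; then
      echo "MISSING  $tool"
      status=1
    elif [[ "$path" == "$prefix"/bin/* ]]; then
      echo "OK       $tool -> $path"
    else
      echo "OUTSIDE  $tool -> $path"   # e.g. a system-wide install found first on PATH
      status=1
    fi
  done
  return "$status"
}

# Usage, in the same shell or job script that launches snakemake:
#   check_tools_in_env snakemake bowtie2 samtools
```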

allind commented 2 years ago

Great! Thanks for the update. This is unlikely to be added to the workflow. Can you submit multi-line commands or multi-line job scripts to the cluster, where you can add the line "conda activate eukdetect" before snakemake is called? This is how I work with conda environments on a cluster.
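A minimal sketch of that approach, assuming a bash batch script: the job sources conda's shell setup file (batch shells may be non-interactive and skip the conda block in ~/.bashrc), activates the environment, and only then calls snakemake. The environment name, config file, core count, the runall target (a rule listed in the job counts of the first log), and the EukDetect path are assumptions to adapt. Scheduler directives (#PBS or #SBATCH lines) are omitted because they differ between clusters. The generator writes the job script to a file so it can be inspected before submission.

```shell
#!/usr/bin/env bash
# Sketch: write a batch job script that activates the conda environment before snakemake.
# Assumed (adapt these): env name "eukdetect", config.yml, 4 cores, the "runall" target,
# and a hypothetical EukDetect path. Add your scheduler's #PBS or #SBATCH lines yourself.
set -euo pipefail

cat > eukdetect_job.sh <<'EOF'
#!/usr/bin/env bash
# No `set -u` here: some conda activation scripts reference unset variables.
set -eo pipefail

# Load conda's shell functions explicitly so `conda activate` works in a batch shell.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate eukdetect

cd /path/to/EukDetect   # hypothetical: your EukDetect directory
snakemake --snakefile rules/eukdetect.rules --configfile config.yml --cores 4 runall
EOF

chmod +x eukdetect_job.sh
echo "Wrote eukdetect_job.sh; submit it with your scheduler (e.g. qsub or sbatch)."
```

Keeping the activation inside the job script, rather than in ~/.bashrc, limits the environment to the jobs that need it.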