snakemake pipeline error in rule depletion

jb013b commented 6 years ago

I am running into an error with the snakemake pipeline, which I installed with the easy script. Right now I am running just one sample, converted from fastq to uBAM using Picard tools. I have attached the snakemake error logs as well as a readout of the command-line input and output. Before starting the pipeline, the dry run (snakemake -np) works.

180103log.txt 2018-01-03T131752.103549.snakemake.log 2018-01-03T131752.833254.snakemake.log

Thank you, James

tomkinsc commented 6 years ago

From a quick glance at the logs, I'm guessing the issue is that the databases are specified with trailing slashes in the config file. They are intended to be file path prefixes: the containing directory path and then the common prefix of all files in the directory. For example, specifying: /media/jb013b/Seagate/snakemake_database/hg19 means that the following files should exist:

- /media/jb013b/Seagate/snakemake_database/hg19.bitmask
- /media/jb013b/Seagate/snakemake_database/hg19.srprism.ssa
[...and so on...]

For the hg19 database, the value you have set appears correct. For the others, the trailing slash should be removed; e.g., /media/jb013b/Seagate/snakemake_database/GRCh37.68/ should be /media/jb013b/Seagate/snakemake_database/GRCh37.68 (and so on for the other databases specified in the config file).
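
To make the prefix convention concrete, here is a minimal Python sketch that strips a trailing slash from a configured value and lists the files that share the resulting prefix. The helper names (normalize_db_prefix, files_for_prefix, check_db_prefix) are hypothetical and are not part of viral-ngs; the check only assumes that database files are named as the prefix followed by a dot and an extension, as in the hg19 example above.

```python
import glob


def normalize_db_prefix(path):
    """Strip trailing slashes so the value is a file path prefix, not a directory."""
    return path.rstrip("/")


def files_for_prefix(prefix):
    """List files named '<prefix>.<something>' (for example hg19.bitmask)."""
    # glob.escape protects any special characters in the directory names.
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


def check_db_prefix(path):
    """Return (normalized_prefix, matching_files).

    An empty list of matching files suggests the prefix is wrong.
    """
    prefix = normalize_db_prefix(path)
    return prefix, files_for_prefix(prefix)


if __name__ == "__main__":
    # Values taken from this thread; the second one has a trailing slash.
    for db in [
        "/media/jb013b/Seagate/snakemake_database/hg19",
        "/media/jb013b/Seagate/snakemake_database/GRCh37.68/",
    ]:
        prefix, found = check_db_prefix(db)
        print(prefix, "->", len(found), "matching file(s)")
```

If a prefix reports zero matching files, the value in the config file probably points at a directory or at the wrong base name.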

jb013b commented 6 years ago

I am having additional problems with rule depletion and have attached the log file. Based on the output of snakemake -np, I fixed the database problems. I also ran a few different files through in case one was bad. No luck there.

Thank you for your help. James

180222snakefail.docx 2018-02-23T132525.634075.snakemake.log

tomkinsc commented 6 years ago

Apologies for that; we're still making changes to bwa-based depletion and have not yet rolled it out fully. I suggest you download the hg19 database tarball (link), extract it (tar -xvf hg19.tar.gz), and change the Snakemake config.yaml file to use the local path prefix. A bugfix to use the remote version should be in the next release of viral-ngs.

Be sure to remove the other databases from the bwa section of the config as well (they'll still be removed via bmtagger if they're listed in that block):

bwa_dbs_remove:
  - "/local/path/to/hg19"
  # (no other bwa depletion databases; only hg19)

jb013b commented 6 years ago

Thank you Chris. Unfortunately now I get a new error, pasted below. I also attached the config.yaml file.

(viral-ngs)bash:jb013b-OptiPlex-9020:~/NGS/viral-ngs-etc/projects/CHIKV1931 647 $ snakemake
Building DAG of jobs...
File path reports/fastqc//align_to_self contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
MissingInputException in line 108 of /home/jb013b/NGS/viral-ngs-etc/projects/CHIKV1931/bin/pipes/rules/interhost.rules:
Missing input files for rule multi_align_mafft:
data/02_assembly/.fasta
(viral-ngs)bash:jb013b-OptiPlex-9020:~/NGS/viral-ngs-etc/projects/CHIKV1931 648 $

config.txt

tomkinsc commented 6 years ago

Have you populated the samples-depletion.txt, samples-assembly.txt, and samples-runs.txt files listed in the documentation? They should contain sample names, one per line, to match the names of the bam files in data/00_raw.
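
As a rough way to verify this, the sketch below reads a samples file and reports any name that has no corresponding bam file. The function names are hypothetical, and it assumes each sample name maps to data/00_raw/<name>.bam (the name without the .bam extension); it is not how the pipeline itself reads these files.

```python
import os


def read_sample_names(path):
    """Read sample names, one per line, ignoring blank lines and surrounding whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_bams(sample_file, raw_dir="data/00_raw"):
    """Return the sample names that have no <name>.bam file in raw_dir."""
    return [
        name
        for name in read_sample_names(sample_file)
        if not os.path.isfile(os.path.join(raw_dir, name + ".bam"))
    ]
```

For example, missing_bams("samples-depletion.txt") run from the project directory should return an empty list when every listed sample has a bam file in data/00_raw.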

jb013b commented 6 years ago

Chris, I also substituted your lines, same results.

bwa_dbs_remove:
  - "/media/jb013b/Seagate/snakemake_database/hg19"
  # (no other bwa depletion databases; only hg19)

jb013b commented 6 years ago

I populated the samples-depletion.txt, samples-assembly.txt, and samples-runs.txt files with the file names (pasted below), which match the bam files in data/00_raw. I did not populate the metagenomics one. I converted the files using picard (command also pasted below).

File names:

- DMSO_control_20_1_fastqtosam
- parent_CHIKV_S8_unmapped_fastqtosam
- parent_CHIKV_S8_fastqtosam
- parent-CHIKV_S8.CHIKV_MMUL.unmapped

java -Xmx8G -jar /home/jb013b/NGS/viral-ngs-etc/conda-env/share/picard-2.17.6-0/picard.jar FastqToSam FASTQ=parent-CHIKV_S8.CHIKV_MMUL.unmapped.sorted.R1.fq FASTQ2=parent-CHIKV_S8.CHIKV_MMUL.unmapped.sorted.R2.fq OUTPUT=parent_CHIKV_S8_unmapped_fastqtosam.bam READ_GROUP_NAME=GW170630292.2 SAMPLE_NAME=parent_CHIKV LIBRARY_NAME=Solexa-272222 PLATFORM_UNIT=GW170630292:1 PLATFORM=illumina SEQUENCING_CENTER=gw RUN_DATE=2017-06-33

tomkinsc commented 6 years ago

Do you have any blank lines in the samples-*.txt files? If so, try removing them.
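
To illustrate why a blank line could matter: the missing input in the error above, data/02_assembly/.fasta, is what the path looks like when the sample name is an empty string. The Python sketch below is only an illustration of that idea; naive_sample_names is a hypothetical reader that keeps blank lines (not the pipeline's actual code), and the path pattern in assembly_path is inferred from the error message.

```python
def naive_sample_names(text):
    """Hypothetical reader: split on newlines without filtering out blank lines."""
    return [line.strip() for line in text.split("\n")]


def assembly_path(sample):
    """Build a path following the pattern seen in the error message (an assumption)."""
    return "data/02_assembly/{}.fasta".format(sample)


def blank_line_numbers(text):
    """Return the 1-based line numbers of blank or whitespace-only lines."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.strip()
    ]


if __name__ == "__main__":
    contents = "sampleA\n\nsampleB\n"  # made-up file contents with one blank line
    print(blank_line_numbers(contents))
    for name in naive_sample_names(contents):
        print(repr(name), "->", assembly_path(name))
```

Running blank_line_numbers on the contents of each samples-*.txt file is a quick way to find the lines to delete.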

jb013b commented 6 years ago

Chris, I went and made sure I did not have blank lines. In the single-file case, there is one name on the first line and no other lines. I attached the shell output and the snakemake error log. Let me know if there are other files or better info to send to you. Thank you, James

2018_02_22shell.txt 2018-02-23T165121.143519.snakemake.log

broadinstitute / viral-ngs

snakemake pipeline error in rule depletion #748