Double prefix in meta rule

seb-mueller commented 6 years ago

Having followed the wiki, I ran into an error in the meta rule, i.e. running the following:

snakemake --snakefile ~/analysis/dropseq/software/dropSeqPipe/Snakefile meta

got me this error at some point:

rule create_intervals:
    input: /home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.reduced.gtf, /home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa.dict
    output: /home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa.rRNA.intervals
    jobid: 1
    wildcards: reference_prefix=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa

Finished job 4.
3 of 6 steps (50%) done
Error in rule create_intervals:
    jobid: 1
    output: /home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa.rRNA.intervals

RuleException:
CalledProcessError in line 64 of /home/user/analysis/dropseq/software/dropSeqPipe/rules/generate_meta.smk:
Command ' set -euo pipefail;  ~/analysis/dropseq/software/Drop-seq_tools-1.13/drop-seq-tools-wrapper.sh -m 20g -p CreateIntervalsFiles      REDUCED_GTF=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.reduced.gtf SEQUENCE_DICTIONARY=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa.dict     O=/home/user/analysis/dropseq/data/mixed        PREFIX=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa ' returned non-zero exit status 1.
  File "/home/user/analysis/dropseq/software/dropSeqPipe/rules/generate_meta.smk", line 64, in __rule_create_intervals
  File "/home/user/.conda/envs/dropSeqPipe/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Terminating processes on user request.
Cancelling snakemake on user request.

Running the failed command manually:

~/analysis/dropseq/software/Drop-seq_tools-1.13/drop-seq-tools-wrapper.sh -m 20g -p CreateIntervalsFiles  \
REDUCED_GTF=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.reduced.gtf \
SEQUENCE_DICTIONARY=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa.dict  \
O=/home/user/analysis/dropseq/data/mixed  \
PREFIX=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa

got me this:

....
at org.broadinstitute.dropseqrna.annotation.CreateIntervalsFiles.write(CreateIntervalsFiles.java:215)
at org.broadinstitute.dropseqrna.annotation.CreateIntervalsFiles.doWork(CreateIntervalsFiles.java:160)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
Caused by: java.io.FileNotFoundException: /home/user/analysis/dropseq/data/mixed/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa.genes.intervals (No such file or directory)
....

The directory seems to be added to the output path twice. This might be a config error on my side, though digging a bit deeper and changing the manual command from:

PREFIX=/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa

to

PREFIX=hg19_mm10_transgenes.fa

seems to do the trick as a workaround. The culprit seems to be in Drop-seq_tools-1.13/public/src/java/org/broadinstitute/dropseqrna/annotation/CreateIntervalsFiles.java line 219:

private File makeIntervalFile(final String intervalType) {
    return new File(OUTPUT, PREFIX + "." + intervalType + ".intervals");
}

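It probably joins OUTPUT and PREFIX as directory and file name, so a PREFIX that already contains the full path ends up appended to the output directory a second time. I haven't completely understood the code, but maybe the following line needs changing (I might be completely wrong, though):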

https://github.com/Hoohm/dropSeqPipe/blob/1fe59178579ec373c5224001d0faf9ce055c5308/rules/generate_meta.smk#L68
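To illustrate the suspected path doubling, here is a minimal, standalone sketch that repeats the File(OUTPUT, PREFIX + ...) construction from the snippet above outside of Drop-seq tools. The class name IntervalPathDemo and the method parameters are made up for this example, and OUTPUT is assumed to be a java.io.File. On a Unix-like system, the first printed path should be the doubled one from the FileNotFoundException, and the second the path produced by the workaround.

```java
import java.io.File;

// Hypothetical demo class; not part of Drop-seq tools.
class IntervalPathDemo {

    // Same construction as makeIntervalFile in the quoted snippet,
    // with OUTPUT and PREFIX passed in as parameters (an assumption for this sketch).
    static File makeIntervalFile(File output, String prefix, String intervalType) {
        return new File(output, prefix + "." + intervalType + ".intervals");
    }

    public static void main(String[] args) {
        File output = new File("/home/user/analysis/dropseq/data/mixed");

        // PREFIX given as a full path, as in the failing command:
        // java.io.File resolves the child against the parent even if the child is absolute.
        String fullPathPrefix = "/home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fa";
        System.out.println(makeIntervalFile(output, fullPathPrefix, "genes").getPath());

        // PREFIX reduced to the bare file name, as in the manual workaround.
        String bareName = new File(fullPathPrefix).getName();
        System.out.println(makeIntervalFile(output, bareName, "genes").getPath());
    }
}
```

If this matches what CreateIntervalsFiles does, the fix belongs on the calling side: PREFIX should be passed as a file name only, with the directory supplied through O.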

Sorry for the lengthy report, hope it's not too confusing. Happy to provide more details.

Hoohm commented 6 years ago

@seb-mueller Thanks for the issue. One thing that will probably fix your issue is renaming the reference from .fa to .fasta. I haven't implemented anything better than hardcoding the extension in the code right now. Maybe I should make it compatible with either .fasta or .fa.

seb-mueller commented 6 years ago

It's still giving the same error after renaming the fa to fasta (I've started from a clean directory containing only the fasta and gtf to make sure it's not due to tmp files).

Hoohm commented 6 years ago

Interesting. I'm gonna test it myself too. Have you cloned the repo lately or just when it came out? I know I had the same issue a few commits back.

seb-mueller commented 6 years ago

I cloned it a few days ago and am now on the latest commit (1fe59..). I've also activated the conda environment as described in the wiki and, just in case, I've copied my config.yaml below:

LOCAL:
    TMPDIR: ./tmp
    DROPSEQ-wrapper: ~/analysis/dropseq/software/Drop-seq_tools-1.13/drop-seq-tools-wrapper.sh
    MEMORY: 20g
META:
    species:
      - HUMAN
      - MOUSE
    species_ratio: 0.20
    reference_file: /home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.fasta
    annotation_file: /home/user/analysis/dropseq/data/mixed/hg19_mm10_transgenes.gtf
    reference_folder: /home/user/analysis/dropseq/data/mixed
FILTER:
    5PrimeSmartAdapter: CACACTCTTTCCCTACACGACGC
    Cell_barcode:
        start: 1
        end: 12
        min_quality: 30
        num_below_quality: 0
    UMI:
        start: 13
        end: 20
        min_quality: 30
        num_below_quality: 0
    IlluminaClip: TruSeq3-PE.fa
EXTRACTION:
    bc_edit_distance: 0
    min_count_per_umi: 1
STAR_PARAMETERS:
    outFilterMismatchNmax: 10
    outFilterMismatchNoverLmax: 0.3
    outFilterMismatchNoverReadLmax: 1
    outFilterMatchNmin: 0

Hoohm commented 6 years ago

I might have found the problem. You don't need the full path for the reference files, only the filename. Try that out.


seb-mueller commented 6 years ago

That worked! Slightly embarrassing, since this looks rather obvious in retrospect. I think I copied those from the 0.23 version, which I believe required the full path. It's correctly depicted in the wiki though. Thanks for the quick help!

Hoohm commented 6 years ago

@seb-mueller Happy to help. Quick question: where did you find the source code for Drop-seq tools v1.13? Did I miss it somewhere or did you reverse-engineer the jar file?

seb-mueller commented 6 years ago

It's all in the bundled zip file, specifically in this subdirectory: Drop-seq_tools-1.13/public/src/java/org/broadinstitute/dropseqrna/ Could you find it?

Hoohm commented 6 years ago

Yep, now I feel stupid :) Thanks.

seb-mueller commented 6 years ago

In all fairness, it's well buried in multiple subdirs plus it's Friday ;)

Hoohm / dropSeqPipe

Double prefix in meta rule #16