Closed — seb-mueller closed this issue 4 years ago.
Addition: I've created a view of the DAG and circled the rules that depend on the `expected_cells` parameter:
Added this branch, which seems to fix the issue: https://github.com/Hoohm/dropSeqPipe/tree/feature/split_species_fix
Suggested workflow:

1. Change `expected_cells` in `samples.csv`.
2. Rename each `top_barcodes.csv` out of the way. In zsh this can be done like this: `ls **/*top_barcodes.csv | xargs -t -I{} -p mv {} {}.bak`
3. Rerun the pipeline: `snakemake --use-conda -rp --conda-prefix ~/.conda/myevns --cores 32 --directory ~/analysis/dropseq/data/project`
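The renaming step can also be done portably outside zsh. A minimal sketch on a throwaway directory; the sample names are made up, but the layout mirrors the pipeline's `'{results_dir}/samples/{sample}/top_barcodes.csv'` pattern:

```shell
# Throwaway demo of the rename step; sampleA/sampleB are made-up names
mkdir -p results/samples/sampleA results/samples/sampleB
: > results/samples/sampleA/top_barcodes.csv
: > results/samples/sampleB/top_barcodes.csv

# Move each top_barcodes.csv aside so snakemake sees it as missing
for f in results/samples/*/top_barcodes.csv; do
  if [ -e "$f" ]; then mv "$f" "$f.bak"; fi
done

ls results/samples/sampleA
# -> top_barcodes.csv.bak
```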
Update: @Hoohm mentioned the `--notemp` flag of snakemake, which can be used to prevent deleting temporary files.
In that case I could undo this line https://github.com/Hoohm/dropSeqPipe/blob/63b3dc520fa4fb54f37c539220fa554de03437dd/rules/map.smk#L99 and mark it as `temp` again.
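For reference, `--notemp` is just appended to the usual invocation; a sketch reusing the flags from the command above (the conda prefix and project paths are from my setup, adjust as needed):

```shell
# Keep temp() outputs on disk so reruns can reuse them instead of remapping
snakemake --use-conda -rp --notemp \
    --conda-prefix ~/.conda/myevns --cores 32 \
    --directory ~/analysis/dropseq/data/project
```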
However, all the other changes are still valid in my mind. @Hoohm, could you check this?
Yep, I'm closing this.
I often need to change the `expected_cells` in `samples.csv` and want snakemake to update all results that depend on it, but no more than that. This seems common, and I think we need a clean solution for it. It has been raised already in issue #28, but since the pipeline has changed and I think that solution is no longer valid, I'd rather have a separate issue.
To approach this systematically, I first tried to find out where `expected_cells` is read by ripgrepping the code base. Results (note: the lines with hits have a green highlighted number followed by a colon):
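The search itself is just `rg -n expected_cells` from the repository root; a plain-grep equivalent, demonstrated on a throwaway file since the file name and contents here are made up:

```shell
# Reproduce the search; `rg -n expected_cells` does the same with ripgrep.
# demo_rules/example.smk and its contents are made up for illustration.
mkdir -p demo_rules
echo 'num_cells = config["expected_cells"]' > demo_rules/example.smk
grep -rn expected_cells demo_rules
# -> demo_rules/example.smk:1:num_cells = config["expected_cells"]
```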
Checking the DAG, I found that rule `get_top_barcodes` was the furthest upstream, so I figured that if I delete its output, it should be rerun and, consequently, everything down the line: https://github.com/Hoohm/dropSeqPipe/blob/622be9464bf141d0b6803f85178e653ecca5e3ea/rules/cell_barcodes.smk#L31
This in fact worked well: I deleted `'{results_dir}/samples/{sample}/top_barcodes.csv'` and reran snakemake: `snakemake --use-conda -rp --conda-prefix ~/.conda/myevns --cores 32 --directory ~/analysis/dropseq/data/project`
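Incidentally, the DAG view used above can be regenerated with snakemake's built-in `--dag` export, piped through Graphviz; the output filename is arbitrary:

```shell
# Export the rule DAG to inspect what sits downstream of get_top_barcodes
# (requires Graphviz's `dot` to be installed)
snakemake --dag | dot -Tsvg > dag.svg
```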
Note, at this point I've modified the `Snakefile` to have `'{results_dir}/samples/{sample}/top_barcodes.csv'` in the target rules!

But STAR mapping seems to be triggered again! Surely that's a wasteful way to do it. I worked out this is because of the `repair_barcodes` rule: https://github.com/Hoohm/dropSeqPipe/blob/622be9464bf141d0b6803f85178e653ecca5e3ea/rules/cell_barcodes.smk#L60 which relies on `expected_cells` and barcode information as well, but also on `temp('{results_dir}/samples/{sample}/Aligned.repaired.bam')`, which is only kept temporarily! That's probably why the mapping gets kicked off again to regenerate it.

To overcome this, I've created a branch where this should be fixed by taking out the `temp` bit; I'm testing at the moment and will get back. Hope I didn't miss anything really simple :)
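Schematically, the fix in the branch amounts to dropping the `temp()` wrapper on that output; a sketch only, with the rule name and body abbreviated rather than the exact pipeline code:

```
rule some_mapping_step:
    input:
        ...
    output:
        # was: temp('{results_dir}/samples/{sample}/Aligned.repaired.bam')
        # without temp() the repaired BAM survives on disk, so changing
        # expected_cells reruns only the barcode rules, not the STAR mapping
        '{results_dir}/samples/{sample}/Aligned.repaired.bam'
```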