Rerunning "extract" step

ravipatel4 commented 6 years ago

Hello,

I am having trouble rerunning the "extract" step. The description of the knee plot on the "Plot" page suggests rerunning the "extract" step after changing the expected_cells parameter in samples.csv if the clear bend of the curve is higher than the expected_cells value. After increasing expected_cells, I tried the following command:

snakemake --cores 8 extract --use-conda

However, it doesn't rerun the extract step and execution ends with a message saying "Nothing to be done". Below are the messages printed on the console. Am I doing anything wrong in rerunning the extract step only?

Building DAG of jobs...
Nothing to be done.
Shutting down, this might take some time.
Complete log: /local/.....

Thank you.

Hoohm commented 6 years ago

Hello @ravipatel4! I haven't added this to the wiki yet, but it can be found in the snakemake FAQ.

The easy way, which is what I do, is to simply delete the files I know will be affected and rerun the step.

However, this requires some specific knowledge of the rules involved. For your example, you have to delete:

'summary/{sample}_umi_expression_matrix.tsv'
'summary/{sample}_counts_expression_matrix.tsv'
'logs/{sample}_umi_per_gene.tsv'
'logs/{sample}_rna_metrics.txt'

This will rerun the necessary steps and should also rerun the plots, even though you haven't deleted them, because they depend on those files.
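The deletion step can be scripted. The following is a minimal sketch, not part of dropSeqPipe: the function name `remove_extract_outputs`, the `workdir` argument, and the `dry_run` switch are all hypothetical, and only the four file patterns come from the list above. It defaults to a dry run so the affected files can be reviewed before anything is removed.

```python
from pathlib import Path

# Per-sample outputs named in the list above; {sample} is filled in per sample.
EXTRACT_OUTPUTS = [
    "summary/{sample}_umi_expression_matrix.tsv",
    "summary/{sample}_counts_expression_matrix.tsv",
    "logs/{sample}_umi_per_gene.tsv",
    "logs/{sample}_rna_metrics.txt",
]


def remove_extract_outputs(workdir, samples, dry_run=True):
    """Return the existing extract outputs; delete them unless dry_run is set.

    Hypothetical helper: workdir is assumed to be the pipeline's working
    directory and samples the sample names used in samples.csv.
    """
    found = []
    for sample in samples:
        for pattern in EXTRACT_OUTPUTS:
            path = Path(workdir) / pattern.format(sample=sample)
            if path.is_file():
                found.append(path)
                if not dry_run:
                    path.unlink()
    return found
```

Calling it once with `dry_run=True` and then again with `dry_run=False` before rerunning snakemake mirrors the manual procedure described above.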

The clean way to do it is the following:

snakemake -R `snakemake extract --list-params-changes`

The problem is that this will actually rerun everything up to the STAR index. The main reason is that snakemake can't tell which specific value has changed. It only knows that samples.csv has changed, so it updates everything that depends on it.

If you change something in config.yaml, however, snakemake loads each value separately and only updates what is affected by the value you changed.

So, for the time being, I advise the deleting solution presented first.
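Since snakemake only sees that samples.csv changed as a whole, it can help to work out by hand which samples actually had their expected_cells value edited, and delete files for those samples only. The sketch below assumes a backup copy of the previous samples.csv is available and that the sample identifier column is named `samples`; both are assumptions, as is the helper name `changed_expected_cells`. Only the expected_cells column name comes from this thread.

```python
import csv


def changed_expected_cells(old_csv, new_csv, id_column="samples"):
    """Return {sample: (old_value, new_value)} where expected_cells differs.

    Hypothetical helper: id_column is an assumed name for the column that
    identifies each sample. Values are compared as the strings read from disk.
    """
    def load(path):
        with open(path, newline="") as handle:
            return {row[id_column]: row["expected_cells"]
                    for row in csv.DictReader(handle)}

    old, new = load(old_csv), load(new_csv)
    return {name: (old[name], value)
            for name, value in new.items()
            if name in old and old[name] != value}
```

The keys of the returned dictionary are the samples whose outputs need to be deleted before rerunning the step.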

Hope that helps clear things up.

ravipatel4 commented 6 years ago

Thank you @Hoohm for the detailed description.

I followed your instructions for the "easy" way of rerunning the extract step. I deleted the files you mentioned and reran the extract step with:

snakemake --cores 30 extract --use-conda

The command ran fine. However, it only created the logs/{sample}_umi_per_gene.tsv files. How do I get the other deleted files back? Did I miss anything?

Thank you.

ravipatel4 commented 6 years ago

I extracted the following command from the verbose output of my full run of the pipeline:

java -Xmx4g -Djava.io.tmpdir=/tmp/ -jar /sources/drop-seq-tools/Drop-seq_tools-1.13/jar/dropseq.jar DigitalExpression I=data/smpl_final.bam O=summary/smpl_counts_expression_matrix.tsv EDIT_DISTANCE=1 OUTPUT_READS_INSTEAD=true NUM_CORE_BARCODES=500 MIN_BC_READ_THRESHOLD=1

I believe running this command alone (after changing NUM_CORE_BARCODES to 1000) would give me a count expression matrix for 1000 cells. Do you think this command alone is enough for what I want to do? Thanks.
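For repeated runs with different cell counts, the command above can be assembled programmatically. This is a sketch under stated assumptions: the helper names `digital_expression_cmd` and `run_digital_expression` are hypothetical, the jar path and every argument are copied from the command quoted above, and the input/output naming simply follows the `smpl` example. Running it this way bypasses snakemake, so none of the pipeline's other outputs are updated.

```python
import subprocess

# Jar location copied from the command quoted above; adjust for your system.
DROPSEQ_JAR = "/sources/drop-seq-tools/Drop-seq_tools-1.13/jar/dropseq.jar"


def digital_expression_cmd(sample, num_core_barcodes, jar=DROPSEQ_JAR):
    """Build the argument list for one DigitalExpression run (hypothetical helper)."""
    return [
        "java", "-Xmx4g", "-Djava.io.tmpdir=/tmp/", "-jar", jar,
        "DigitalExpression",
        f"I=data/{sample}_final.bam",
        f"O=summary/{sample}_counts_expression_matrix.tsv",
        "EDIT_DISTANCE=1",
        "OUTPUT_READS_INSTEAD=true",
        f"NUM_CORE_BARCODES={num_core_barcodes}",
        "MIN_BC_READ_THRESHOLD=1",
    ]


def run_digital_expression(sample, num_core_barcodes):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(digital_expression_cmd(sample, num_core_barcodes), check=True)
```

For the case discussed here, `run_digital_expression("smpl", 1000)` would issue the quoted command with NUM_CORE_BARCODES set to 1000.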

Hoohm commented 6 years ago

Oh, sorry, I made a mistake. You also need to delete the final expected files.

        expand('logs/{sample}_umi_per_gene.tsv', sample=samples.index),
        expand('plots/{sample}_rna_metrics.pdf', sample=samples.index),
        'summary/umi_expression_matrix.tsv',
        'summary/counts_expression_matrix.tsv',
        expand('logs/{sample}_hist_out_cell.txt', sample=samples.index),
        expand('plots/{sample}_knee_plot.pdf', sample=samples.index),

If you delete those, it should be fine
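Before rerunning, it may be worth checking that none of these final targets is still on disk, since any that remain are a likely reason for snakemake to skip work. The sketch below expands the patterns from the list above in plain Python rather than through snakemake's `expand`; the names `final_targets` and `still_present` and the `workdir` argument are hypothetical.

```python
from pathlib import Path

# Final targets quoted in the list above, split into per-sample patterns
# and the two merged matrices that have no {sample} placeholder.
PER_SAMPLE_TARGETS = [
    "logs/{sample}_umi_per_gene.tsv",
    "plots/{sample}_rna_metrics.pdf",
    "logs/{sample}_hist_out_cell.txt",
    "plots/{sample}_knee_plot.pdf",
]
MERGED_TARGETS = [
    "summary/umi_expression_matrix.tsv",
    "summary/counts_expression_matrix.tsv",
]


def final_targets(samples):
    """Expand every per-sample pattern for every sample, then add merged files."""
    expanded = [pattern.format(sample=sample)
                for pattern in PER_SAMPLE_TARGETS
                for sample in samples]
    return expanded + MERGED_TARGETS


def still_present(workdir, samples):
    """List the final targets that still exist under workdir."""
    return [target for target in final_targets(samples)
            if (Path(workdir) / target).exists()]
```

An empty result from `still_present` means all of the listed targets have been removed.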

Hoohm commented 6 years ago

Hello @ravipatel4, did that help you rerun the extract step?

ravipatel4 commented 6 years ago

Hello @Hoohm,

I think it worked. But I ended up using the following command after changing the number of cells to the number I want expression data for, and it seems to work the way I wanted:

java -Xmx4g -Djava.io.tmpdir=/tmp/ -jar /sources/drop-seq-tools/Drop-seq_tools-1.13/jar/dropseq.jar DigitalExpression I=data/smpl_final.bam O=summary/smpl_counts_expression_matrix.tsv EDIT_DISTANCE=1 OUTPUT_READS_INSTEAD=true NUM_CORE_BARCODES=500 MIN_BC_READ_THRESHOLD=1

Thank you.

Best, Ravi

Hoohm commented 6 years ago

Ok! I'll close this then

Hoohm / dropSeqPipe

Rerunning "extract" step #28