ravipatel4 closed this issue 6 years ago.
Hello @ravipatel4! I haven't added this to the wiki yet, but it is covered in the Snakemake FAQ.
The easy way, which is what I do, is to delete the files you know will be affected and rerun the step.
However, this requires some knowledge of the rules involved. For your example, you have to delete:
'summary/{sample}_umi_expression_matrix.tsv'
'summary/{sample}_counts_expression_matrix.tsv'
'logs/{sample}_umi_per_gene.tsv'
'logs/{sample}_rna_metrics.txt'
This reruns the necessary steps, and it should also regenerate the plots even though you haven't deleted them, because they depend on those files.
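A minimal Python sketch of the "easy" way, assuming you run it from the pipeline's working directory: it lists (and, when asked, deletes) the per-sample files named in this reply so that the next `snakemake ... extract` call regenerates them. The helper `delete_impacted_files` is hypothetical and not part of dropSeqPipe; in the pipeline the sample names come from samples.csv.

```python
from pathlib import Path

# Per-sample files named in the reply above; {sample} is filled in per sample.
IMPACTED_TEMPLATES = [
    "summary/{sample}_umi_expression_matrix.tsv",
    "summary/{sample}_counts_expression_matrix.tsv",
    "logs/{sample}_umi_per_gene.tsv",
    "logs/{sample}_rna_metrics.txt",
]


def delete_impacted_files(samples, workdir=".", dry_run=True):
    """Hypothetical helper: remove (or, with dry_run=True, only list) impacted files.

    Returns the paths that exist and were (or would be) removed.
    """
    removed = []
    for sample in samples:
        for template in IMPACTED_TEMPLATES:
            path = Path(workdir) / template.format(sample=sample)
            if path.exists():
                removed.append(path)
                if not dry_run:
                    path.unlink()
    return removed


if __name__ == "__main__":
    # Dry run first: print what would be deleted for a sample named "smpl".
    for p in delete_impacted_files(["smpl"]):
        print(p)
```

Call it with `dry_run=False` once the list looks right, then rerun the extract target. As a later reply in this thread points out, the final target files may need to be deleted as well.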
The clean way to do it is: snakemake -R `snakemake extract --list-params-changes`
The problem is that this will actually rerun everything up to the STAR index. The main reason is that Snakemake can't tell which specific value has changed. It only knows that samples.csv has changed, so it updates everything that depends on it.
If you change something in config.yaml, however, Snakemake loads each value separately and only updates what is affected by the value you changed.
So, for the time being, I recommend the deletion approach described first.
Hope that helps clear things up!
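To make the reasoning in this reply concrete, here is a toy Python model (not Snakemake's actual implementation) contrasting the two behaviours described: when the whole samples.csv file counts as one changed input, every rule that reads it is flagged; when values are tracked one by one, as described for config.yaml, only the rules using the changed value are flagged. The rule names and parameter keys are hypothetical.

```python
# Toy model of the two behaviours described above; NOT Snakemake's implementation.
# Rule names and parameter keys are hypothetical.

RULES = {
    "star_index": {"inputs": ["samples.csv"], "params": ["genome_dir"]},
    "align": {"inputs": ["samples.csv"], "params": ["star_threads"]},
    "extract": {"inputs": ["samples.csv"], "params": ["expected_cells"]},
}


def rerun_by_file(rules, changed_file):
    """File-level view: every rule that reads the changed file is flagged."""
    return {name for name, spec in rules.items() if changed_file in spec["inputs"]}


def rerun_by_value(rules, old_values, new_values):
    """Value-level view: only rules using a value that actually changed are flagged."""
    return {
        name
        for name, spec in rules.items()
        if any(old_values.get(key) != new_values.get(key) for key in spec["params"])
    }


OLD = {"genome_dir": "ref", "star_threads": 8, "expected_cells": 500}
NEW = dict(OLD, expected_cells=1000)  # only expected_cells changes

print(sorted(rerun_by_file(RULES, "samples.csv")))  # ['align', 'extract', 'star_index']
print(sorted(rerun_by_value(RULES, OLD, NEW)))      # ['extract']
```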
Thank you @Hoohm for the detailed description.
I followed your instructions for the "easy" way of rerunning the extract step. I deleted the files you mentioned and reran the extract step with: snakemake --cores 30 extract --use-conda
The command ran fine. However, it only created the logs/{sample}_umi_per_gene.tsv files. How do I get the other deleted files back? Did I miss anything?
Thank you.
I extracted the following command from the verbose output of my full pipeline run:
java -Xmx4g -Djava.io.tmpdir=/tmp/ -jar /sources/drop-seq-tools/Drop-seq_tools-1.13/jar/dropseq.jar DigitalExpression I=data/smpl_final.bam O=summary/smpl_counts_expression_matrix.tsv EDIT_DISTANCE=1 OUTPUT_READS_INSTEAD=true NUM_CORE_BARCODES=500 MIN_BC_READ_THRESHOLD=1
I believe running this command alone (after changing NUM_CORE_BARCODES to 1000) would give me a count expression matrix for 1000 cells. Do you think this command alone is enough for what I want to do?
Thanks.
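If you take the standalone route, a small Python helper can keep the command from the verbose log in one place and turn the cell count into a parameter. This is a sketch: the jar path, memory setting and file names are copied from the log above, and `digital_expression_cmd` is a hypothetical helper, not part of dropSeqPipe. Note that running the command outside Snakemake only rewrites the counts matrix named by O=...; the UMI matrix and the plots are not touched.

```python
import shlex
import subprocess

# Copied from the verbose log quoted above; adjust to your installation.
DROPSEQ_JAR = "/sources/drop-seq-tools/Drop-seq_tools-1.13/jar/dropseq.jar"


def digital_expression_cmd(sample, num_cells):
    """Hypothetical helper: build the DigitalExpression call as an argv list."""
    return [
        "java", "-Xmx4g", "-Djava.io.tmpdir=/tmp/",
        "-jar", DROPSEQ_JAR,
        "DigitalExpression",
        f"I=data/{sample}_final.bam",
        f"O=summary/{sample}_counts_expression_matrix.tsv",
        "EDIT_DISTANCE=1",
        "OUTPUT_READS_INSTEAD=true",
        f"NUM_CORE_BARCODES={num_cells}",  # number of cells to report
        "MIN_BC_READ_THRESHOLD=1",
    ]


if __name__ == "__main__":
    cmd = digital_expression_cmd("smpl", 1000)
    print(shlex.join(cmd))  # inspect the command first (shlex.join needs Python 3.8+)
    run_it = False  # set to True to actually call Java
    if run_it:
        subprocess.run(cmd, check=True)
```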
Oh, sorry, I made a mistake. You also need to delete the final expected files:
expand('logs/{sample}_umi_per_gene.tsv', sample=samples.index),
expand('plots/{sample}_rna_metrics.pdf', sample=samples.index),
'summary/umi_expression_matrix.tsv',
'summary/counts_expression_matrix.tsv',
expand('logs/{sample}_hist_out_cell.txt', sample=samples.index),
expand('plots/{sample}_knee_plot.pdf', sample=samples.index),
If you delete those as well, it should work.
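For reference, `expand()` in a Snakefile fills each `{sample}` placeholder with every value given (here `samples.index`). The Python stand-in below is a minimal re-implementation of that behaviour using only the default product combination, not Snakemake's own code, and builds the full list of final files from this reply. The helper names and sample names are hypothetical.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Minimal stand-in for Snakemake's expand() with the default product combination.

    A plain string value is treated as a single value, not iterated per character.
    """
    keys = list(wildcards)
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*values)]


def final_extract_targets(samples):
    """All final files listed in this reply, for the given sample names."""
    targets = []
    for pattern in (
        "logs/{sample}_umi_per_gene.tsv",
        "plots/{sample}_rna_metrics.pdf",
        "logs/{sample}_hist_out_cell.txt",
        "plots/{sample}_knee_plot.pdf",
    ):
        targets.extend(expand_like(pattern, sample=samples))
    # The two merged matrices are not per sample.
    targets += ["summary/umi_expression_matrix.tsv", "summary/counts_expression_matrix.tsv"]
    return targets


if __name__ == "__main__":
    # Hypothetical sample names; the pipeline takes them from samples.csv.
    for path in final_extract_targets(["sampleA", "sampleB"]):
        print(path)
```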
Hello @ravipatel4, did that help you rerun the extract step?
Hello @Hoohm,
I think it worked. But I ended up using the following command, after changing the number of cells to the number I want expression data for, and it seemed to work the way I wanted:
java -Xmx4g -Djava.io.tmpdir=/tmp/ -jar /sources/drop-seq-tools/Drop-seq_tools-1.13/jar/dropseq.jar DigitalExpression I=data/smpl_final.bam O=summary/smpl_counts_expression_matrix.tsv EDIT_DISTANCE=1 OUTPUT_READS_INSTEAD=true NUM_CORE_BARCODES=500 MIN_BC_READ_THRESHOLD=1
Thank you.
Best, Ravi
OK! I'll close this, then.
Hello,
I am having trouble rerunning the "extract" step successfully. The description of the knee plot on the "Plot" page suggests rerunning the "extract" step after changing the expected_cells parameter in samples.csv if the clear bend of the curve is higher than the expected_cells parameter. I tried the following command after increasing the expected_cells parameter: snakemake --cores 8 extract --use-conda
However, it doesn't rerun the extract step, and execution ends with the message "Nothing to be done". Below is the console output. Am I doing something wrong in rerunning only the extract step?
Building DAG of jobs...
Nothing to be done.
Shutting down, this might take some time.
Complete log: /local/.....
Thank you.
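One likely reason for "Nothing to be done": in older Snakemake versions (before the `--rerun-triggers` option was introduced), a job is rerun only when one of its outputs is missing or older than one of its inputs. If samples.csv is not a declared input of the extract rules (an assumption; it may only be read to set parameters), editing it does not affect this check. The simplified Python model below captures that rule; it assumes all inputs already exist and ignores how Snakemake handles missing intermediate files. The replies in this thread work around it by deleting outputs or forcing reruns with `-R`.

```python
import os


def needs_rerun(output, inputs):
    """Simplified mtime rule: rerun if `output` is missing or older than any input.

    Assumes every path in `inputs` exists (os.path.getmtime raises otherwise).
    """
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)
```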