Closed Vlad-Dembrovskyi closed 3 years ago
This is an absolutely interesting PR and a great learning experience, as it implements a hidden part of Nextflow. Thank you for digging deep to make it work 🦾
Just a suggestion: can we have a message in the `workflow.onComplete` scope to let the user know that the work directory has been cleaned after workflow completion? When running on HPC (where this functionality will mostly be used), stdout is typically written to a file that is checked at a later stage. Having this message in that stdout file will give the user more clarity about what happened after the job ran. As a bonus, it would be great if the message could point to the work directory path (`$workflow.workDir`).
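A minimal sketch of what such a notification could look like in the main script. It assumes the cleanup switch is reachable as `params.cleanup` (an assumption about how the PR wires the option) and the message wording is purely illustrative, not the text that was actually committed:

```groovy
// main.nf (sketch, not the committed implementation)
// Assumes params.cleanup is defined, e.g. with a default of false in nextflow.config.
workflow.onComplete {
    if (workflow.success) {
        log.info "Pipeline completed successfully."
        if (params.cleanup) {
            // Nextflow's cleanup option only acts on successful runs
            log.info "Cleanup is enabled: intermediate files in ${workflow.workDir} are removed."
        }
    } else {
        log.info "Pipeline failed. Work directory kept for debugging: ${workflow.workDir}"
    }
}
```

Because the handler writes through `log.info`, the message ends up both in `.nextflow.log` and in the stdout file captured by the HPC scheduler.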
@sk-sahu Added with latest commit.
To test the success notification, just run the example command with the `ultra_quick_test` and `docker` profiles. To test the failed-pipeline message, introduce an error into the last step (for example, touch a file in a non-existing folder: `touch ghtjf/vsuyr`) and run the same command.
@Vlad-Dembrovskyi one suggestion - we rarely run the pipeline from the command line. We generally submit a standard main.pbs and a config file. Is there a way to add the cleanup param to NF_splicing_pipeline.config?
Hi @angarb. For sure you can add this cleanup option to any config you use to run the pipeline. For that you just need to add this line to the very end of the config that you use:
cleanup = true
I can't find `NF_splicing_pipeline.config` in this repository, so I assume it's a config you are using in your working environment, which means you have to add the line yourself. As I said, it is as easy as copy-pasting the line above to the end of the config. That should work :) Having this line in a config enables the cleanup. Let me know if that works for you.
@angarb it would be good to add this configuration file, NF_splicing_pipeline.config, to the GitHub repository - as the `sumner` configuration requirements.
Hi @adeslatt we do have this config in the repository - the example is here (splicing-pipelines-nf/conf/examples/MYC_MCF10A_0h_vs_MYC_MCF10A_8h.config) and parameter descriptions here (https://github.com/TheJacksonLaboratory/splicing-pipelines-nf/blob/master/docs/usage.md#all-available-parameters)
@PhilPalmer had outlined these steps when updating a parameter:
@Vlad-Dembrovskyi - should I just add cleanup to these param lists?
@angarb I didn't know the MYC_MCF10A_0h_vs_MYC_MCF10A_8h.config file is the same as `NF_splicing_pipeline.config`, which @adeslatt referred to, sorry.
Anyhow, yes: you can add the `--cleanup` parameter, with the description from here, to the all-available-parameters section, under the `--debug` option.
You can also add it as `cleanup = true` to your config file, but importantly: not inside the `params` scope, but outside of it. Just add it as the last line of your config file, as if there were nothing else in the file. When provided through a config file, it is a standalone Nextflow option.
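To illustrate the placement described above, here is a sketch of a user config. The parameter inside `params` is a placeholder invented for the example, not taken from the repository's real configs; only the position of `cleanup = true` matters:

```groovy
// Sketch of a user config (e.g. the file passed with `nextflow run ... -c my.config`)
params {
    // Pipeline parameters live inside the params scope.
    // 'example_param' is a hypothetical placeholder.
    example_param = 'some_value'
}

// Top-level Nextflow option: placed OUTSIDE the params scope,
// here as the very last line of the file.
cleanup = true
```

Writing `cleanup = true` inside `params { ... }` would instead define a pipeline parameter named `cleanup`, which Nextflow itself does not read as its top-level cleanup option.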
Description
This PR addresses issue #217, which requests an option to clean up the `work` folder at the end of pipeline execution, as for a large-scale run it often contains a lot of data that occupies many gigabytes, or even terabytes, of disk space.
Note: Before merging the current PR into the `dev` branch, merge #239 into the current branch. [done]
Solution
Luckily, Nextflow has a hidden, undocumented feature to clean up all temporary files, which Paolo Di Tommaso revealed in one of his nf-hacks. There was a problem with it: when running our pipeline with the Docker profile, it failed to clean up the files, with a non-descriptive Groovy error message `Failed to cleanup work dir: ...` (although the pipeline itself still worked fine). The reason was that Groovy couldn't delete files owned by the root user in the work folder. By default, such files are created by all processes that run through a Docker container.
I solved the issue by adding Docker options to the config that set the user and group to the current user: `docker.runOptions = '-u $(id -u):$(id -g)'`. After that, the new `cleanup` option started working like a charm. It is exposed as the parameter `--cleanup` and can be used simply as `nextflow run ... --cleanup true`, or even as a flag: `nextflow run ... --cleanup`. The Singularity profile does not have user ownership issues and thus worked fine out of the box (tested in #239, which has to be merged into the current PR before this one is merged into dev [done]).
Note: this option cleans the work directories of all processes, but doesn't clean the staged files from the `stage` folder. Even so, this is already a tremendous win in disk space.
To test
To test that the new feature is actually working, run the pipeline in ultra quick test mode with and without the `--cleanup` option enabled, and check the `work` folder sizes in both cases.

```
nextflow run . -profile ultra_quick_test,docker
du -sh work
rm -r work
```

```
nextflow run . -profile ultra_quick_test,docker --cleanup
du -sh work
du -sh work/*
```

```
nextflow run . -profile ultra_quick_test,singularity_local --cleanup
du -sh work
du -sh work/*
```

```
git clone https://github.com/TheJacksonLaboratory/splicing-pipelines-nf
cd splicing-pipelines-nf
git checkout adds-workdir-cleanup-option
nextflow run . -profile ultra_quick_test,sumner --cleanup
du -sh work/*
```
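For reference, the Docker ownership fix and the cleanup switch described under Solution could be combined in a config roughly as follows. This is a sketch under assumptions: the `docker.runOptions` line is quoted from the description, while the default value of `params.cleanup` and the `cleanup = params.cleanup` wiring are guesses at how the `--cleanup` parameter is connected to Nextflow's option, not a copy of the repository's `nextflow.config`:

```groovy
// nextflow.config (sketch)
params {
    // Assumed default; overridden by `--cleanup` or `--cleanup true` on the command line
    cleanup = false
}

// Assumed wiring: forward the pipeline parameter to Nextflow's top-level cleanup option
cleanup = params.cleanup

profiles {
    docker {
        docker.enabled = true
        // Run containers as the calling user and group, so files in the work
        // directory are not owned by root and can be deleted afterwards.
        // Single quotes keep Groovy from interpolating $(...); the shell expands it.
        docker.runOptions = '-u $(id -u):$(id -g)'
    }
}
```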