boulund closed this issue 3 years ago
@abhi18av, can you confirm this? If this is the case, it would be great to implement this to save space if the pipeline has completed without errors.
In another Nextflow workflow that we use, we used hardlinks for all output files to reduce disk space usage. Once everything is done, one can easily remove the work directory without risk of deleting anything in the published directories.
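For illustration, a minimal sketch of the hardlink approach, assuming a hypothetical process `ASSEMBLE` and a hypothetical output folder `results/assembly` (neither is taken from BACTpipe). It uses the `link` publish mode, which creates hard links in the published directory:

```groovy
// Hypothetical process; names and paths are placeholders for illustration only.
process ASSEMBLE {
    // 'link' publishes each output file as a hard link,
    // so the data is stored on disk only once.
    publishDir "results/assembly", mode: 'link'

    input:
    path reads

    output:
    path "contigs.fasta"

    script:
    """
    cp ${reads} contigs.fasta
    """
}
```

Hard links only work when the work directory and the published directory are on the same file system, so this may not fit every setup.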
In general, relying on undocumented stuff might bind us to a specific version and put us in technical debt overall. Nextflow development moves quite fast.
I think a simpler alternative could be to switch the publish mode to `move` rather than `copy` or `link`. In this case, as soon as the process exits successfully, it'll move the output of that process to the specified `publishDir`.
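A minimal sketch of what that could look like, assuming a hypothetical terminal process `REPORT` and a hypothetical output folder `results/report` (not taken from BACTpipe):

```groovy
// Hypothetical process; names and paths are placeholders for illustration only.
process REPORT {
    // 'move' relocates the output files into the published directory
    // instead of leaving a copy in the work directory.
    publishDir "results/report", mode: 'move'

    input:
    path summary

    output:
    path "report.txt"

    script:
    """
    cp ${summary} report.txt
    """
}
```

One caveat: the Nextflow documentation notes that `move` is only meant for terminal processes, i.e. processes whose output is not consumed by any downstream process, since the files no longer exist in the work directory afterwards.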
What do you guys think?
Are there any interaction effects with the `scratch` directive that we need to consider then? Thinking of cluster environments where the scratch dir could be on a node-local disk and the publishdir on a shared network file system. (not that BACTpipe produces very large output files, so it's not likely to become a huge problem--people misbehave on these systems all the time anyway :) )
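To make the question concrete, here is a sketch of a process combining both directives, with a hypothetical process name and paths. As far as I understand it, with `scratch` enabled the task runs in a node-local temporary directory and the declared outputs are copied back to the work directory, and publishing then happens from the work directory. That reading is an assumption and would be worth verifying on a real cluster:

```groovy
// Hypothetical process; names and paths are placeholders for illustration only.
process ANNOTATE {
    // Run the task in a node-local temporary directory.
    scratch true
    // Publish by moving files from the work directory to the output folder.
    publishDir "results/annotation", mode: 'move'

    input:
    path contigs

    output:
    path "annotation.gff"

    script:
    """
    touch annotation.gff
    """
}
```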
I've not worked extensively with HPC systems, but conceptually I believe they are similar to AWS Batch or Azure Batch environments (multiple synced machines), in which case `publishDir "xyz", mode: "move"` should work :)
In any case, as you mentioned, this isn't a huge pipeline, so we can iterate a couple of times and finalize.
I guess this is still not solved/agreed on, right?
I'm not sure what I think about this anymore. Moving output files to publishDir's is a convenient solution that makes it easy for users to just delete the work dir when they're happy with a run. However, I think it can make troubleshooting a bit more difficult, as all files are no longer present in their work dirs... 🤔
I rarely run BACTpipe in environments where I'm super constrained disk space wise, so the disk space arguments don't really apply to me. Also, it's nice sometimes to find a large old work dir to delete (wow, free disk space!) ;)
I recognise that feeling ;) I am fine with leaving it as it is for now.
Shall we close this for now then, and reopen or create a new issue if we want to raise this in the future?
Let's do so!
According to https://github.com/nextflow-io/nf-hack18/issues/3 there is an undocumented feature to remove the work dir.
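For reference, Nextflow has a `cleanup` configuration option that deletes the files in the work directory after a successful run. Whether this is exactly the feature discussed in the linked issue is an assumption on my part. A minimal `nextflow.config` sketch:

```groovy
// nextflow.config
// Remove work directory files automatically when the run completes successfully.
// Note: this prevents using -resume on subsequent executions.
cleanup = true
```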