Closed bentsherman closed 2 years ago
Hi @bentsherman this is awesome you've started this. I've been working on updating AnnoTater (now EnTAP-nf) to be DSL2 complaint as well. So this will be a wild ride for GEMmaker if we reorganize into modules.
Another day in the bag. The pipeline script is now much easier to read. Some remaining issues:
multiqc
and create_gem
are not executed after all samples are processedclean_*
processes are working correctlyclose()
is deprecated under DSL 2Some future ideas that may or may not make it into this PR:
clean_*
process given source channel and done signals, DONE_SENTINEL
constant in place of val(1)
clean_*
processes to clean_work_files
and clean_work_dirs
since they are now identical, all the ready signals can be piped to the same processI figured out how to deal with the infinite loop caused by Channel.watchPath()
. Before, we could just close the channel explicitly, but now the close()
method is deprecated. So instead, next_sample
writes a "done" file into the process
folder, and then the watcher channel closes upon reading this file via until()
. The watcher is guaranteed to see this file only after it has seen all other samples. Very handy!
A nice thing is that I was able to move the code from write_stage_files
and start_first_batch
into the workflow block since it is just groovy code, so that should remove a little bit of overhead. You could do a similar thing with next_sample
but I don't think you could guarantee that only one sample is transitioned at a time, and I couldn't find anything in Groovy for doing atomics or mutexes so I'll just leave it be.
I was able to condense all of the cleaning processes into one for files and one for directories. I also tried to make a function for the repetitive cleaning logic, but couldn't get the varargs to work.
@spficklin I think this PR is ready for you to review. I know there's a lot in here beyond simply satisfying DSL 2 but there were a lot of downstream benefits, and I was fixing or reorganizing small things as I went. Let me know what you think when you have time. No rush.
@spficklin Will the retrieve_sra_metadata
process run if it doesn't have any inputs? I assume it will just run once...
Hey @bentsherman . This updates are great! I did have to fix a few things though.
First, If the user doesn't specify a file of skipped samples I get this error:
Error executing process > 'retrieve_sra_metadata (1)'
Caused by:
Path value cannot be null
I fixed it by removing the inputs on the retreive_sra_metadata
process. We don't need any input channels we can just use the variables from the configuration.
Second, the resume feature was broken. The workflow would not resume if it was preempted. I had to fix it by making the following changes.
process
folder doesn't seem to fire if the file was already. process
folder to trigger the workflow to move along.I made one other change, which wasn't a bug fix. I added some if statements around process calls that don't need to get used. For example, if the user didn't give any SRA files then there's no need to have the download process show up in the Nextflow output.
Finally, the naming of some of the channels confused me, so in the process of debugging I renamed some of them. It was just a few. Take a look and if you are okay with my updates I think we can merge this.
@spficklin Thanks for the edits. I thought I had the resume working, but maybe I didn't test resuming from a partial run. I made a few formatting edits.
The code you re-added for your point (1) was already in the beginning of the workflow block. Please resolve that per my comment.
Hey @bentsherman you are right about item #1. I didn't notice the reworked code that did that. I removed it. This all looks good! I'll merge. Thanks for the quick review.
Nextflow's DSL 2 syntax provides a much better way to write pipeline scripts. The DSL 2 allows you to place all of the channel logic into a single "workflow" block, instead of this logic being scattered throughout the script and within the process blocks. Additionally, DSL 2 allows you to use the same channel multiple times instead of making copies via
into
.GEMmaker has a lot of this glue logic, which makes it very hard to read, understand, and maintain. As such it should benefit tremendously from DSL 2. I intend to make this migration and then simplify the channel logic as much as I can.
I am submitting this PR now so that anyone who's interested can track my progress and provide input along the way. At this point, I have made the bare minimum changes required to satisfy DSL 2, and the workflow runs correctly... mostly. However, I think the channel logic can be simplified much further, so I will continue to work on it. I'll let you know when it's ready to review for merge.
My running checklist:
DSL 2 also introduces the ability to have separate module files. While I don't intend to incorporate this feature in this PR, GEMmaker will likely benefit from being reorganized into modules as well, so migrating to DSL 2 is the first step towards that dream.