SystemsGenetics / GEMmaker

A workflow for construction of Gene Expression count Matrices (GEMs). Useful for Differential Gene Expression (DGE) analysis and Gene Co-Expression Network (GCN) construction
https://gemmaker.readthedocs.io/en/latest/
MIT License
33 stars 16 forks source link

nf-core DSL2 compatibilty #249

Closed spficklin closed 2 years ago

spficklin commented 2 years ago

This PR updates GEMmaker to compatibility with nf-core DSL2 framework. The major changes are

  1. There is a new modules/local directory where all of the processes have been moved into a separate files. This isn't fully nf-core module worthy but our modules don't quite follow nf-core guidelines for sharability so this should be fine for now.
  2. The main.nf is just a short script that calls the GEMmaker workflow.
  3. The GEMmaker workflow is now found here workflows/GEMmaker.nf
  4. The intialization steps for the GEMmaker workflow are found here lib/WorkflowGEMmaker.groovy
  5. There are a lot of new files that the nf-core DSL 2.0 modules have and I've added those in and removed file that are no longer need.

Some things that are still off that we'll want to fix

  1. For nf-core DSL2 each process is expected to have it's own Docker image. Interestingly, this is the way we did it with GEMmaker initially and then switched to the single Docker image when we moved to nf-core. For now, though, I've set each process to use a GEMmaker-2.0.0 docker image. The environment.yml file is no longer needed since we are expected to use public existing Docker images for tools. This might not quite work for GEMmaker because all of our tools have some "#TRACE" output in the stdout so we can't just use the public Docker images for reach tool.
  2. The retrieve_sra_metadata never seems to get cached or for some reason Nextflow thinks something has changed and re-runs it. This will always cause SRAs to be redownloaded every time. This needs fixing before we make a release. I can work on this bug once this PR gets merged. I didn't want to make the PR much bigger.
  3. If all of the samples complete but GEMmaker fails during the create_gem step it will never create the GEM when it resumes. We need to fix this, but that's for a different PR.
bentsherman commented 2 years ago

@spficklin For (1) I would use standard Docker images for processes if they already exist, and use the gemmaker image otherwise. The trace directives are all very simple, they should work either way. I would caution against over-optimizing for nf-core if it makes GEMmaker harder to improve and maintain.

As for (2) and (3), I think you should go ahead and investigate these issues in this PR. These changes are not urgent, so we can sit with it for a little while to make sure everything is stable. If the fixes for these problems turn out to be simple, then including them here is fine, but even if they are not, it might be worthwhile to wrestle with those changes while we're already re-organizing everything. I just don't see any reason to introduce two new bugs when this PR is mostly just housekeeping. Unless there are other big changes you want to make downstream that are enabled by this refactoring.

github-actions[bot] commented 2 years ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 718836f

+| ✅  76 tests passed       |+
#| ❔  19 tests were ignored |#
!| ❗   5 tests had warnings |!
### :heavy_exclamation_mark: Test warnings: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found: `lib/WorkflowSystemsgenetics/gemmaker.groovy` * [pipeline_name_conventions](https://nf-co.re/tools-docs/lint_tests/pipeline_name_conventions.html) - Naming does not adhere to nf-core conventions: Contains non alphanumeric characters * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema param `fasta` not found from nextflow config * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema param `max_multiqc_email_size` not found from nextflow config * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema param `multiqc_config` not found from nextflow config ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `assets/nf-core-systemsgenetics/gemmaker_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-systemsgenetics/gemmaker_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `conf/igenomes.config` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - nextflow_config * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/CONTRIBUTING.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/ISSUE_TEMPLATE/config.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/ISSUE_TEMPLATE/feature_request.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/PULL_REQUEST_TEMPLATE.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/workflows/branch.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/email_template.html` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/email_template.txt` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `assets/nf-core-systemsgenetics/gemmaker_logo.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `bin/scrape_software_versions.py` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-systemsgenetics/gemmaker_logo.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `docs/README.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/multiqc_config.yaml` * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - actions_ci * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - readme ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/feature_request.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `bin/scrape_software_versions.py` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/output.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/usage.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreSchema.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/Utils.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules/local/get_software_versions.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `main.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/multiqc_config.yaml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.travis.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitattributes` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.markdownlint.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `CODE_OF_CONDUCT.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `LICENSE` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting_comment.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/nfcore_external_java_deps.jar` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreSchema.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreTemplate.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitignore` matches the template * [actions_awstest](https://nf-co.re/tools-docs/lint_tests/actions_awstest.html) - '.github/workflows/awstest.yml' is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools-docs/lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools-docs/lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` does not use `-profile test` * [template_strings](https://nf-co.re/tools-docs/lint_tests/template_strings.html) - Did not find any Jinja template strings (165 files) * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema lint passed * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema title + description lint passed * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: branch.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: ci.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: awstest.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: awsfulltest.yml * [merge_markers](https://nf-co.re/tools-docs/lint_tests/merge_markers.html) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools-docs/lint_tests/modules_json.html) - Only installed modules found in `modules.json` ### Run details * nf-core/tools version 2.1 * Run at `2021-11-21 04:34:23`
spficklin commented 2 years ago

Okay, I fixed both bugs from (2) and (3). The problem was that the processes that looked at the work/GEMmaker folder were using a file path to that directory. Nextflow is smart in that when it sees anything that looks like a path in a variable in the script section, it will check its stats and if it sees the file/directory has changed then it will re-run that step. Because the work/GEMmaker folder is always changing it meant those processes that look there don't ever use their cache but always get re-run. I had to trick Nextflow by passing in variables that didn't quite look like a path and then refactoring them. You'll see what I did in the code.

Also, another change I made was moving all sample files from both the work/GEMmaker/process and work/GEMmaker/done folders back into the work/GEMmaker/stage folder on a resume. This doesn't cause any sample to get re-executed because they are already cached, but it does allow the samples to flow through the channels and Nextflow will print out all of the cached processes which fixes problem (3) and makes it nicer for the end-user because it's clear what's been done and what hasn't been done.

Here's an example of the Hisat2 pipeline with caching of the retrieve and download working!

image

I also changed the container for all of the processes with tools that had their own compatible Docker to use the public Docker image instead of the gemmaker-2.0.0 docker image. There are still several processes that need the gemmaker image but it's a bit lighter now as it doesn't need so many tools in it.

I have tested all three alignment tools and they are all working.

There is still one issue, but it's not major. The get_software_versions process never runs. I think it's because the processes that use actual tools need to be more formally defined following the nf-core module standards. I think since it's working we can add a new issue for that and call this good....

bentsherman commented 2 years ago

@spficklin I reviewed your updates and everything looks good to me. I'm glad you were able to fix the issues (2) and (3). The "path:" thing is a clever hack, I'm sure we'll be able to find a cleaner way to do it in the future.

I see what you meant now about the trace directives. We can't use the standard nf-core/modules because our versions have the trace directives in them. That is unfortunate. But I see the trace directives in a similar light as GEMmaker's cleanup code -- hopefully both of these features will eventually be incorporated into Nextflow itself.

Ready to merge when you are.

spficklin commented 2 years ago

Okay! I just ran a test of the 475 rice dataset on Kamiak and it's having a bit of issues on the fastq_merge step. Moving the bash scripts back into the process has made it so that if any of the commands return a non-zero exit status nextflow stops. If there is non-paired data then one of the ls commands returns a 1 exit status. I thought the fix I did worked to get around this. It does on my Ubuntu machine but not on Kamiak. So, let me try and fix that and then I'll merge. Thanks for the quick reviews.

edit: it seems to be failing on the CI tests too....

spficklin commented 2 years ago

This branch has processed over 4K samples on the 26K Arabidopsis dataset on Kamiak without issues. So, I'm merging.