genepi / nf-gwas

A Nextflow pipeline to perform state-of-the-art genome-wide association studies.
https://genepi.github.io/nf-gwas
MIT License

Bundle the jbang-compiled JARs within the docker container and update empty-file usage for cloud storage #46

Closed: abhi18av closed this 2 years ago

abhi18av commented 2 years ago

This PR explores bundling the jbang-compiled JARs into the Docker container itself, as proposed in the second suggestion here: https://github.com/genepi/nf-gwas/issues/45

I'd be happy to give this PR finishing touches, if you agree with the overall direction here.

NOTE: In this POC, I've hardcoded the path of the JAR in the process to avoid rebuilding and pushing the Docker container again; this path will need to be tweaked accordingly.
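To illustrate the idea, here is a minimal sketch of a DSL2 process that calls a JAR baked into the image at a fixed path instead of compiling it with jbang at runtime. This is not the code from this PR: the JAR location `/opt/RegenieValidateInput.jar`, its `--input`/`--output` flags, the output file name and `params.phenotypes_filename` are all hypothetical, and the image is assumed to ship a Java runtime. Only the container tag is taken from this thread.

```groovy
// Hypothetical sketch: JAR path, CLI flags and parameter names are assumptions.
nextflow.enable.dsl = 2

process VALIDATE_PHENOTYPES {
    // Image assumed to already contain the jbang-compiled JAR and a JRE
    container 'rg.fr-par.scw.cloud/nfcontainers/nf-gwas-with-jars:1.0.0'

    input:
    path phenotypes_file

    output:
    path "${phenotypes_file.baseName}.validated.txt"

    script:
    """
    # Hardcoded JAR location inside the image (assumed, see lead-in)
    java -jar /opt/RegenieValidateInput.jar \\
        --input ${phenotypes_file} \\
        --output ${phenotypes_file.baseName}.validated.txt
    """
}

workflow {
    VALIDATE_PHENOTYPES(Channel.fromPath(params.phenotypes_filename))
}
```

Because the JAR lives in the image, nothing has to be compiled or downloaded on the compute node, which is what makes the same process usable on cloud executors.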

As of https://github.com/abhi18av/nf-gwas/pull/2/commits/97d075d63fe41569016560713f179a0487aad4ff, this approach is working both locally and on Azure Batch:

N E X T F L O W  ~  version 22.04.5
Launching `main.nf` [backstabbing_liskov] DSL2 - revision: 6e13e51147
executor >  local (1)
[29/57e2f7] process > NF_GWAS:VALIDATE_PHENOTYPES [100%] 1 of 1 ✔
Pipeline completed at: 2022-08-19T10:35:17.167361542+02:00
Execution status: OK

N E X T F L O W  ~  version 22.08.2-edge
Launching `./main.nf` [cranky_noyce] DSL2 - revision: 6e13e51147
Uploading local `bin` scripts folder to az://batch-jobs/nf-gwas-workdir/tmp/2c/8b5df3e5d7a4229ba4e2a2d27a4beb/bin
executor >  azurebatch (1)
[94/c78916] process > NF_GWAS:VALIDATE_PHENOTYPES [100%] 1 of 1 ✔
Pipeline completed at: 2022-08-19T10:44:09.673403+02:00
Execution status: OK
Completed at: 19-Aug-2022 10:44:10
Duration    : 3m 20s
CPU hours   : (a few seconds)
Succeeded   : 1

abhi18av commented 2 years ago

Quick update: As of e580e1181ad62e2368c71d43b2643ecdce2a42aa, the pipeline is working both locally and in the cloud.

~/data/projects/nf-gwas$ nextflow -c ../nf-gwas-local.config run main.nf -profile test,docker --outdir custom_results
N E X T F L O W  ~  version 22.04.5
Launching `main.nf` [tender_brahmagupta] DSL2 - revision: 6e13e51147
executor >  local (16)
[a4/355962] process > NF_GWAS:VALIDATE_PHENOTYPES          [100%] 1 of 1 ✔
[6f/abd8ce] process > NF_GWAS:QC_FILTER_GENOTYPED (1)      [100%] 1 of 1 ✔
[57/191e01] process > NF_GWAS:REGENIE_STEP1 (1)            [100%] 1 of 1 ✔
[98/65f8a9] process > NF_GWAS:REGENIE_LOG_PARSER_STEP1 (1) [100%] 1 of 1 ✔
[2f/4c2139] process > NF_GWAS:REGENIE_STEP2 (example)      [100%] 1 of 1 ✔
[92/20dab3] process > NF_GWAS:REGENIE_LOG_PARSER_STEP2     [100%] 1 of 1 ✔
[f6/c5f4be] process > NF_GWAS:FILTER_RESULTS (example_Y2)  [100%] 2 of 2 ✔
[1a/c58b92] process > NF_GWAS:MERGE_RESULTS_FILTERED (Y1)  [100%] 2 of 2 ✔
[e7/ac8a20] process > NF_GWAS:MERGE_RESULTS (Y1)           [100%] 2 of 2 ✔
[86/a00dd6] process > NF_GWAS:ANNOTATE_FILTERED (1)        [100%] 2 of 2 ✔
[56/ad0931] process > NF_GWAS:REPORT (1)                   [100%] 2 of 2 ✔
Pipeline completed at: 2022-08-21T22:06:02.551433621+02:00
Execution status: OK
Completed at: 21-Aug-2022 22:06:02
Duration    : 1m 25s
CPU hours   : (a few seconds)
Succeeded   : 16

$ nextflow -c ../nf-gwas-azure.config run ./main.nf -profile test,docker,azb
N E X T F L O W  ~  version 22.08.2-edge
Launching `./main.nf` [hopeful_plateau] DSL2 - revision: 6e13e51147
Uploading local `bin` scripts folder to az://batch-jobs/nf-gwas-workdir/tmp/0a/2a71a0f98537a0cf40df73be97319f/bin
executor >  azurebatch (16)
[56/b01abe] process > NF_GWAS:VALIDATE_PHENOTYPES          [100%] 1 of 1 ✔
[69/ae0aed] process > NF_GWAS:QC_FILTER_GENOTYPED (1)      [100%] 1 of 1 ✔
[06/f22a2a] process > NF_GWAS:REGENIE_STEP1 (1)            [100%] 1 of 1 ✔
[b7/e28cf7] process > NF_GWAS:REGENIE_LOG_PARSER_STEP1 (1) [100%] 1 of 1 ✔
[21/83d459] process > NF_GWAS:REGENIE_STEP2 (example)      [100%] 1 of 1 ✔
[d5/9e9522] process > NF_GWAS:REGENIE_LOG_PARSER_STEP2     [100%] 1 of 1 ✔
[f3/875964] process > NF_GWAS:FILTER_RESULTS (example_Y2)  [100%] 2 of 2 ✔
[75/898d5f] process > NF_GWAS:MERGE_RESULTS_FILTERED (Y2)  [100%] 2 of 2 ✔
[22/461918] process > NF_GWAS:MERGE_RESULTS (Y1)           [100%] 2 of 2 ✔
[fc/d09a56] process > NF_GWAS:ANNOTATE_FILTERED (2)        [100%] 2 of 2 ✔
[7b/ab6d0c] process > NF_GWAS:REPORT (2)                   [100%] 2 of 2 ✔
Pipeline completed at: 2022-08-21T22:22:16.397925+02:00
Execution status: OK
Completed at: 21-Aug-2022 22:22:16
Duration    : 5m 46s
CPU hours   : 0.2
Succeeded   : 16
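The contents of `../nf-gwas-azure.config` and of the `azb` profile aren't shown in this PR. As a rough sketch, a config for the Azure Batch runs above might look like the following. The `workDir` matches the `az://batch-jobs/nf-gwas-workdir` path in the logs; every other value is a placeholder, and the assumption is that the standard Nextflow `azure.storage` and `azure.batch` settings are used.

```groovy
// Hypothetical sketch of an Azure Batch config; all <...> values are placeholders.
workDir = 'az://batch-jobs/nf-gwas-workdir'

process {
    executor  = 'azurebatch'
    // Image with the bundled JARs, so no jbang compilation happens on the pool nodes
    container = 'rg.fr-par.scw.cloud/nfcontainers/nf-gwas-with-jars:1.0.0'
}

azure {
    storage {
        accountName = '<storage-account-name>'
        accountKey  = '<storage-account-key>'
    }
    batch {
        location          = '<azure-region>'
        accountName       = '<batch-account-name>'
        accountKey        = '<batch-account-key>'
        autoPoolMode      = true
        allowPoolCreation = true
    }
}
```

Keeping credentials in a separate `-c` file like this, outside the repository, avoids committing account keys alongside the pipeline.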

I'm finalizing the nf-test-related updates, but in the meantime it'd be great if you could test the pipeline on your end (with some real data) using this container: rg.fr-par.scw.cloud/nfcontainers/nf-gwas-with-jars:1.0.0 🙏
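One way to try that image without touching the pipeline code is a small override config passed with `-c`. This is a sketch: the file name is an assumption, and it presumes the pipeline takes its image from the generic `process.container` setting.

```groovy
// Hypothetical override file, e.g. ../nf-gwas-local.config (name is an assumption)
process {
    // Pre-built image that already bundles the jbang-compiled JARs
    container = 'rg.fr-par.scw.cloud/nfcontainers/nf-gwas-with-jars:1.0.0'
}
```

It would then be used exactly like the local run above: `nextflow -c ../nf-gwas-local.config run main.nf -profile test,docker --outdir custom_results`. Note that if the pipeline assigns containers per process with `withName` selectors, those take priority over a generic `process.container`, so the override would need matching selectors.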

seppinho commented 2 years ago

Hi guys, really appreciate the commits and improvements. We have already run the branch on real data (2 chromosomes) and everything looks good so far. We'll now start a larger analysis on all chromosomes, but we don't expect any major problems with that.

seppinho commented 2 years ago

Hi guys, the analysis finished as expected.

If everything is done on your side, I'm happy to merge this. Once again, thanks for your time and the improvements. Curious to see how it performs on large datasets on Azure.

drpatelh commented 2 years ago

Awesome! Thanks for writing the pipeline and for your work on nf-test :bowtie: Great to be able to use a well-written pipeline off-the-shelf rather than re-inventing the wheel.

Will wait for @abhi18av to confirm that he is happy to merge.

Once it's merged, it would be awesome if you could push a new container and create a release for us 🙏🏽

abhi18av commented 2 years ago

As a final check of cloud execution, I've successfully 💚 re-run the pipeline on Azure Batch, and it looks good from my side. Please go ahead with the merge and release 🙏

nf-gwas [jbang-docker]$ nextflow -c ../nf-gwas-azure.config run ./main.nf -profile test,docker,azb
N E X T F L O W  ~  version 22.08.2-edge
Launching `./main.nf` [irreverent_volta] DSL2 - revision: 6e13e51147
Uploading local `bin` scripts folder to az://batch-jobs/nf-gwas-workdir/tmp/16/88dad2486dfa0eaa530512b44ce5c6/bin
executor >  azurebatch (16)
[e9/9893a9] process > NF_GWAS:VALIDATE_PHENOTYPES          [100%] 1 of 1 ✔
[fb/c542ea] process > NF_GWAS:QC_FILTER_GENOTYPED (1)      [100%] 1 of 1 ✔
[48/bdacc8] process > NF_GWAS:REGENIE_STEP1 (1)            [100%] 1 of 1 ✔
[ce/ce57df] process > NF_GWAS:REGENIE_LOG_PARSER_STEP1 (1) [100%] 1 of 1 ✔
[2e/b80888] process > NF_GWAS:REGENIE_STEP2 (example)      [100%] 1 of 1 ✔
[60/d2fc7c] process > NF_GWAS:REGENIE_LOG_PARSER_STEP2     [100%] 1 of 1 ✔
[49/3fc2b5] process > NF_GWAS:FILTER_RESULTS (example_Y1)  [100%] 2 of 2 ✔
[69/37a7b6] process > NF_GWAS:MERGE_RESULTS_FILTERED (Y1)  [100%] 2 of 2 ✔
[3b/2aa82b] process > NF_GWAS:MERGE_RESULTS (Y1)           [100%] 2 of 2 ✔
[4e/8de33c] process > NF_GWAS:ANNOTATE_FILTERED (2)        [100%] 2 of 2 ✔
[55/7999c0] process > NF_GWAS:REPORT (2)                   [100%] 2 of 2 ✔
Pipeline completed at: 2022-08-23T11:01:25.425121+02:00
Execution status: OK
Completed at: 23-Aug-2022 11:01:25
Duration    : 5m 34s
CPU hours   : 0.1
Succeeded   : 16

seppinho commented 2 years ago

I've updated the test cases to reflect the changes you made to the pipeline. Thanks again!!