gperu commented 1 year ago

Lesson Title

Introduction to Bioinformatics workflows with Nextflow and nf-core

Lesson Repository URL

https://github.com/carpentries-incubator/workflows-nextflow

Lesson Website URL

https://carpentries-incubator.github.io/workflows-nextflow/

Lesson Description

This lesson is a three day introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software enviroments like conda. It allows the adaptation of pipelines written in the most common scripting languages such as Bash, R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson also introduces nf-core: a framework that provides a community-driven, peer reviewed platform for the development of best practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Author Usernames

@ggrimes @ameynert

Zenodo DOI

No response

Differences From Existing Lessons

No response

Confirmation of Lesson Requirements

[X] is the original work of the author(s), or that any content derived from another source is reused with permission and appropriate attribution
[X] aligns with The Carpentries Code of Conduct
[X] is published under a CC-BY or CC0 license
[X] uses The Carpentries lesson template or Carpentries Workbench without significant customisation/adaptation.

JOSE Submission Requirements

[ ] the lesson repository includes paper.md and paper.bib files as described in the JOSE submission guide for learning modules

Potential Reviewers

No response

tobyhodges commented 1 year ago

Thanks for submitting this lesson to The Carpentries Lab, @gperu, @ggrimes and @ameynert. I'm excited to see it enter review!

I'll be acting as Editor on the submission. I am just about to go on leave for a few weeks, but I will work through the Editor checklist when I get back in January. When I've completed those checks, and anything they bring up has been addressed, I can begin a search for reviewers.

For now, to ensure that the review process runs as smoothly as possible, please make sure you are subscribed to receive notifications from this thread. On the right sidebar of this page you should see a section headed Notifications, with a Customize link. You can click on that and make sure that you have the Subscribed option selected, to receive all notifications from the thread.

You can add a badge to display the status of the review in the README of your lesson repository with the following Markdown:

[![The Carpentries Lab Review Status](http://badges.carpentries-lab.org/16_status.svg)](https://github.com/carpentries-lab/reviews/issues/16)

ggrimes commented 1 year ago

@tobyhodges can you please pause this review as I need to update the main https://github.com/carpentries-incubator/workflows-nextflow repo with the latest code which has been developed in my personal github repository https://github.com/ggrimes/workflows-nextflow under the branch agnostric.

tobyhodges commented 1 year ago

Thanks for the heads-up, @ggrimes. Please let me know when you are ready for me to begin working through the editorial checklist.

ggrimes commented 10 months ago

I am ready to start the review process for this lesson. Please let me know the next steps

ggrimes commented 10 months ago

Potential Reviewers

Here is a list of potential reviewers, both experts in nextflow and nf-core. Also have contributed to training materials.

@christopher-hakkaart @mribeirodantas

tobyhodges commented 9 months ago

Editor Checklist - Introduction to Bioinformatics workflows with Nextflow and nf-core

Accessibility

[x] All figures are also described in image alternative text or elsewhere in the lesson body.

The first three figures in Getting Started with Nextflow are lacking alternative text descriptions. These kinds of diagrams can be difficult to capture properly in a text description, but please give it a go and feel free to ask for help here if you need it.

The purpose of alternative text is to communicate the purpose of the image to someone who cannot see it, so I recommend you focus on the key points/concepts these figures are intended to get across to learners.

[Edit: alt-text has now been added]

[x] The lesson uses appropriate heading levels:
- [x] h2 is used for sections within a page.
- [x] no “jumps” are present between heading levels e.g. h2->h4.
- [x] no page contains more than one h1 element i.e. none of the source files include first-level headings.
[x] The contrast ratio of text in all figures is at least 4.5:1.

Content

[x] The lesson teaches data and/or computational skills that could promote efficient, open, and reproducible research.
[x] All exercises have solutions.
[x] Opportunities for formative assessments are included and distributed throughout the lesson sufficiently to track learner progress. (We aim for at least one formative assessment every 10-15 minutes.)
[x] Any data sets used in the lesson are published under a permissive open license i.e. CC0 or equivalent.

The example data is licensed CC-BY, which is not really appropriate for data but is owned by the lesson author, so I think we can be confident there will not be any problems.

Design

[x] Learning objectives are defined for the lesson and every episode.
[x] The target audience of the lesson is identified specifically and in sufficient detail.

The Learner Profiles are sufficiently detailed. @ggrimes I recommend you add a link to that page from the introductory text in index.md.

Repository

The lesson repository includes:

[x] a CC-BY or CC0 license.
[x] a CODE_OF_CONDUCT.md file that links to The Carpentries Code of Conduct.
[x] a list of lesson maintainers.
[x] tabs to display Issues and Pull Requests for the project.

Structure

[x] Estimated times are included in every episode for teaching and completing exercises.
[x] Episodes lengths are appropriate for the management of cognitive load throughout the lesson.

Supporting information

The lesson includes:

[x] a list of required prior skills and/or knowledge.
[x] setup and installation instructions.
[x] a glossary of key terms or links out to definitions in an external glossary e.g. Glosario.

I recommend that you open issues on the lesson repositories for each point raised above, to help you maintain a view of what has been addressed as you go along. You do not need to provide a written response to all of the points raised, but please post back here when you think they have been addressed. I can then run through the checklist again and confirm that the lesson is ready to move to review. If you would like to provide additional explanation for any of the points raised, I encourage you to do so.

ggrimes commented 9 months ago

Added alt text to all images in first episode issue #106

tobyhodges commented 9 months ago

The alternative text descriptions you added are a big improvement, thanks.

I will start contacting reviewers for the lesson.

tobyhodges commented 9 months ago

@bobturneruk @hadrienG thank you both for volunteering to review a lesson for The Carpentries Lab. Please can you confirm that you are happy to review this Introduction to Bioinformatics workflows with Nextflow and nf-core lesson?

You can read more about the lesson review process in our Reviewer Guide.

HadrienG commented 9 months ago

confirming I'm happy to review this lesson!

bobturneruk commented 9 months ago

I am, too.

tobyhodges commented 9 months ago

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.

ggrimes commented 9 months ago

Thanks @bobturneruk and @HadrienG for agreeing to review .

ggrimes commented 9 months ago

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help. @tobyhodges should I let the other potential reviewers that i no longer need them?

tobyhodges commented 9 months ago

@tobyhodges should I let the other potential reviewers that i no longer need them?

Yes, I believe we are covered at this point. @HadrienG & @bobturneruk: of course, if circumstances change and you find that you no longer have capacity to review the lesson, do let me know and I can respond accordingly.

HadrienG commented 8 months ago

DRAFT REVIEW

Great job folks! I'm well in my way into the review process, and thought I'd post this even if I'm not done, so you can eventually start addressing the first comments. I'll change the header and this comment when I'm done.

Accessibility

[ ] The alternative text of all figures is accurate and sufficiently detailed.
- In episode 12, the firgure depicting the nf-core project is insufficient. I don't think each item of the figure should be described, but a few words about the three main categories would be welcome.
- In episode 12, the alt text for the nfcore_config.png image is insufficient. Since the content of the picture is explained below in the episode, I wonder if something like "a graphical description of different config profiles explained below" is not a better alt text. Feel free to ingore me if the current alt text is in line with the carpentries' policy :-)
[x] The lesson content does not make extensive use of colloquialisms, region- or culture-specific references, or idioms.
[x] The lesson content does not make extensive use of contractions (“can’t” instead of “cannot”, “we’ve” instead of “we have”, etc).

Content

The lesson content:
- [x] conforms to The Carpentries Code of Conduct.
- [x] meets the objectives defined by the authors.
- [x] is appropriate for the target audience identified for the lesson.
- [x] is accurate.
- [ ] is descriptive and easy to understand.
- [x] is appropriately structured to manage cognitive load.
- [x] does not use dismissive language.
[x] Tools used in the lesson are open source or, where tools used are closed source/proprietary, there is a good reason for this e.g. no open source alternatives are available or widely-used in the lesson domain.
[x] Any example data sets used in the lesson are accessible, well-described, available under a CC0 license, and representative of data typically encountered in the domain.
[x] The lesson does not make use of superfluous data sets, e.g. increasing cognitive load for learners by introducing a new data set instead of reusing another that is already present in the lesson.
[x] The example tasks and narrative of the lesson are appropriate and realistic.
[x] The solutions to all exercises are accurate and sufficiently explained.
[x] The lesson includes exercises in a variety of formats.
[x] Exercise tasks and formats are appropriate for the expected experience level of the target audience.
[ ] All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment.
[x] Exercises are designed with diagnostic power.

Episode 3 - Channels

First objective: "How do I get data into Nextflow?" is misleading, we've already done that in episode 2: parametrisation

Episode 4 - Processes

process_rscript.nf: including the ShortRead package in the conda environment would let the learners run the script.
I think the "Associated script" note about python scripts could be an exercise: "move the python code block into it's own script"
I don't think it is very useful to learn about "user-specified file names", this paragraph could be removed.

Episode 6 - Workflow

Workflow definition: "[...] Therefore it’s the entry point [...]" is the only mention of "entry point". Maybe the concept needs to be defined somewhere?
collect() should be introduced first in the channels episode, instead of appearing here for the first time.

Episode 7 - Operators

The into operator is deprecated.
Overall, very good chapter!

Episode 9 - Nextflow configuration

Process selectors: there is a mention of "fully qualified name" (NFCORE_RNASEQ:RNA_SEQ:SAMTOOLS_SORT) but scoping is not mentioned or explained previously.

Design

[x] Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
[x] The target audience identified for the lesson is specific and realistic.

Supporting information

[x] The list of required prior skills and/or knowledge is complete and accurate.
[ ] The setup and installation instructions are complete, accurate, and easy to follow.
[ ] No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. Glosario.

setup

The conda setup and the standalone setup do not install the same version of nextflow, which causes issues down the line. 20.10 is installed standalone, and is assumed to be used throughout the whole lesson, However, conda installs >=20.10. I would suggest to migrate the setup and all episodes to >=22.03. This removes the need for nextflow.enable.dsl=2 in the scripts, and only would require a few small changes in some episodes. DSL1 is old, deprecated and in my opinion unnecessary to bring up / explain.
It is not crystal clear that the "Nextflow install without conda" part is optional.
nf-core is missing in the environment.yml

General

Minor issues and bugs

The points tagged DSL2 are dependent on the above setup issue.

setup

[ ] Traning software:conda, broken link to environment.yml
[ ] conda env not working natively on M1/M2 macs unless roseta is installed. I guess this will fix itself at some point? or should there be a note for mac users in the setup?
[ ] data download: remove the $ sot the command can be copy/pasted more easily

getting started

[ ] Nextflow core features: broken image
[ ] Your first script: "A multi-line Nextflow comment, written using C style block comments, followed by a single line comment." There is no single line comment.
[ ] Your first script: 5. and 6. should be swapped
[ ] nextflow run word_count.nf gives 0 because the input file does not exist. Should be data/yeast/reads/ref1_1.fq.gz

workflow parametrisation

[ ] cp wc.nf wc-params.nf change wc.nf to word_count.nf
[ ] be consistent when naming files (word_count vs wf-params)

Channels

[ ] Downloading data from SRA does not require an API key for small downloads. I would remove this part it since it's so little data.

Processes

[ ] This part does not have sub-section in the left navigation menu.
[ ] DSL2: the processed should be in workflow blocks
[ ] Script: remove the ~~~ left in the script
[ ] process_multi_line.nf does not print what is indicated
[ ] Why mix the use of echo and printf in bash blocks? Pick one and be consistent
[ ] The Input Repeaters challenge is wrongly formatted

Processes part 2

[ ] tree is not available on macOS by default. Personally, I'd avoid using it

Workflows

[ ] workflow_01.nf is fastqc on the website, salmon in the scripts.

Operators

[ ] parse a csv file challenge: ~~~{: .language-groovy } present in the solution

nf-core

[ ] nf-core lunch rnaseq: outdir is also a required parameter.
[ ] DSL2: hlatyping pipeline is too old

ggrimes commented 8 months ago

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.

HadrienG commented 8 months ago

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.

Disclaimer/conflict of interest: I have previously published an nf-core pipeline and I'm a member of the nf-core community.

I think the nf-core episode will be helpful for a lot of people. Although the episode is not teaching nextflow per se, nf-core is a valuable resource, and researchers would benefit being aware that it exists, and learning how to launch pipelines.

Secondly, while there are other collections of nextflow pipelines out there, the community aspect of nf-core is quite unique, and I don't think including them is especially unfair to another community. What I would eventually suggest is keeping the nf-core episode but removing it from the lesson title and go with "Introduction to Bioinformatics workflows with Nextflow". You could even rename the nf-core episode "Launching publicly available pipelines" and start the episode with a non nf-core example: nextflow run carpentries/nextflow-example that would download and run a minimal nextflow piepline that you created. That way you illustrate to the leaners that they can run any nextflow pipeline they find on github, and then show nf-core as a community example that has a lot of pipelines available.

bobturneruk commented 7 months ago

Hi! I just wanted to say I am working on my review. I've got to ep 7. It's taking a while as I think it deserves to be done in some detail, as overall it seems to be such a well thought out and useful resource. Lots of specific comments to follow once I've had time to look at it all. I'm trying not to look at @HadrienG's report for now.

bobturneruk commented 7 months ago

I'm going to post my review below in a moment. I've probably made some mistakes - interpreted things wrongly or misunderstood something technical - and I'm mindful that in some places I'm asking for extra work to be done which means even more of someone's time and effort. Overall I hope it's helpful. I'm happy to discuss further via a call, Slack or on here.

Once again, my overall feeling is that this lesson will be a great help for people getting into Nextflow.

bobturneruk commented 7 months ago

Reviewer Checklist

Accessibility

[ ] The alternative text of all figures is accurate and sufficiently detailed.
- Large and/or complex figures may not be described completely in the alt text of the image and instead be described elsewhere in the main body of the episode.
[x] The lesson content does not make extensive use of colloquialisms, region- or culture-specific references, or idioms.
[x] The lesson content does not make extensive use of contractions (“can’t” instead of “cannot”, “we’ve” instead of “we have”, etc).
Figures in episode 1 appear blurry and have very small text (Windows 11, Chrome, 100% zoom on browser).
Figures lacking detailed alternative text:

Content

The lesson content:
- [x] conforms to The Carpentries Code of Conduct.
- [x] meets the objectives defined by the authors.
- [x] is appropriate for the target audience identified for the lesson.
- [x] is accurate.
- [x] is descriptive and easy to understand.
- [x] is appropriately structured to manage cognitive load.
- [x] does not use dismissive language.
[x] Tools used in the lesson are open source or, where tools used are closed source/proprietary, there is a good reason for this e.g. no open source alternatives are available or widely-used in the lesson domain.
[ ] Any example data sets used in the lesson are accessible, well-described, available under a CC0 license, and representative of data typically encountered in the domain.
[x] The lesson does not make use of superfluous data sets, e.g. increasing cognitive load for learners by introducing a new data set instead of reusing another that is already present in the lesson.
[x] The example tasks and narrative of the lesson are appropriate and realistic.
[x] The solutions to all exercises are accurate and sufficiently explained.
[x] The lesson includes exercises in a variety of formats.
[x] Exercise tasks and formats are appropriate for the expected experience level of the target audience.
[x] All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment.
[ ] Exercises are designed with diagnostic power.
Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.
I don't have the expertise to comment on whether exercises are designed with diagnostic power.
I've made a PR with several minor comments which the maintainers might accept or reject individually.

Specific comments follow...

ep1

I get this output from the script, which is different to the course material. Might be good if the output number wasn't 0?

(nf-training) bobturner@tubby:~/nf-training$ nextflow run word_count.nf
N E X T F L O W  ~  version 20.10.0
Launching `word_count.nf` [boring_nobel] - revision: 72656509cb
executor >  local (1)
[14/523859] process > NUM_LINES (1) [100%] 1 of 1 ✔
SRR2584863_1.fastq.gz   0

Perhaps this helps? https://github.com/bobturneruk/workflows-nextflow/commit/6be012f49314006b35cf162546bf8d1ec9086d7d
I think there are some duplicate items in the list of script contents. I suggest https://github.com/bobturneruk/workflows-nextflow/commit/087842116f0cc4767d2811d5c51f0425cd9a9cee

ep2

The first exercise in ep2 has a lot of content shared directly with the preceding bit of live coding - we've already talked participants through how to add a sleep parameter, then this is repeated as an exercise. An alternative exercise might be to ask for sleep_before and sleep_after parameters?
Might it be easier on learners not to mention YAML and just stick with JSON?

ep3

Can the DSL1 reference be removed? I assume the course is DSL2 (the current DSL) specific and that people doing it are new to Nextflow, so DSL1 might not be relevant to them.
The "reminder" in this section is the first time that lists and maps are explained. file://wsl.localhost/Ubuntu/home/bobturner/workflows-nextflow/site/docs/03-channels.html#the-value-channel-factory Will learners be familiar with those concepts already from the Python or R prerequisites?
I didn't have any chicken data to run the code at the end of this section https://carpentries-incubator.github.io/workflows-nextflow/03-channels.html#the-frompath-channel-factory
I'm not clear on how a tuple is defined here. It's introduced as "a grouping of data, represented as a Groovy List" - could it be called a list? The Nextflow docs really only talk about tuples in the fromFilePairs section "The matching files are emitted as tuples". Maybe there is an important distinction, but I'm a bit confused. I see Tuples are defined here https://www.nextflow.io/docs/latest/process.html#input-type-tuple and in the lesson here https://carpentries-incubator.github.io/workflows-nextflow/05-processes-part2.html#grouped-inputs-and-outputs
I think if users are expected to code-along with the section on fromSRA they will need instructions on how to get an NCBI API key, and time to do this.

ep4

The flag -process.echo is explained before it is used here https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#process-definition - I don't think it's in earlier material?
I don't think users will be able to run the R code in this section without additional setup steps https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#script
Could PYSTUFF be called something more specific, please? Same for RSTUFF. It's better for readers if short, descriptive names are used.
Could myscript.py be called something more specific, please?
I suggest that it's always better to include Python or R scripts in separate files to facilitate automated testing.
I suspect this would be a difficult thing to do, but perhaps the Python and R material should be removed in the interests of time?
Is it necessary to introduce this alternative shell: way of using Bash in Nextflow, please? Perhaps better to stick with script:? https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#shell
Do inputs need to be wrapped in ${...} when used in a script? What is good practise?

ep5

The work directory is mentioned here for the first time, but not explained until ep10. It might be good to offer a very brief explanation of it at this point?
Is it worth making the difference between CPUs, cores and threads clear here? https://carpentries-incubator.github.io/workflows-nextflow/05-processes-part2.html#directives
Will learners be familiar with symbolic links before starting the lesson?

ep6

I get an error when running the first code block:


executor >  local (10)
[e5/e6fded] process > FASTQC (5) [100%] 9 of 9 ✔
[17/30b05f] process > MULTIQC    [100%] 1 of 1, failed: 1 ✘

Error executing process > 'MULTIQC'

Caused by: Process MULTIQC terminated with an error exit status (1)

Command executed:

multiqc .

Command exit status: 1

Command output: Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 42/42

Command error: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'seqyclean' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module seqyclean raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'optitype' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module optitype raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[WARNING] multiqc : No analysis results found. Cleaning up.. [INFO ] multiqc : MultiQC complete

Work dir: /home/bobturner/nf-training/work/17/30b05f5243e6fd12c0013cc12ea0c8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

- This persists with subsequent examples

#### ep7
- Is it safe to assume that learners will be familiar with regular expressions? If not https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#regular-expression may be confusing.
- There's a lot of really good content in this episode, but will all of it be relevant to new Nextflow users? https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#grouping-contents-of-a-channel-by-a-key- might be a bit complicated? Or perhaps this is something that users encounter quite early in their Nextflow journey?

#### ep8

- Could this detail be referred to rather than included? Perhaps it is enough to say it's a unique ID? https://carpentries-incubator.github.io/workflows-nextflow/08-reporting.html#task-id
- The final exercise in this episode is the same as the preceding content. Maybe best to omit the exercise, or give the markdown example and ask learners to modify the command for given HTML as an exercise?

#### ep9

When I first tried the Conda exercise I got this error:

Error executing process > 'FASTP (1)'

Caused by: Failed to create Conda environment command: conda create --mkdir --yes --quiet --prefix /home/bobturner/nf-training/work/conda/env-a7a3a0d820eb46bc41ebf4f72d955e5f bioconda::fastp=0.12.4-0 status : 1 message: CondaValueError: You have chosen a non-default solver backend (libmamba) but it was not recognized. Choose one of: classic


I guess down to my Conda config. I could fix by:

conda config --set solver classic

- I suggest `docker.runOptions = '-u $(id -u):$(id -g)'` should be explained.

#### ep10

- I don't think learners are asked to create a file called `wc.nf` before they are asked to resume it in the first exercise here. It's maybe called `word_count.nf` earlier in the lesson?
- I'm not completely clear on what "The command wrapped used to run the job." means. As far as I can tell by searching, the definition of `.command.run` is not in the Nextflow docs. It might be better described as "The full Nextflow bash script used to run `.command.sh`."?

#### ep11

- Is it the intention that this episode re-explains lots of content from previous episodes? For example parameters and processes are covered as if new content. Perhaps there is scope to remove some of this e.g.

> A process is defined by providing three main declarations:
>
> 1. The process inputs,
> 2. The process outputs
> 3. Finally the command script.

- I suggest the "Recap" sections are reviewed - they may be useful but they introduce a different structure to ep11 compared with earlier episodes. They also often imply that content previously covered is first covered in ep11.
- "The FASTQC process will not run as the process has not been declared in the workflow scope." - is this the same definition of "scope" as used earlier? Is there a workflow scope as well as a params scope and an aws scope?
- I get this error when running script6:

Error executing process > 'MULTIQC'

Caused by: Process MULTIQC terminated with an error exit status (1)

Command executed:

multiqc .

Command exit status: 1

Command output: Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 141/141

Command error: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'seqyclean' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module seqyclean raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'optitype' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module optitype raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[WARNING] multiqc : No analysis results found. Cleaning up.. [INFO ] multiqc : MultiQC complete

Work dir: /home/bobturner/nf-training/work/13/2a6ef5a3e567c3f56e41d67eefe205

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

- Ternary operators should be explained more fully if the term is to be used.
- Graphviz installation is not part of the setup instructions, but is needed for the last exercise.
- I suggest the key points are reviewed to better match the Questions and Objectives of the lesson. They currently contain points covered earlier e.g. "A Workflow can be parameterise using params .".
- I don't think this episode introduces very much new content. An option would be to separate out the new content (e.g. "Metrics and Reports") and set up the episode as an extended exercise for learners - end the content prior to https://carpentries-incubator.github.io/workflows-nextflow/11-Simple_Rna-Seq_pipeline.html#define-the-pipeline-parameters and set an exercise to produce the code in `script6.nf`? Or maybe the extensive recap is intentional.

#### ep12

Some of this might be really specific to my setup...

- I get an error when pulling `rnaseq`:

nextflow pull nf-core/rnaseq -revision 3.0 Checking nf-core/rnaseq ... Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.id() is applicable for argument types: (String) values: [nf-validation@1.1.3] Possible solutions: is(java.lang.Object), is(java.lang.Object), find(), find(), find(groovy.lang.Closure), find(groovy.lang.Closure)

- Seems this issue may be related https://github.com/nextflow-io/nextflow/issues/888 - I tried making a clean directory to work in. Perhaps it's something to do with installing Nextflow via Conda? It works with my system Nextflow.
- Should learners be using the web based `launch` tool? I guess not.
- It might be useful to explain the relationship between a "profile" and a "scope".
- I get this error in the final exercise:

nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda --max_memory 3G -c nfcore-custom.config N E X T F L O W ~ version 23.04.2 Pulling nf-core/hlatyping ... downloaded from https://github.com/nf-core/hlatyping.git Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier


- `nextflow run nf-core/hlatyping -r 2.0.0 -profile test,conda  --max_memory '3.GB' --outdir out -c nfcore-custom.config` may be better, but needs Conda and I can't run the nf-core commands in Conda.
- gitter chat is retired. Google Groups might not still be useful. I think these should be replaced with a link to the Nextflow Slack.

### Design

- [x] Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
- [ ] The target audience identified for the lesson is specific and realistic.

- I'm not sure where the target audience is defined. I think if it's something like "Biological and Biomedical Scientists, Research Software Engineers and associated professionals" it would be great.

### Supporting information

- [ ] The list of required prior skills and/or knowledge is complete and accurate.
- [ ] The setup and installation instructions are complete, accurate, and easy to follow.
- [ ] No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. [Glosario](https://carpentries.github.io/glosario/).

- I think it would be helpful to make it clearer at the beginning of the instructions that this won't work on Windows, or if the participant has Windows, they'll need to use WSL2. Maybe linking to this would be helpful https://learn.microsoft.com/en-us/windows/wsl/install or perhaps that's too involved.
- The version of NextFlow specified is two major versions behind the current version (23, on 2024-03-22). Could a more recent version be used?
- Python 3.8, listed as a requirement, will be unsupported after quarter 3 of 2024 (https://devguide.python.org/versions/). Perhaps a more recent version could be used?
- The supplied `environment.yml` file (https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml) does not specify a version of Python. If Python version is important, then it should.
- For installing Conda, perhaps it would be best to link to the official instructions https://docs.anaconda.com/free/miniconda/miniconda-install/ ?
- I suggest re-ordering the instructions and making it clear how the `environment.yml` file should be obtained like this https://github.com/bobturneruk/workflows-nextflow/commit/8b2a7af3f58ce743f87b1d83d195ced022847c14
- I suggest if VSCode it what's best to use for the lesson, it is made a requirement, not a recommendation. This will mean greater consistency in learner experience.
- I suggest removing the instructions to install Nextflow without Conda. Again, I think consistency of setup will lead to less variation in issues experienced by learners.
- Installing with Conda went very smoothly for me!

- The Glossary only has 3 terms. I think it would benefit by being expanded.

### General

- Reviewed with Ubuntu 22.04 on WSL2.
- I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?
- I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

ggrimes commented 7 months ago

Thank you @HadrienG and @bobturneruk for your reviews. Do you have any suggestions about how I should reply to individual review comments?

Thanks,

bobturneruk commented 7 months ago

Hi @ggrimes! @tobyhodges - how is this normally handled, please? I don't think GitHub makes this easy. Maybe we could break things down into issues against https://github.com/carpentries-incubator/workflows-nextflow/issues ?

tobyhodges commented 7 months ago

Thanks @bobturneruk and @HadrienG for your detailed reviews. @ggrimes you can choose how to respond to individual comments, based on your personal preference. In the past, other lesson developers have found it helpful to open issues to track the improvements suggested in reviews. If you do that, please link to the issues in this thread when you are ready to respond to the reviews: I would like to make it easy for visitors to the thread to find everything related to the review.

Responding to a few points from @bobturneruk:

Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.

CC-BY is not an appropriate license for data and if you are able to adjust the license on the FigShare record to use CC0 I recommend to do so (bonus points if you add a CFF file as well), but the mistake is common enough and harmless enough that I am generally happy to let it pass if needed. I think the technically correct thing to do with data included in a lesson repository would be to mention in LICENSE.md that the data is available CC0. But honestly we have data files in lesson repositories elsewhere that we are not doing this for, and I would certainly be willing to let this go for the purposes of the review.

I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?

&

I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

As @bobturneruk predicted, GROOVY labels are added to the code blocks because syntax highlighting is being applied by specifying that blocks are groovy in the lesson source. Unfortunately, until a separate specification for nextflow is added to https://github.com/jgm/skylighting this is the best we can get. I agree that a note explaining this early in the lesson would be helpful to anyone following along on their own.

carpentries-lab / reviews

[Review]: Workflows with Nextflow #16

Lesson Title

Lesson Repository URL

Lesson Website URL

Lesson Description

Author Usernames

Zenodo DOI

Differences From Existing Lessons

Confirmation of Lesson Requirements

JOSE Submission Requirements

Potential Reviewers

Editor Checklist - Introduction to Bioinformatics workflows with Nextflow and nf-core

Accessibility

Content

Design

Repository

Structure

Supporting information

DRAFT REVIEW

Accessibility

Content

Episode 3 - Channels

Episode 4 - Processes

Episode 6 - Workflow

Episode 7 - Operators

Episode 9 - Nextflow configuration

Design

Supporting information

setup

General

Minor issues and bugs

setup

getting started

workflow parametrisation

Channels

Processes

Processes part 2

Workflows

Operators

nf-core

Reviewer Checklist

Accessibility

Content

ep1

ep2

ep3

ep4

ep5

ep6

Command error: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'seqyclean' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

[ERROR ] multiqc : Oops! The 'optitype' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Command error: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'seqyclean' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

[ERROR ] multiqc : Oops! The 'optitype' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None