carpentries-lab / reviews

Open peer review of lessons from The Carpentries community.
Other
14 stars 4 forks source link

[Review]: Workflows with Nextflow #16

Open gperu opened 1 year ago

gperu commented 1 year ago

Lesson Title

Introduction to Bioinformatics workflows with Nextflow and nf-core

Lesson Repository URL

https://github.com/carpentries-incubator/workflows-nextflow

Lesson Website URL

https://carpentries-incubator.github.io/workflows-nextflow/

Lesson Description

This lesson is a three day introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software enviroments like conda. It allows the adaptation of pipelines written in the most common scripting languages such as Bash, R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson also introduces nf-core: a framework that provides a community-driven, peer reviewed platform for the development of best practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Author Usernames

@ggrimes @ameynert

Zenodo DOI

No response

Differences From Existing Lessons

No response

Confirmation of Lesson Requirements

JOSE Submission Requirements

Potential Reviewers

No response

tobyhodges commented 1 year ago

Thanks for submitting this lesson to The Carpentries Lab, @gperu, @ggrimes and @ameynert. I'm excited to see it enter review!

I'll be acting as Editor on the submission. I am just about to go on leave for a few weeks, but I will work through the Editor checklist when I get back in January. When I've completed those checks, and anything they bring up has been addressed, I can begin a search for reviewers.

For now, to ensure that the review process runs as smoothly as possible, please make sure you are subscribed to receive notifications from this thread. On the right sidebar of this page you should see a section headed Notifications, with a Customize link. You can click on that and make sure that you have the Subscribed option selected, to receive all notifications from the thread.

You can add a badge to display the status of the review in the README of your lesson repository with the following Markdown:

[![The Carpentries Lab Review Status](http://badges.carpentries-lab.org/16_status.svg)](https://github.com/carpentries-lab/reviews/issues/16)
ggrimes commented 1 year ago

@tobyhodges can you please pause this review as I need to update the main https://github.com/carpentries-incubator/workflows-nextflow repo with the latest code which has been developed in my personal github repository https://github.com/ggrimes/workflows-nextflow under the branch agnostric.

tobyhodges commented 1 year ago

Thanks for the heads-up, @ggrimes. Please let me know when you are ready for me to begin working through the editorial checklist.

ggrimes commented 10 months ago

I am ready to start the review process for this lesson. Please let me know the next steps

ggrimes commented 10 months ago

Potential Reviewers

Here is a list of potential reviewers, both experts in nextflow and nf-core. Also have contributed to training materials.

@christopher-hakkaart @mribeirodantas

tobyhodges commented 9 months ago

Editor Checklist - Introduction to Bioinformatics workflows with Nextflow and nf-core

Accessibility

The first three figures in Getting Started with Nextflow are lacking alternative text descriptions. These kinds of diagrams can be difficult to capture properly in a text description, but please give it a go and feel free to ask for help here if you need it.

The purpose of alternative text is to communicate the purpose of the image to someone who cannot see it, so I recommend you focus on the key points/concepts these figures are intended to get across to learners.

[Edit: alt-text has now been added]

Content

The example data is licensed CC-BY, which is not really appropriate for data but is owned by the lesson author, so I think we can be confident there will not be any problems.

Design

The Learner Profiles are sufficiently detailed. @ggrimes I recommend you add a link to that page from the introductory text in index.md.

Repository

The lesson repository includes:

Structure

Supporting information

The lesson includes:

I recommend that you open issues on the lesson repositories for each point raised above, to help you maintain a view of what has been addressed as you go along. You do not need to provide a written response to all of the points raised, but please post back here when you think they have been addressed. I can then run through the checklist again and confirm that the lesson is ready to move to review. If you would like to provide additional explanation for any of the points raised, I encourage you to do so.

ggrimes commented 9 months ago

Added alt text to all images in first episode issue #106

tobyhodges commented 9 months ago

The alternative text descriptions you added are a big improvement, thanks.

I will start contacting reviewers for the lesson.

tobyhodges commented 9 months ago

@bobturneruk @hadrienG thank you both for volunteering to review a lesson for The Carpentries Lab. Please can you confirm that you are happy to review this Introduction to Bioinformatics workflows with Nextflow and nf-core lesson?

You can read more about the lesson review process in our Reviewer Guide.

HadrienG commented 9 months ago

confirming I'm happy to review this lesson!

bobturneruk commented 9 months ago

I am, too.

tobyhodges commented 9 months ago

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.

ggrimes commented 9 months ago

Thanks @bobturneruk and @HadrienG for agreeing to review .

ggrimes commented 9 months ago

Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help. @tobyhodges should I let the other potential reviewers that i no longer need them?

tobyhodges commented 9 months ago

@tobyhodges should I let the other potential reviewers that i no longer need them?

Yes, I believe we are covered at this point. @HadrienG & @bobturneruk: of course, if circumstances change and you find that you no longer have capacity to review the lesson, do let me know and I can respond accordingly.

HadrienG commented 8 months ago

DRAFT REVIEW

Great job folks! I'm well in my way into the review process, and thought I'd post this even if I'm not done, so you can eventually start addressing the first comments. I'll change the header and this comment when I'm done.

Accessibility

Content

Episode 3 - Channels

Episode 4 - Processes

Episode 6 - Workflow

Episode 7 - Operators

Episode 9 - Nextflow configuration

Design

Supporting information

setup

General

Minor issues and bugs

The points tagged DSL2 are dependent on the above setup issue.

setup
getting started
workflow parametrisation
Channels
Processes
Processes part 2
Workflows
Operators
nf-core
ggrimes commented 8 months ago

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.

HadrienG commented 8 months ago

@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.

Disclaimer/conflict of interest: I have previously published an nf-core pipeline and I'm a member of the nf-core community.

I think the nf-core episode will be helpful for a lot of people. Although the episode is not teaching nextflow per se, nf-core is a valuable resource, and researchers would benefit being aware that it exists, and learning how to launch pipelines.

Secondly, while there are other collections of nextflow pipelines out there, the community aspect of nf-core is quite unique, and I don't think including them is especially unfair to another community. What I would eventually suggest is keeping the nf-core episode but removing it from the lesson title and go with "Introduction to Bioinformatics workflows with Nextflow". You could even rename the nf-core episode "Launching publicly available pipelines" and start the episode with a non nf-core example: nextflow run carpentries/nextflow-example that would download and run a minimal nextflow piepline that you created. That way you illustrate to the leaners that they can run any nextflow pipeline they find on github, and then show nf-core as a community example that has a lot of pipelines available.

bobturneruk commented 7 months ago

Hi! I just wanted to say I am working on my review. I've got to ep 7. It's taking a while as I think it deserves to be done in some detail, as overall it seems to be such a well thought out and useful resource. Lots of specific comments to follow once I've had time to look at it all. I'm trying not to look at @HadrienG's report for now.

bobturneruk commented 7 months ago

I'm going to post my review below in a moment. I've probably made some mistakes - interpreted things wrongly or misunderstood something technical - and I'm mindful that in some places I'm asking for extra work to be done which means even more of someone's time and effort. Overall I hope it's helpful. I'm happy to discuss further via a call, Slack or on here.

Once again, my overall feeling is that this lesson will be a great help for people getting into Nextflow.

bobturneruk commented 7 months ago

Reviewer Checklist

Accessibility

Content

Specific comments follow...

ep1

ep2

ep3

ep4

ep5

ep6

Error executing process > 'MULTIQC'

Caused by: Process MULTIQC terminated with an error exit status (1)

Command executed:

multiqc .

Command exit status: 1

Command output: Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 42/42

Command error: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'seqyclean' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module seqyclean raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'optitype' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module optitype raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[WARNING] multiqc : No analysis results found. Cleaning up.. [INFO ] multiqc : MultiQC complete

Work dir: /home/bobturner/nf-training/work/17/30b05f5243e6fd12c0013cc12ea0c8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

- This persists with subsequent examples

#### ep7
- Is it safe to assume that learners will be familiar with regular expressions? If not https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#regular-expression may be confusing.
- There's a lot of really good content in this episode, but will all of it be relevant to new Nextflow users? https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#grouping-contents-of-a-channel-by-a-key- might be a bit complicated? Or perhaps this is something that users encounter quite early in their Nextflow journey?

#### ep8

- Could this detail be referred to rather than included? Perhaps it is enough to say it's a unique ID? https://carpentries-incubator.github.io/workflows-nextflow/08-reporting.html#task-id
- The final exercise in this episode is the same as the preceding content. Maybe best to omit the exercise, or give the markdown example and ask learners to modify the command for given HTML as an exercise?

#### ep9

When I first tried the Conda exercise I got this error:

Error executing process > 'FASTP (1)'

Caused by: Failed to create Conda environment command: conda create --mkdir --yes --quiet --prefix /home/bobturner/nf-training/work/conda/env-a7a3a0d820eb46bc41ebf4f72d955e5f bioconda::fastp=0.12.4-0 status : 1 message: CondaValueError: You have chosen a non-default solver backend (libmamba) but it was not recognized. Choose one of: classic


I guess down to my Conda config. I could fix by:

conda config --set solver classic

- I suggest `docker.runOptions = '-u $(id -u):$(id -g)'` should be explained.

#### ep10

- I don't think learners are asked to create a file called `wc.nf` before they are asked to resume it in the first exercise here. It's maybe called `word_count.nf` earlier in the lesson?
- I'm not completely clear on what "The command wrapped used to run the job." means. As far as I can tell by searching, the definition of `.command.run` is not in the Nextflow docs. It might be better described as "The full Nextflow bash script used to run `.command.sh`."?

#### ep11

- Is it the intention that this episode re-explains lots of content from previous episodes? For example parameters and processes are covered as if new content. Perhaps there is scope to remove some of this e.g.

> A process is defined by providing three main declarations:
>
> 1. The process inputs,
> 2. The process outputs
> 3. Finally the command script.

- I suggest the "Recap" sections are reviewed - they may be useful but they introduce a different structure to ep11 compared with earlier episodes. They also often imply that content previously covered is first covered in ep11.
- "The FASTQC process will not run as the process has not been declared in the workflow scope." - is this the same definition of "scope" as used earlier? Is there a workflow scope as well as a params scope and an aws scope?
- I get this error when running script6:

Error executing process > 'MULTIQC'

Caused by: Process MULTIQC terminated with an error exit status (1)

Command executed:

multiqc .

Command exit status: 1

Command output: Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 141/141

Command error: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'seqyclean' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module seqyclean raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/seqyclean/seqyclean.py", line 18, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[ERROR ] multiqc : Oops! The 'optitype' MultiQC module broke... Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues If possible, please include a log file that triggers the error - the last file found was: None

Module optitype raised an exception: Traceback (most recent call last): File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/multiqc.py", line 594, in run output = mod() ^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/optitype/optitype.py", line 24, in init super(MultiqcModule, self).init( File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/modules/base_module.py", line 45, in init config.update({anchor: mod_cust_config.get("custom_config", {})}) File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 250, in update return update_dict(globals(), u) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/bobturner/miniconda3/envs/nf-training/lib/python3.12/site-packages/multiqc/utils/config.py", line 256, in update_dict if isinstance(val, collections.Mapping): ^^^^^^^^^^^^^^^^^^^ AttributeError: module 'collections' has no attribute 'Mapping'

[WARNING] multiqc : No analysis results found. Cleaning up.. [INFO ] multiqc : MultiQC complete

Work dir: /home/bobturner/nf-training/work/13/2a6ef5a3e567c3f56e41d67eefe205

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

- Ternary operators should be explained more fully if the term is to be used.
- Graphviz installation is not part of the setup instructions, but is needed for the last exercise.
- I suggest the key points are reviewed to better match the Questions and Objectives of the lesson. They currently contain points covered earlier e.g. "A Workflow can be parameterise using params .".
- I don't think this episode introduces very much new content. An option would be to separate out the new content (e.g. "Metrics and Reports") and set up the episode as an extended exercise for learners - end the content prior to https://carpentries-incubator.github.io/workflows-nextflow/11-Simple_Rna-Seq_pipeline.html#define-the-pipeline-parameters and set an exercise to produce the code in `script6.nf`? Or maybe the extensive recap is intentional.

#### ep12

Some of this might be really specific to my setup...

- I get an error when pulling `rnaseq`:

nextflow pull nf-core/rnaseq -revision 3.0 Checking nf-core/rnaseq ... Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.id() is applicable for argument types: (String) values: [nf-validation@1.1.3] Possible solutions: is(java.lang.Object), is(java.lang.Object), find(), find(), find(groovy.lang.Closure), find(groovy.lang.Closure)

- Seems this issue may be related https://github.com/nextflow-io/nextflow/issues/888 - I tried making a clean directory to work in. Perhaps it's something to do with installing Nextflow via Conda? It works with my system Nextflow.
- Should learners be using the web based `launch` tool? I guess not.
- It might be useful to explain the relationship between a "profile" and a "scope".
- I get this error in the final exercise:

nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda --max_memory 3G -c nfcore-custom.config N E X T F L O W ~ version 23.04.2 Pulling nf-core/hlatyping ... downloaded from https://github.com/nf-core/hlatyping.git Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier


- `nextflow run nf-core/hlatyping -r 2.0.0 -profile test,conda  --max_memory '3.GB' --outdir out -c nfcore-custom.config` may be better, but needs Conda and I can't run the nf-core commands in Conda.
- gitter chat is retired. Google Groups might not still be useful. I think these should be replaced with a link to the Nextflow Slack.

### Design

- [x] Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
- [ ] The target audience identified for the lesson is specific and realistic.

- I'm not sure where the target audience is defined. I think if it's something like "Biological and Biomedical Scientists, Research Software Engineers and associated professionals" it would be great.

### Supporting information

- [ ] The list of required prior skills and/or knowledge is complete and accurate.
- [ ] The setup and installation instructions are complete, accurate, and easy to follow.
- [ ] No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. [Glosario](https://carpentries.github.io/glosario/).

- I think it would be helpful to make it clearer at the beginning of the instructions that this won't work on Windows, or if the participant has Windows, they'll need to use WSL2. Maybe linking to this would be helpful https://learn.microsoft.com/en-us/windows/wsl/install or perhaps that's too involved.
- The version of NextFlow specified is two major versions behind the current version (23, on 2024-03-22). Could a more recent version be used?
- Python 3.8, listed as a requirement, will be unsupported after quarter 3 of 2024 (https://devguide.python.org/versions/). Perhaps a more recent version could be used?
- The supplied `environment.yml` file (https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml) does not specify a version of Python. If Python version is important, then it should.
- For installing Conda, perhaps it would be best to link to the official instructions https://docs.anaconda.com/free/miniconda/miniconda-install/ ?
- I suggest re-ordering the instructions and making it clear how the `environment.yml` file should be obtained like this https://github.com/bobturneruk/workflows-nextflow/commit/8b2a7af3f58ce743f87b1d83d195ced022847c14
- I suggest if VSCode it what's best to use for the lesson, it is made a requirement, not a recommendation. This will mean greater consistency in learner experience.
- I suggest removing the instructions to install Nextflow without Conda. Again, I think consistency of setup will lead to less variation in issues experienced by learners.
- Installing with Conda went very smoothly for me!

- The Glossary only has 3 terms. I think it would benefit by being expanded.

### General

- Reviewed with Ubuntu 22.04 on WSL2.
- I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?
- I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".
ggrimes commented 7 months ago

Thank you @HadrienG and @bobturneruk for your reviews. Do you have any suggestions about how I should reply to individual review comments?

Thanks,

bobturneruk commented 7 months ago

Hi @ggrimes! @tobyhodges - how is this normally handled, please? I don't think GitHub makes this easy. Maybe we could break things down into issues against https://github.com/carpentries-incubator/workflows-nextflow/issues ?

tobyhodges commented 7 months ago

Thanks @bobturneruk and @HadrienG for your detailed reviews. @ggrimes you can choose how to respond to individual comments, based on your personal preference. In the past, other lesson developers have found it helpful to open issues to track the improvements suggested in reviews. If you do that, please link to the issues in this thread when you are ready to respond to the reviews: I would like to make it easy for visitors to the thread to find everything related to the review.

Responding to a few points from @bobturneruk:

Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.

CC-BY is not an appropriate license for data and if you are able to adjust the license on the FigShare record to use CC0 I recommend to do so (bonus points if you add a CFF file as well), but the mistake is common enough and harmless enough that I am generally happy to let it pass if needed. I think the technically correct thing to do with data included in a lesson repository would be to mention in LICENSE.md that the data is available CC0. But honestly we have data files in lesson repositories elsewhere that we are not doing this for, and I would certainly be willing to let this go for the purposes of the review.

I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?

&

I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".

As @bobturneruk predicted, GROOVY labels are added to the code blocks because syntax highlighting is being applied by specifying that blocks are groovy in the lesson source. Unfortunately, until a separate specification for nextflow is added to https://github.com/jgm/skylighting this is the best we can get. I agree that a note explaining this early in the lesson would be helpful to anyone following along on their own.