Open gperu opened 1 year ago
Thanks for submitting this lesson to The Carpentries Lab, @gperu, @ggrimes and @ameynert. I'm excited to see it enter review!
I'll be acting as Editor on the submission. I am just about to go on leave for a few weeks, but I will work through the Editor checklist when I get back in January. When I've completed those checks, and anything they bring up has been addressed, I can begin a search for reviewers.
For now, to ensure that the review process runs as smoothly as possible, please make sure you are subscribed to receive notifications from this thread. On the right sidebar of this page you should see a section headed Notifications, with a Customize link. You can click on that and make sure that you have the Subscribed option selected, to receive all notifications from the thread.
You can add a badge to display the status of the review in the README of your lesson repository with the following Markdown:
[![The Carpentries Lab Review Status](http://badges.carpentries-lab.org/16_status.svg)](https://github.com/carpentries-lab/reviews/issues/16)
@tobyhodges can you please pause this review as I need to update the main https://github.com/carpentries-incubator/workflows-nextflow repo with the latest code which has been developed in my personal github repository https://github.com/ggrimes/workflows-nextflow under the branch agnostric.
Thanks for the heads-up, @ggrimes. Please let me know when you are ready for me to begin working through the editorial checklist.
I am ready to start the review process for this lesson. Please let me know the next steps
Potential Reviewers
Here is a list of potential reviewers, both experts in nextflow and nf-core. Also have contributed to training materials.
@christopher-hakkaart @mribeirodantas
The first three figures in Getting Started with Nextflow are lacking alternative text descriptions. These kinds of diagrams can be difficult to capture properly in a text description, but please give it a go and feel free to ask for help here if you need it.
The purpose of alternative text is to communicate the purpose of the image to someone who cannot see it, so I recommend you focus on the key points/concepts these figures are intended to get across to learners.
[Edit: alt-text has now been added]
The example data is licensed CC-BY, which is not really appropriate for data but is owned by the lesson author, so I think we can be confident there will not be any problems.
The Learner Profiles are sufficiently detailed.
@ggrimes I recommend you add a link to that page from the introductory text in index.md
.
The lesson repository includes:
The lesson includes:
I recommend that you open issues on the lesson repositories for each point raised above, to help you maintain a view of what has been addressed as you go along. You do not need to provide a written response to all of the points raised, but please post back here when you think they have been addressed. I can then run through the checklist again and confirm that the lesson is ready to move to review. If you would like to provide additional explanation for any of the points raised, I encourage you to do so.
Added alt text to all images in first episode issue #106
The alternative text descriptions you added are a big improvement, thanks.
I will start contacting reviewers for the lesson.
@bobturneruk @hadrienG thank you both for volunteering to review a lesson for The Carpentries Lab. Please can you confirm that you are happy to review this Introduction to Bioinformatics workflows with Nextflow and nf-core lesson?
You can read more about the lesson review process in our Reviewer Guide.
confirming I'm happy to review this lesson!
I am, too.
Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help.
Thanks @bobturneruk and @HadrienG for agreeing to review .
Excellent, thank you both. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask and I will be happy to help. @tobyhodges should I let the other potential reviewers that i no longer need them?
@tobyhodges should I let the other potential reviewers that i no longer need them?
Yes, I believe we are covered at this point. @HadrienG & @bobturneruk: of course, if circumstances change and you find that you no longer have capacity to review the lesson, do let me know and I can respond accordingly.
Great job folks! I'm well in my way into the review process, and thought I'd post this even if I'm not done, so you can eventually start addressing the first comments. I'll change the header and this comment when I'm done.
process_rscript.nf
: including the ShortRead
package in the conda
environment would let the learners run the script.collect()
should be introduced first in the channels episode, instead of appearing here for the first time.into
operator is deprecated. NFCORE_RNASEQ:RNA_SEQ:SAMTOOLS_SORT
) but scoping is not mentioned or explained previously.The conda
setup and the standalone setup do not install the same version of nextflow, which causes issues down the line. 20.10
is installed standalone, and is assumed to be used throughout the whole lesson, However, conda
installs >=20.10
. I would suggest to migrate the setup and all episodes to >=22.03
. This removes the need for nextflow.enable.dsl=2
in the scripts, and only would require a few small changes in some episodes. DSL1 is old, deprecated and in my opinion unnecessary to bring up / explain.
It is not crystal clear that the "Nextflow install without conda" part is optional.
nf-core
is missing in the environment.yml
The points tagged DSL2 are dependent on the above setup issue.
$
sot the command can be copy/pasted more easilynextflow run word_count.nf
gives 0 because the input file does not exist. Should be data/yeast/reads/ref1_1.fq.gz
cp wc.nf wc-params.nf
change wc.nf
to word_count.nf
word_count vs wf-params
)~~~
left in the scripttree
is not available on macOS by default. Personally, I'd avoid using itworkflow_01.nf
is fastqc on the website, salmon in the scripts.~~~{: .language-groovy }
present in the solution@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.
@HadrienG I would be interested in your opinion as to whether the nf-core episode should be included ,or removed and just have core nextflow lessons.
Disclaimer/conflict of interest: I have previously published an nf-core pipeline and I'm a member of the nf-core community.
I think the nf-core episode will be helpful for a lot of people. Although the episode is not teaching nextflow per se, nf-core is a valuable resource, and researchers would benefit being aware that it exists, and learning how to launch pipelines.
Secondly, while there are other collections of nextflow pipelines out there, the community aspect of nf-core is quite unique, and I don't think including them is especially unfair to another community. What I would eventually suggest is keeping the nf-core episode but removing it from the lesson title and go with "Introduction to Bioinformatics workflows with Nextflow". You could even rename the nf-core episode "Launching publicly available pipelines" and start the episode with a non nf-core example: nextflow run carpentries/nextflow-example
that would download and run a minimal nextflow piepline that you created. That way you illustrate to the leaners that they can run any nextflow pipeline they find on github, and then show nf-core as a community example that has a lot of pipelines available.
Hi! I just wanted to say I am working on my review. I've got to ep 7. It's taking a while as I think it deserves to be done in some detail, as overall it seems to be such a well thought out and useful resource. Lots of specific comments to follow once I've had time to look at it all. I'm trying not to look at @HadrienG's report for now.
I'm going to post my review below in a moment. I've probably made some mistakes - interpreted things wrongly or misunderstood something technical - and I'm mindful that in some places I'm asking for extra work to be done which means even more of someone's time and effort. Overall I hope it's helpful. I'm happy to discuss further via a call, Slack or on here.
Once again, my overall feeling is that this lesson will be a great help for people getting into Nextflow.
[ ] The alternative text of all figures is accurate and sufficiently detailed.
[x] The lesson content does not make extensive use of colloquialisms, region- or culture-specific references, or idioms.
[x] The lesson content does not make extensive use of contractions (“can’t” instead of “cannot”, “we’ve” instead of “we have”, etc).
Figures in episode 1 appear blurry and have very small text (Windows 11, Chrome, 100% zoom on browser).
Figures lacking detailed alternative text:
The lesson content:
[x] Tools used in the lesson are open source or, where tools used are closed source/proprietary, there is a good reason for this e.g. no open source alternatives are available or widely-used in the lesson domain.
[ ] Any example data sets used in the lesson are accessible, well-described, available under a CC0 license, and representative of data typically encountered in the domain.
[x] The lesson does not make use of superfluous data sets, e.g. increasing cognitive load for learners by introducing a new data set instead of reusing another that is already present in the lesson.
[x] The example tasks and narrative of the lesson are appropriate and realistic.
[x] The solutions to all exercises are accurate and sufficiently explained.
[x] The lesson includes exercises in a variety of formats.
[x] Exercise tasks and formats are appropriate for the expected experience level of the target audience.
[x] All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment.
[ ] Exercises are designed with diagnostic power.
Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.
I don't have the expertise to comment on whether exercises are designed with diagnostic power.
I've made a PR with several minor comments which the maintainers might accept or reject individually.
Specific comments follow...
0
?
(nf-training) bobturner@tubby:~/nf-training$ nextflow run word_count.nf
N E X T F L O W ~ version 20.10.0
Launching `word_count.nf` [boring_nobel] - revision: 72656509cb
executor > local (1)
[14/523859] process > NUM_LINES (1) [100%] 1 of 1 ✔
SRR2584863_1.fastq.gz 0
sleep_before
and sleep_after
parameters? fromFilePairs
section "The matching files are emitted as tuples". Maybe there is an important distinction, but I'm a bit confused. I see Tuples are defined here https://www.nextflow.io/docs/latest/process.html#input-type-tuple and in the lesson here https://carpentries-incubator.github.io/workflows-nextflow/05-processes-part2.html#grouped-inputs-and-outputsfromSRA
they will need instructions on how to get an NCBI API key, and time to do this.-process.echo
is explained before it is used here https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#process-definition - I don't think it's in earlier material?PYSTUFF
be called something more specific, please? Same for RSTUFF
. It's better for readers if short, descriptive names are used.myscript.py
be called something more specific, please?shell:
way of using Bash in Nextflow, please? Perhaps better to stick with script:
? https://carpentries-incubator.github.io/workflows-nextflow/04-processes-part1.html#shell${
...}
when used in a script? What is good practise?
executor > local (10)
[e5/e6fded] process > FASTQC (5) [100%] 9 of 9 ✔
[17/30b05f] process > MULTIQC [100%] 1 of 1, failed: 1 ✘
Error executing process > 'MULTIQC'
Caused by:
Process MULTIQC
terminated with an error exit status (1)
Command executed:
multiqc .
Command exit status: 1
Command output: Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 42/42
[WARNING] multiqc : No analysis results found. Cleaning up.. [INFO ] multiqc : MultiQC complete
Work dir: /home/bobturner/nf-training/work/17/30b05f5243e6fd12c0013cc12ea0c8
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
- This persists with subsequent examples
#### ep7
- Is it safe to assume that learners will be familiar with regular expressions? If not https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#regular-expression may be confusing.
- There's a lot of really good content in this episode, but will all of it be relevant to new Nextflow users? https://carpentries-incubator.github.io/workflows-nextflow/07-operators.html#grouping-contents-of-a-channel-by-a-key- might be a bit complicated? Or perhaps this is something that users encounter quite early in their Nextflow journey?
#### ep8
- Could this detail be referred to rather than included? Perhaps it is enough to say it's a unique ID? https://carpentries-incubator.github.io/workflows-nextflow/08-reporting.html#task-id
- The final exercise in this episode is the same as the preceding content. Maybe best to omit the exercise, or give the markdown example and ask learners to modify the command for given HTML as an exercise?
#### ep9
When I first tried the Conda exercise I got this error:
Error executing process > 'FASTP (1)'
Caused by: Failed to create Conda environment command: conda create --mkdir --yes --quiet --prefix /home/bobturner/nf-training/work/conda/env-a7a3a0d820eb46bc41ebf4f72d955e5f bioconda::fastp=0.12.4-0 status : 1 message: CondaValueError: You have chosen a non-default solver backend (libmamba) but it was not recognized. Choose one of: classic
I guess down to my Conda config. I could fix by:
conda config --set solver classic
- I suggest `docker.runOptions = '-u $(id -u):$(id -g)'` should be explained.
#### ep10
- I don't think learners are asked to create a file called `wc.nf` before they are asked to resume it in the first exercise here. It's maybe called `word_count.nf` earlier in the lesson?
- I'm not completely clear on what "The command wrapped used to run the job." means. As far as I can tell by searching, the definition of `.command.run` is not in the Nextflow docs. It might be better described as "The full Nextflow bash script used to run `.command.sh`."?
#### ep11
- Is it the intention that this episode re-explains lots of content from previous episodes? For example parameters and processes are covered as if new content. Perhaps there is scope to remove some of this e.g.
> A process is defined by providing three main declarations:
>
> 1. The process inputs,
> 2. The process outputs
> 3. Finally the command script.
- I suggest the "Recap" sections are reviewed - they may be useful but they introduce a different structure to ep11 compared with earlier episodes. They also often imply that content previously covered is first covered in ep11.
- "The FASTQC process will not run as the process has not been declared in the workflow scope." - is this the same definition of "scope" as used earlier? Is there a workflow scope as well as a params scope and an aws scope?
- I get this error when running script6:
Error executing process > 'MULTIQC'
Caused by:
Process MULTIQC
terminated with an error exit status (1)
Command executed:
multiqc .
Command exit status: 1
Command output: Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 141/141
[WARNING] multiqc : No analysis results found. Cleaning up.. [INFO ] multiqc : MultiQC complete
Work dir: /home/bobturner/nf-training/work/13/2a6ef5a3e567c3f56e41d67eefe205
Tip: when you have fixed the problem you can continue the execution adding the option -resume
to the run command line
- Ternary operators should be explained more fully if the term is to be used.
- Graphviz installation is not part of the setup instructions, but is needed for the last exercise.
- I suggest the key points are reviewed to better match the Questions and Objectives of the lesson. They currently contain points covered earlier e.g. "A Workflow can be parameterise using params .".
- I don't think this episode introduces very much new content. An option would be to separate out the new content (e.g. "Metrics and Reports") and set up the episode as an extended exercise for learners - end the content prior to https://carpentries-incubator.github.io/workflows-nextflow/11-Simple_Rna-Seq_pipeline.html#define-the-pipeline-parameters and set an exercise to produce the code in `script6.nf`? Or maybe the extensive recap is intentional.
#### ep12
Some of this might be really specific to my setup...
- I get an error when pulling `rnaseq`:
nextflow pull nf-core/rnaseq -revision 3.0 Checking nf-core/rnaseq ... Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.id() is applicable for argument types: (String) values: [nf-validation@1.1.3] Possible solutions: is(java.lang.Object), is(java.lang.Object), find(), find(), find(groovy.lang.Closure), find(groovy.lang.Closure)
- Seems this issue may be related https://github.com/nextflow-io/nextflow/issues/888 - I tried making a clean directory to work in. Perhaps it's something to do with installing Nextflow via Conda? It works with my system Nextflow.
- Should learners be using the web based `launch` tool? I guess not.
- It might be useful to explain the relationship between a "profile" and a "scope".
- I get this error in the final exercise:
nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda --max_memory 3G -c nfcore-custom.config N E X T F L O W ~ version 23.04.2 Pulling nf-core/hlatyping ... downloaded from https://github.com/nf-core/hlatyping.git Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier
- `nextflow run nf-core/hlatyping -r 2.0.0 -profile test,conda --max_memory '3.GB' --outdir out -c nfcore-custom.config` may be better, but needs Conda and I can't run the nf-core commands in Conda.
- gitter chat is retired. Google Groups might not still be useful. I think these should be replaced with a link to the Nextflow Slack.
### Design
- [x] Learning objectives for the lesson and its episodes are clear, descriptive, and measurable. They focus on the skills being taught and not the functions/tools e.g. “filter the rows of a data frame based on the contents of one or more columns,” rather than “use the filter function on a data frame.”
- [ ] The target audience identified for the lesson is specific and realistic.
- I'm not sure where the target audience is defined. I think if it's something like "Biological and Biomedical Scientists, Research Software Engineers and associated professionals" it would be great.
### Supporting information
- [ ] The list of required prior skills and/or knowledge is complete and accurate.
- [ ] The setup and installation instructions are complete, accurate, and easy to follow.
- [ ] No key terms are missing from the lesson glossary or are not linked to definitions in an external glossary e.g. [Glosario](https://carpentries.github.io/glosario/).
- I think it would be helpful to make it clearer at the beginning of the instructions that this won't work on Windows, or if the participant has Windows, they'll need to use WSL2. Maybe linking to this would be helpful https://learn.microsoft.com/en-us/windows/wsl/install or perhaps that's too involved.
- The version of NextFlow specified is two major versions behind the current version (23, on 2024-03-22). Could a more recent version be used?
- Python 3.8, listed as a requirement, will be unsupported after quarter 3 of 2024 (https://devguide.python.org/versions/). Perhaps a more recent version could be used?
- The supplied `environment.yml` file (https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml) does not specify a version of Python. If Python version is important, then it should.
- For installing Conda, perhaps it would be best to link to the official instructions https://docs.anaconda.com/free/miniconda/miniconda-install/ ?
- I suggest re-ordering the instructions and making it clear how the `environment.yml` file should be obtained like this https://github.com/bobturneruk/workflows-nextflow/commit/8b2a7af3f58ce743f87b1d83d195ced022847c14
- I suggest if VSCode it what's best to use for the lesson, it is made a requirement, not a recommendation. This will mean greater consistency in learner experience.
- I suggest removing the instructions to install Nextflow without Conda. Again, I think consistency of setup will lead to less variation in issues experienced by learners.
- Installing with Conda went very smoothly for me!
- The Glossary only has 3 terms. I think it would benefit by being expanded.
### General
- Reviewed with Ubuntu 22.04 on WSL2.
- I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?
- I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".
Thank you @HadrienG and @bobturneruk for your reviews. Do you have any suggestions about how I should reply to individual review comments?
Thanks,
Hi @ggrimes! @tobyhodges - how is this normally handled, please? I don't think GitHub makes this easy. Maybe we could break things down into issues against https://github.com/carpentries-incubator/workflows-nextflow/issues ?
Thanks @bobturneruk and @HadrienG for your detailed reviews. @ggrimes you can choose how to respond to individual comments, based on your personal preference. In the past, other lesson developers have found it helpful to open issues to track the improvements suggested in reviews. If you do that, please link to the issues in this thread when you are ready to respond to the reviews: I would like to make it easy for visitors to the thread to find everything related to the review.
Responding to a few points from @bobturneruk:
Interestingly, because the example data is provided by The Carpentries in the lesson repo, it is CC-BY, not CC0.
CC-BY is not an appropriate license for data and if you are able to adjust the license on the FigShare record to use CC0 I recommend to do so (bonus points if you add a CFF file as well), but the mistake is common enough and harmless enough that I am generally happy to let it pass if needed. I think the technically correct thing to do with data included in a lesson repository would be to mention in LICENSE.md
that the data is available CC0. But honestly we have data files in lesson repositories elsewhere that we are not doing this for, and I would certainly be willing to let this go for the purposes of the review.
I think, strictly speaking, the scripts (e.g. https://carpentries-incubator.github.io/workflows-nextflow/01-getting-started-with-nextflow.html#your-first-script) is not written in GROOVY (as stated) but in Nextflow, or maybe it should be Nextflow DSL2? I guess Groovy is specified to help with syntax highlighting?
&
I think there needs to be an explanation of the relationship between Groovy and Nextflow DSL2 in the introduction, otherwise it's confusing when Groovy operators and variable types are introduced later on. Could say something like "The Nextflow language is based on another language, Groovy. Some Groovy code can be used directly in Nextflow, but not all Groovy code is valid.".
As @bobturneruk predicted, GROOVY labels are added to the code blocks because syntax highlighting is being applied by specifying that blocks are groovy
in the lesson source. Unfortunately, until a separate specification for nextflow
is added to https://github.com/jgm/skylighting this is the best we can get. I agree that a note explaining this early in the lesson would be helpful to anyone following along on their own.
Lesson Title
Introduction to Bioinformatics workflows with Nextflow and nf-core
Lesson Repository URL
https://github.com/carpentries-incubator/workflows-nextflow
Lesson Website URL
https://carpentries-incubator.github.io/workflows-nextflow/
Lesson Description
This lesson is a three day introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.
Nextflow enables scalable and reproducible scientific workflows using software enviroments like conda. It allows the adaptation of pipelines written in the most common scripting languages such as Bash, R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.
This lesson also introduces nf-core: a framework that provides a community-driven, peer reviewed platform for the development of best practice analysis pipelines written in Nextflow.
This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.
Author Usernames
@ggrimes @ameynert
Zenodo DOI
No response
Differences From Existing Lessons
No response
Confirmation of Lesson Requirements
JOSE Submission Requirements
paper.md
andpaper.bib
files as described in the JOSE submission guide for learning modulesPotential Reviewers
No response