MPI-EVA-Archaeogenetics / comp_human_adna_book

This book summarises prepared mini-courses for various computational tools and methods in the field of human archaeogenetic data analysis, with a particular emphasis on population genetics.
https://mpi-eva-archaeogenetics.github.io/comp_human_adna_book/

Hackathon 20240912 Review - 5. Introduction to nf-core/eager #15

Open martynamolak opened 1 month ago

martynamolak commented 1 month ago

Regarding eager.qmd: the figure could use a caption, particularly one explaining what the green and white bits are (which honestly I couldn't find on any of the nf-core/eager websites :P ): [nf-core/eager pipeline. The green steps are...; the white steps are...]

martynamolak commented 1 month ago

Also: "prepare a custom profile tailored to your computational resources and setup" Could we elaborate a bit or link to some tutorial? I think it is not directly obvious what a profile is, let alone how to prepare one. So I would actually add a whole section on profiles.
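To make the request concrete, here is a minimal sketch of what such a section could show. In Nextflow, a profile is a named block of configuration settings inside a config file, selected at launch with `-profile`. The profile name `my_cluster`, the scheduler, and the queue name below are placeholders chosen for illustration, not settings taken from the book or from the eva profile.

```shell
# Write a hypothetical custom config file. Every value in it is an
# assumption that would need adapting to the local cluster.
cat > custom.config <<'EOF'
profiles {
  my_cluster {
    process {
      executor = 'slurm'   // assumption: the cluster uses the SLURM scheduler
      queue    = 'short'   // hypothetical queue name
    }
    singularity {
      enabled = true       // assumption: containers are run with Singularity
    }
  }
}
EOF

# The profile would then be selected at launch (shown as a comment, not run here):
# nextflow run nf-core/eager -c custom.config -profile my_cluster --input <samples.tsv> --fasta <reference.fa>
```

Keeping cluster-specific settings in a separate file passed with `-c` means the pipeline itself stays untouched and the same file can be reused across runs.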

martynamolak commented 1 month ago

I guess the Tip: "At DAG, we can take advantage of all the information entered in Pandora to produce a eager-ready tsv with pandora2eager." should be removed for the public version...

martynamolak commented 1 month ago

"To avoid re-mapping the whole dataset and conserve computing resources, also consider providing the mapped bam files to nf-core/eager directly." Can one record in eager input tsv can contain fastq AND bam file? (the example only shows either; but it might be useful for people to keep both type of files linked in the tsv). Can some additional columns be added? e.g. with info on the reference used for bam.

martynamolak commented 1 month ago

"5.3.3 Parameter customization By default nf-core/eager runs the following"

Is that only "by default" for the eva profile, or generally? If the former, I think it should be said explicitly. Perhaps we could also say that the eva profile is openly available and that any lab can use it and build their own profile on it (assuming, of course, that this is true).

martynamolak commented 1 month ago

Also: "prepare a custom profile tailored to your computational resources and setup" Could we elaborate a bit or link to some tutorial? I think it is not directly obvious what a profile is, let alone how to prepare one. So I would actually add a whole section on profiles.

OK, I see it's addressed in the "5.3.3 Parameter customization" section so perhaps it could be referred to here. Also I would suggest systematizing and extending the config section a bit.

martynamolak commented 1 month ago

"To make sure that the workflow continues running when you disconnect from the cluster or shut down your computer, nf-core/eager should be run in a screen session." Well this is quite specific to your computing resources setup. I suggest removing the "screen" section altogether. (while the tower section could use a sentence or two more.

martynamolak commented 1 month ago

"If you spot java.lang.OutOfMemoryError: unable to create new native thread in .command.log or command.err, you can delete the individual job from the scheduling queue. It will be re-submitted automatically with larger memory allocation." What do you mean by "delete the individuals job"? Could you write hoe in practise one would do that (write a command, remove a file by hand?) and how to make sure the job is indeed re-submitted and with larger memory allocation?

martynamolak commented 1 month ago

"You can also supply a run name to resume a specific run: -resume [run-name]." Does it mean that if you don't specify the run-name it will resume running any job it has even been allocated, or is it in a current location or in a current instance of nextflow software running?

Also, do you specify the "RUNNAME" anywhere? Is it drawn from the .tsv and/or profile names automatically? Or is it a dirname of the working directory from which you run eager?
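To illustrate what I think the section could spell out, here is a dry-run sketch. My understanding of Nextflow is that a run name can be set at launch with `-name` (otherwise Nextflow generates one), that `nextflow log` lists the runs launched from the current directory, and that a bare `-resume` picks up the most recent of those. The run name, profile name, and file names below are hypothetical.

```shell
# Hypothetical names: eager_batch1, my_cluster, samples.tsv, reference.fa.
RUN_NAME="eager_batch1"
COMMON="nextflow run nf-core/eager -profile my_cluster --input samples.tsv --fasta reference.fa"

# Dry run: the commands are only printed. Drop the "echo" to execute them.
echo "$COMMON -name $RUN_NAME"     # first launch, with an explicit run name
echo "nextflow log"                # list earlier runs launched from this directory
echo "$COMMON -resume"             # resume the most recent run from this directory
echo "$COMMON -resume $RUN_NAME"   # resume one specific run by name
```

If that understanding is right, the run name comes neither from the tsv nor from the profile, which would be worth stating explicitly in the text.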

martynamolak commented 1 month ago

5.4 code box could use a caption.

stschiff commented 1 month ago

Thanks so much @martynamolak. I think that's all mainly for @scarlhoff, am I right?