galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

New tutorial - large genome assembly and polishing #3652

Closed AnnaSyme closed 1 year ago

AnnaSyme commented 1 year ago

Hi all,

This is a tutorial I wrote a while ago for assembling and polishing large genomes. I've added it here in case you all think it is useful for including in the GTN materials.

If so, here are some notes:

I think it is different enough from the existing assembly tutorials because it covers large plant/animal genomes, but is different to the VGP tutorials as it uses different sequencing data (Nanopore), tools (Flye) and covers a polishing step.

The tutorial comprises a series of workflows, with information about the inputs, settings, and expected outputs. There's also a section about how to combine all these workflows.

During serving locally, I can't see the tutorial appear in the list of assembly topics (I can only view it if I enter the filename in the address bar). So I think I have missed adding the tutorial information to an index or metadata file. (I also can't see the workflows in the list of supporting information at the top of the tutorial, which may be the same problem.) Thanks if you are able to point me in the right direction.

These workflows were tested on Galaxy Australia but have not yet been extensively tested on other Galaxy servers - I am in the process of doing this but always welcome anyone who also wants to test - with thanks.

I'm not sure how to approach explaining if users need expanded Galaxy storage quotas and if so who they should contact. Any advice much appreciated.

Please don't hesitate to add yourself in authorship if you contribute/review.

Thanks ~Anna

hexylena commented 1 year ago

Oh I didn't respond to the rest? I could have sworn I did, sorry @AnnaSyme

I think it is different enough from the existing assembly

yes of course! more tutorials are always welcome, even if similar.

I'm not sure how to approach explaining if users need expanded Galaxy storage quotas and if so who they should contact. Any advice much appreciated.

This could be a good chance to do a CYOA tutorial to have a lil tip box on "do you need more quota? Which server are you using?" and linking to the right place for each maybe.

These workflows were tested on Galaxy Australia but have not yet been extensively tested on other Galaxy servers

If you write a workflow test, I think that's sufficient. We can more easily test on other servers!

During serving locally, I can't see the tutorial appear in the list of assembly topics (

If you're using 'serve-quick', the index page isn't regenerated. Run touch topics/assembly/index.md and it will likely show up.

AnnaSyme commented 1 year ago

Thanks everyone for your comments and suggestions - it is much appreciated. I am going on leave for a few months and wanted to let you know that these will be worked on by the Galaxy Au team in the upcoming BioHackathon. In the meantime, or as well, I am very happy for others to make any changes/fixes/additions and to please also add yourself as authors (if you want to!).

AnnaSyme commented 1 year ago

Thanks for your suggestions @gallardoalba - I think I will collate these into a set of enhancements that can be worked on when we get the chance?

AnnaSyme commented 1 year ago

Would it be interesting to add a workflow report, to the workflows, to provide a good visual/summary of the workflow's results?

Yes for sure, I had added these originally but was having issues with some images so have taken out for now - will add this to a list of enhancements

AnnaSyme commented 1 year ago

List of tutorial enhancements with thanks to @hexylena and @gallardoalba

Perhaps list could be set as an issue and tagged with PaperCuts etc?

I think the current tutorial content is ok to go now but these extra suggestions would always enhance the material:

Add a workflow report, to the workflows, to provide a good visual/summary of the workflow's results

Provide some additional details about the kind of technology used for generating the data, since it determines the pipeline and the outputs; concretely I consider it important to highlight the characteristics of Nanopore reads.

Explain under the figure about challenges of a diploid assembly: explain more and highlight differences between plants and animals

Running workflows: Here I would suggest following a structure similar to the one used in the VGP training https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_workflow_training/tutorial.html#import-workflows-from-workflowhub. Perhaps we could create a snippet for that?

Explain more why the Nanopore reads are not processed? I think it would be nice to explain a little bit the workflow in a comment box.

Data QC / Busco / Quast results and plots: add plots and explain more

Better explain or add references to this: I cannot understand this sentence: If kmer length approaches read length, this means the average depth of your sequencing is also ~X25, and there would be a peak in the graph at this position (smaller kmers = higher kmer depth). Would you mind to explain it a little bit?

Workflows (eg kmer counting, assembly, polishing, assessment): I would include a simple image for explaining the workflow, as well as the inputs and the outputs. Also, I think it would be nice to explain a little bit about the role of the different tools.

Extra information (eg about centromeres and haplotigs): put in a comment box
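On the k-mer depth item in the list above: one commonly used way to relate k-mer depth to base-level read depth, under idealised assumptions (error-free reads of uniform length, no repeats), is k-mer depth = read depth × (L − k + 1) / L, where L is the read length. This is a general rule of thumb and not necessarily the reasoning used in the tutorial. The sketch below assumes a read length of 150 and a read depth of 25 purely for illustration; the function names are hypothetical.

```python
def kmer_depth(read_depth: float, read_length: int, k: int) -> float:
    """Expected k-mer depth for idealised reads (error-free, uniform length).

    Each read of length L contains L - k + 1 k-mers, so k-mer depth is the
    base-level read depth scaled by (L - k + 1) / L.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and read_length")
    return read_depth * (read_length - k + 1) / read_length


def read_depth_from_kmer_peak(kmer_peak: float, read_length: int, k: int) -> float:
    """Invert the relation: estimate read depth from a k-mer histogram peak."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and read_length")
    return kmer_peak * read_length / (read_length - k + 1)


if __name__ == "__main__":
    # Illustrative values only: read depth 25, read length 150 (assumptions).
    for k in (1, 21, 51, 101):
        print(f"k={k:>3}  expected k-mer depth={kmer_depth(25, 150, k):.2f}")
```

Under this relation, smaller k gives a k-mer depth closer to the read depth, which matches the "smaller kmers = higher kmer depth" remark quoted in the list.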

hexylena commented 1 year ago

Added a comment to fix the linting issues:

hexylena commented 1 year ago

overall I'm really excited for this! and it uses WFHub, very cool.

hexylena commented 1 year ago

Perhaps list could be set as an issue and tagged with PaperCuts etc?

yes that's fine!

Perhaps we could create a snippet for that?

absolutely. And in 23.0 we can replace it with a URL to click on, as there will be support for that :)

shiltemann commented 1 year ago

Thanks @AnnaSyme! For the hands-on boxes, do you think you could restructure them a bit to fit with the style of regular hands-on boxes? e.g. a bit like the boxes that instruct to run a workflow from the 16S tutorial:

[Screenshot: hands-on box for running a workflow, from the 16S tutorial]

(code for the box in this screenshot is here)

AnnaSyme commented 1 year ago

Hi @shiltemann - yes - so should I separate the sections more clearly into 1-import a workflow, and 2-run the workflow ?

shiltemann commented 1 year ago

Hey @AnnaSyme. Yes it would be nice to have explicit instructions like above about the import step of the workflows (but whether that is 1 box at the start to import them all, or a step every time just before execution like the screenshot is up to you)

I was mostly referring to the formatting, making it similar to tool hands-on boxes, so numbered list for each tool/workflow they should run, having the name of the tool/workflow in bold, then a sublist explicitly listing every parameter setting to use, etc. Then any explanation of why these settings etc separately below or outside of the box.

Does that make sense? If you would like some help here let me know!
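As a rough illustration of the layout described above (a numbered list per tool or workflow, the name in bold, then a sublist of parameter settings), here is a minimal sketch that renders such a box as plain Markdown text. It is not the GTN template itself: the function name, the "Hands-on:" title line, the workflow name, and the parameter labels and values are all hypothetical placeholders.

```python
def render_hands_on(title, steps):
    """Render a hands-on style box as Markdown text.

    `steps` is a list of (name, params) pairs, where `params` maps a
    parameter label to the value the learner should set.
    """
    lines = [f"Hands-on: {title}", ""]
    for number, (name, params) in enumerate(steps, start=1):
        # One numbered item per tool/workflow, with its name in bold.
        lines.append(f"{number}. Run **{name}** with the following parameters:")
        for label, value in params.items():
            # Sublist: every parameter setting listed explicitly.
            lines.append(f'   - *"{label}"*: `{value}`')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical workflow name and parameters, for illustration only.
    box = render_hands_on(
        "Run the assembly workflow",
        [("Example assembly workflow", {"Input reads": "nanopore_reads.fastq"})],
    )
    print(box)
```

Any explanation of why particular settings are used would then sit below or outside the rendered box, as suggested above.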

AnnaSyme commented 1 year ago

Thanks @shiltemann - I've given that a go!

AnnaSyme commented 1 year ago

The most recent linting is showing errors with workflows - missing licences etc - would it be ok to add these in later? I need to test the workflows on non Galaxy Au instances in any case and could refine tests and licences then.

hexylena commented 1 year ago

yeah that's fine! no worries

gallardoalba commented 1 year ago

Thanks @AnnaSyme!

AnnaSyme commented 1 year ago

Thanks everyone!

shiltemann commented 1 year ago

@AnnaSyme would you be interested in recording this tutorial for the upcoming Smörgåsbord 3?

AnnaSyme commented 1 year ago

Hi @shiltemann yes, sounds good!