galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

Galaxy Hub tutorials: The Roadmap #269

Closed nekrut closed 4 years ago

nekrut commented 7 years ago

All:

I have been creating a series of tutorials focused on usegalaxy.org that are hosted on galaxy-hub. They are heavily based on content within this repository. My intentions with these are:

So far I have done this with the RNAseq tutorials developed by @MoHeydarian and @malloryfreeberg (https://github.com/galaxyproject/galaxy-hub/pull/199). The next will be the ChIP-seq tutorial from @MoHeydarian and @malloryfreeberg.

The major changes were:

I would greatly appreciate your feedback on these.

bgruening commented 7 years ago

Also, please look at this event and consider participating: https://docs.google.com/document/d/1dLCL5-2pkWTvmGTz7GyQkb5LEIZ3FBkPvCBoqUPGUOg @dannon and @tnabtaf will be there, and we will also work on the infrastructure of this project.

distributing sample datasets via Galaxy libraries (in addition to Zenodo).

IMHO we should describe our training material without tying it to any specific Galaxy instance. Datasets should be available via a DOI, and we should explain how to download them. In addition, we should (as a community) come up with a standard way to organise data libraries and add an optional sentence to the training: "All the needed data for the training material can be found under training_material/foo/bar/version1/". Then all supported Galaxy instances can add themselves as supporters of this training (we have metadata for this), but they need to follow these library recommendations.

nekrut commented 7 years ago

While I agree that training materials should ultimately be instance agnostic, the current challenge is tool sets: instances can differ greatly in their tool sets, so it may not be possible to run a given tutorial on a given instance.

bgruening commented 7 years ago

We should fix this, and every instance that is listed as a "supported instance" will need to have these tools installed. Do you have tools that cannot be shared?

nekrut commented 7 years ago

No, I only use tools from IUC

nekrut commented 7 years ago

In fact, the main reason I'm putting tutorials on the hub is to have a "curated" set where all pieces work. Should we create a curated section in training-materials?

bgruening commented 7 years ago

We could define some sort of training dependencies; we need this for the Docker image as well. And if a Galaxy instance does support all of these, we will annotate the training with this instance and provide the user with a link to it.

bgruening commented 7 years ago

Ideally all trainings should have this curation here. We just rushed and moved this repo to galaxyproject too early, and now we need to clean up. But I would like all trainings here to stick to these standards.

dannon commented 7 years ago

I'm not sure the 'supported instance' bit is the best path. My concern is that it isn't maintainable, both in keeping tutorials here up-to-date, and on the server maintenance side. How about having some code that can just query a server's API to see where it'll work? Then a list of 'compatible' servers is always available.
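A minimal sketch of the API check suggested above, in Python. It assumes the server exposes its flat tool list at `/api/tools?in_panel=false` and that Tool Shed tool ids follow the `<shed>/repos/<owner>/<repo>/<tool>/<version>` pattern; the function names and the tool names used in the usage note are illustrative only, not part of any existing GTN code.

```python
import json
from urllib.request import urlopen


def short_tool_id(tool_id):
    """Reduce a Tool Shed GUID to its short tool name.

    Assumption: GUIDs look like '<shed>/repos/<owner>/<repo>/<tool>/<version>'.
    Built-in tool ids (no slashes) are returned unchanged.
    """
    parts = tool_id.split("/")
    return parts[-2] if len(parts) >= 2 else tool_id


def fetch_tool_ids(server_url, timeout=30.0):
    """Return the set of short tool ids a Galaxy server reports.

    Assumption: the server answers GET /api/tools?in_panel=false with a
    JSON list of objects that each carry an 'id' field.
    """
    url = server_url.rstrip("/") + "/api/tools?in_panel=false"
    with urlopen(url, timeout=timeout) as response:
        tools = json.load(response)
    return {short_tool_id(tool["id"]) for tool in tools}


def missing_tools(required, installed):
    """Return the required tools that are absent from the installed set."""
    return set(required) - set(installed)
```

With this, `missing_tools({"hisat2", "stringtie"}, fetch_tool_ids("https://usegalaxy.org"))` would list whatever a tutorial needs that the server lacks; an empty set means the server counts as compatible. Version requirements are deliberately ignored in this sketch.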

MoHeydarian commented 7 years ago

IMO all Galaxy instances should carry the tools that the tutorials of the GTN utilize. These tutorials cover the basic bioinformatic pipelines and use established and vetted tools. If there is a "base list" of Galaxy tools, it should start with the ones in the GTN tutorials.

It isn't realistic to have all public versions of Galaxy carry tools we deem necessary, but we can make sure that at least Main, Test, Cloudman (on AWS and JS), and instances maintained by committers and close friends of the project have this "base list" of tools installed.

I'm still not sold on having a Docker image for each tutorial with the idea that users will be able to successfully: 1) launch the Docker image on their machines and 2) have a machine with the resources to accomplish the tasks in the tutorial.

MoHeydarian commented 7 years ago

@nekrut how does using IGV instead of Trackster make this tutorial more Main-centric? Won't this require users to download and launch IGV from their machines? With Trackster users can transfer data from a history to the genome browser view quite seamlessly, or am I missing something?

dannon commented 7 years ago

I do feel like that limits the tutorials we can curate within this repo, and unnecessarily.

What if we, as a part of the build process (or some separate on-demand or automated mechanism) were able to generate a matrix of which tutorials worked on which servers, given a list of servers to query? This should be fairly straightforward to implement, as long as we can annotate tutorial requirements expressively enough. I'd imagine we could also, for UI features, require particular versions, which we can verify via the API.

It'd have the added benefits of allowing us to easily add more public servers, and also ensure servers remain compatible.
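To make the matrix idea concrete, here is a rough Python sketch. It assumes each tutorial is annotated with a plain set of required tool names and that each server's installed tools have already been collected (for example via its API); the data layout, function names, and sample entries in the usage note are hypothetical.

```python
def compatibility_matrix(tutorial_requirements, server_tools):
    """Map each tutorial to {server: True/False}.

    A server is marked compatible when every tool the tutorial requires
    is present in that server's installed tool set.
    """
    matrix = {}
    for tutorial, required in tutorial_requirements.items():
        matrix[tutorial] = {
            server: set(required) <= set(installed)
            for server, installed in server_tools.items()
        }
    return matrix


def render_matrix(matrix):
    """Render the matrix as tab-separated text, one tutorial per row."""
    servers = sorted({server for row in matrix.values() for server in row})
    lines = ["tutorial\t" + "\t".join(servers)]
    for tutorial in sorted(matrix):
        cells = ["yes" if matrix[tutorial].get(server) else "no" for server in servers]
        lines.append(tutorial + "\t" + "\t".join(cells))
    return "\n".join(lines)
```

For example, `compatibility_matrix({"rnaseq": {"hisat2", "stringtie"}}, {"main": {"hisat2", "stringtie"}, "other": {"hisat2"}})` marks `rnaseq` as working on `main` but not on `other`. Running this in the build would regenerate the table whenever a tutorial's requirements or a server's tool set changes.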

yvanlebras commented 7 years ago

Hi all, IMO some Galaxy instances don't have NGS-related tools, as some are dedicated to proteomics, bioimaging and other fields, and maybe there will be GTN tutorials on these topics in the future. Moreover, I think that Docker can become more usable by biologists in the future. Testing Galaxy flavors through Kitematic for Windows for several months gives me some hope that this is reachable. In the meantime, I agree that making all tutorials usable from usegalaxy.org is a very good thing. Finally, concerning IGV, I personally always prefer showing mapping results and the like through Trackster because it works easily on every Galaxy instance.

bgruening commented 7 years ago

I completely agree with @yvanlebras: we should not limit the scope of Galaxy training. People are currently developing material for literature mining and metabolomics, and I'm not sure usegalaxy.org wants to have this. Moreover, the Docker setup may soon power the backend of Cloudman. And I have discussed ideas with @afgane to launch all these training instances from cloudlaunch. There is so much potential in this training; we just need to get things together.

@dannon I completely agree. We can test the annotated servers and maintain a list. Vice versa, I also see the docker directory as a good way to make it easier for admins to install all the necessary bits to support one tutorial. E.g. the tool YAML file, the data-library population scripts, etc. can all be run in a cron job if needed to stay up to date.

yvanlebras commented 7 years ago

Hi @nekrut, as I will give an RNAseq training session at the end of the week, I have tested your (so beautiful) tutorial! Here are some comments; I hope some are of interest...

Moreover, I'm not sure about the strand of the reads relative to the RefSeq genes... Using Trackster, if I'm not mistaken, it appears that we have, by pairs, first (left) sense reads then (right) antisense reads, and the Hoxb13 RefSeq gene is oriented from left to right. So this seems to indicate an FR library type, not an RF one? When we look at the original article, they also proposed FR:

"We used Cuffdiff2 (Trapnell et al. 2013) to identify DE genes, using the following options: dispersion-method=per-condition, library-type= fr-firststrand, max-bundle-frags = 20000000, min-reps-for-js-test =2, -b for bias correction, and –M to mask globin transcripts."

Here is my history (https://usegalaxy.org/u/ylebras/h/rnaseq-training--finding-and-quantifying-new-transcripts-rnaseq) and the related Trackster visualization: https://usegalaxy.org/u/ylebras/v/rnaseq-tuto-de

nekrut commented 7 years ago

@yvanlebras Thank you! I'm about to make another pass (once @davebx merges gffcompare in) and I'll fix things you have noticed.

dannon commented 6 years ago

(Group 4) Would be nice to use training-materials repo for all content management to avoid duplication, but publish to galaxyproject.org. Using metadata to tag requirements and generate indexes for 'tutorials that work on main', etc.

bgruening commented 6 years ago

Galaxy-training should have its own website, and the hub should embed it. Or is the search enough? A summary page in the Hub linking out to the training?

dannon commented 6 years ago

training.galaxyproject.org (proposed) or galaxyproject.org/tutorials (current) doesn't make a difference to me, as long as we render it all with the same infrastructure/processes so we can share efforts there and look&feel.

bebatut commented 6 years ago

What is the status of this issue?

hexylena commented 5 years ago

This site is now available at training.galaxyproject.org

Per @nekrut's original comment:

  • make sure usegalaxy.org tool set is in good shape (tutorials can be run on main successfully)
  • move them from hub into this site once @dannon has completed syncing its styling with hub
  • generalize them by creating docker flavours and cloud-based versions

We are testing at EU and have offered to do this for main/au from our Jenkins job, but I think @martenson is in the process of doing it himself. Docker flavours are a work in progress.

I think it has settled on all content here, and high quality + running across usegalaxy.* servers :)

hexylena commented 4 years ago

This is mostly all complete with the usegalaxy.* efforts and our workflow testing. Thanks everyone!