galaxyproject / galaxy-hub

Galaxy Community Hub
https://galaxyproject.org/

Integrate Cloud services, VMs, and Containers into Public Galaxy Servers infrastructure #427

Closed tnabtaf closed 6 years ago

tnabtaf commented 6 years ago

This is a proposal. Feedback is encouraged.

I'd like to update the infrastructure we've been using for the public Galaxy Servers list to include cloud-based services, virtual machines, and Docker containers. Right now, only public servers are handled well. There are a couple of ways to do this. Here's a first proposal.

It would be trivial to switch to listing them on a parallel page instead.

I like this idea because it much better addresses the different ways that people can run Galaxy without having to install their own.

I'd combine this work with other pending work.

Thoughts?

tnabtaf commented 6 years ago

@dannon @martenson @erasche: ping.

hexylena commented 6 years ago

Sounds fine to me!

dannon commented 6 years ago

@tnabtaf This sounds good to me, let me know if I can help.

tnabtaf commented 6 years ago

Thinking out loud about how and why to do this.

Use Models

The current classification scheme has general, domain, and tool-publishing. Those are all about the contents or purpose of the server. All the servers have the same use model: go to a website and start using it. This proposal is about adding new use models like "regional cloud", "vm" and "container".

Many Use Models, Many Purposes

Furthermore, a "server" can be available in multiple use modes, such as an online public server, and as a Docker container. I can also imagine a "server" having multiple content/purpose classifications too (well, I can't imagine, but keep it in mind for the future).

How to list that in the Hub?

The current page shows 3 separate lists, one each for general, domain, and tool-publishing. We could just add an "Availability" column that lists things like "Public server, Container". That works fine for the current list of servers, but what about services like PL-Grid or things that are only available as containers?

How about:

| Server / Service | Category | Platform | The rest of the info |
| --- | --- | --- | --- |
| ARGalaxy | Domain | Public server | Immuno stuff |
| Big Project | General | Public server, Regional Cloud, Container | For example, you can run an instance on Jetstream. |
| MyFaveG | Tool Publishing | Container | What's in the server? |

I can see wanting to sort them alphabetically, or by category, or to say "Just show me the public servers". To do that we might want sortable tables, and we might want to have duplicate entries so the sort works:

| Server / Service | Category | Platform | The rest of the info |
| --- | --- | --- | --- |
| ARGalaxy | Domain | Public server | Immuno stuff |
| Big Project | General | Public server | For example, you can run an instance on Jetstream. |
| Big Project | General | Regional Cloud | For example, you can run an instance on Jetstream. |
| Big Project | General | Container | For example, you can run an instance on Jetstream. |
| MyFaveG | Tool Publishing | Container | What's in the server? |

OK, that looks terrible in a straight alphabetical sort, especially when the 3 "Big Project" entries would all link to the same page. Aha! The current listing also has a links column. The Platform column could be used to link to them.

| Server / Service | Category | Platform | The rest of the info |
| --- | --- | --- | --- |
| ARGalaxy | Domain | Public server | Immuno stuff |
| Big Project | General | Public server | For example, you can run an instance on Jetstream. |
| Big Project | General | Regional Cloud | For example, you can run an instance on Jetstream. |
| Big Project | General | Container | For example, you can run an instance on Jetstream. |
| MyFaveG | Tool Publishing | Container | What's in the server? |

One option for supporting this is DataTables.
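
The duplicate-row idea above can be sketched in a few lines. This is only an illustration of the data shape, not how the Hub stores or renders its listings: the field names (`name`, `category`, `platforms`) and both helper functions are assumptions, and the records are the made-up examples from the tables in this comment.

```python
from operator import itemgetter

# Hypothetical records modelled on the example tables; field names are assumptions.
RESOURCES = [
    {"name": "ARGalaxy", "category": "Domain", "platforms": ["Public server"]},
    {"name": "Big Project", "category": "General",
     "platforms": ["Public server", "Regional Cloud", "Container"]},
    {"name": "MyFaveG", "category": "Tool Publishing", "platforms": ["Container"]},
]


def expand_by_platform(resources):
    """Return one row per (resource, platform) pair, so each row sorts on its own."""
    return [
        {"name": r["name"], "category": r["category"], "platform": p}
        for r in resources
        for p in r["platforms"]
    ]


def rows_for_platform(rows, platform):
    """Keep only the rows for one platform, sorted by resource name."""
    matching = [row for row in rows if row["platform"] == platform]
    return sorted(matching, key=itemgetter("name"))


rows = expand_by_platform(RESOURCES)
containers = rows_for_platform(rows, "Container")
print([row["name"] for row in containers])
```

Keeping a single record per resource and expanding it only for display means the three "Big Project" rows can never drift apart in the source data.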

hexylena commented 6 years ago

I can also imagine a "server" having multiple content/purpose classifications too (well, I can't imagine, but keep it in mind for the future).

EU provides hicexplorer.usegalaxy.eu, metagenomics.usegalaxy.eu (among others), and the general overall usegalaxy.eu. I think that would meet "General" + "Domain"? If you wanted an example? They're all technically the same server, your data is the same on each, etc.

+1 for data tables.

tnabtaf commented 6 years ago

Adding link to issue in related repository: Programmatically Generate this list from the Public Galaxy Server List page. I hope to address that issue while implementing this one.

tnabtaf commented 6 years ago

Thinking about changing directory structure, now that it is no longer about public servers.

Current directory structure is

src/
  public-galaxy-servers/
    abims/
      index.md
    argalaxy/
      index.md
    biomina/
      index.md
    cistrome/
      index.md
    genouest/
      index.md
    ...
  galaxy-services/
    index.md   # everything listed directly in this page.

This could become

src/
  use/
    abims/
      index.md
    argalaxy/
      index.md
    aws/
      index.md    # new, most content might be elsewhere
    biomina/
      index.md
    cistrome/
      index.md
    genap/
      index.md     # formerly under services
    genouest/
      index.md
    gvl/
      index.md    # formerly under services
    ...

98% of current directory names would be preserved, meaning the NGINX redirects would be mostly rule/pattern based.
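
As a rough sketch of why preserved directory names keep the redirects pattern based: one rule covers every unchanged name, and only the renamed directories need explicit entries. The code below models that mapping in Python rather than in NGINX configuration. The `RENAMED` table and its `old-name`/`new-name` entry are hypothetical placeholders, not directories from the Hub.

```python
import re

NEW_ROOT = "/use/"

# Hypothetical renames (old directory name -> new directory name).
# These stand in for the small share of directories whose names change.
RENAMED = {"old-name": "new-name"}

# Matches the old index page or a single per-server directory beneath it.
_OLD_PATH = re.compile(r"^/public-galaxy-servers/(?:(?P<name>[^/]+)/?)?$")


def redirect_target(old_path, renamed=RENAMED):
    """Map an old public-galaxy-servers URL path to its location under /use/.

    Returns None for paths the pattern does not cover.
    """
    match = _OLD_PATH.match(old_path)
    if match is None:
        return None
    name = match.group("name")
    if name is None:
        # The old index page maps to the new directory root.
        return NEW_ROOT
    # One pattern handles every preserved name; renames are looked up.
    return f"{NEW_ROOT}{renamed.get(name, name)}/"


print(redirect_target("/public-galaxy-servers/abims/"))
```

The fewer entries `RENAMED` needs, the closer the real redirect configuration gets to a single rewrite rule.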

Are there better options than use for the root directory name?

tnabtaf commented 6 years ago

First stab is now available in the use-directory branch. Look at anything under src/use.

tnabtaf commented 6 years ago

Hi All, I could use some feedback on the use-directory branch before I pursue this too far. Take a look at anything under use.

All the resources have been translated entirely manually so far. I might keep doing that as it forces me to review everything, or I might start to go crazy and automate the port.

martenson commented 6 years ago

Available for review here: https://galaxyproject.org/use/

edit: I like it very much. It still provides the service expected from 'public galaxy servers' page but expands it in a logical way.

tnabtaf commented 6 years ago

Should resume this work next week. In the meantime, I've been adding newly discovered resources to this directory only.

In the initial PR, @frederikcoppens said

I definitely like the pages for each server better (so the newer version for things like https://galaxyproject.org/public-galaxy-servers/abims/)

Having badges for the platforms would be nice and take less place (and less clutter). This is probably more difficult for the scope. But it currently is already better compared to all the duplication in the original version with the links column.

I like the idea, and I suspect I could implement a proposal, given one. It might not be in the first release.

What is/are the aim(s) for this list? If it aims to help users find an appropriate instance, I'm afraid it's hard for them to do so. Is it feasible to have an interactive table where you can filter? Can we tag instances so they can be filtered?

Is there any chance I can convince anyone within the sight of my voice to integrate DataTables into the Hub? That would get us sortable tables. Not sure about filtering.

You also mention the galaxy-services: these are not included in the new page, or did I miss this?

By services I mean things like Jetstream and PL-Grid. See Academic Cloud Services

tnabtaf commented 6 years ago

Hi All,

I've converted all the public server descriptions to the use directory format. See the All Resources section of the use page. Now, I need to add a backlog of VMs, containers, and cloud services. I'm hoping to get some guidance on cloud services.

The current description of cloud services. I think that these services will be a straightforward conversion:

However, I am not sure how to describe these services in the new regime:

They are now all intertwined. GVL is a platform that runs on CLIMB, NeCTAR, Jetstream and AWS. Should there even be an entry for GVL? I think we want entries for CLIMB, NeCTAR, Jetstream and AWS because each has a different set of users that can use it. How do we describe CloudLaunch here? Should it have a separate entry or should it be briefly described in the entries for NeCTAR, Jetstream, and AWS?

I can make educated guesses at these questions, but guidance would be welcome.

Ping @afgane @slugger70 @bgruening

afgane commented 6 years ago

The relationship is pretty much exactly as you described it where the GVL is a platform and it is available on different providers: CLIMB, NeCTAR, and AWS (it is not available on Jetstream). CloudLaunch is the service used to start/deploy the GVL on those providers. Different instances of CloudLaunch offer the same service just in different physical locations to make them more responsive (i.e., same concept as multiple usegalaxy servers).

I think the simplest method of describing this to the users is to basically follow that model: list the GVL (gvl.org.au) as the resource which is available on multiple providers (CLIMB, NeCTAR, and AWS) and can be deployed via CloudLaunch (link https://launch.usegalaxy.org/ and https://launch.gvl.org.au/). All this can be a single row in the table with some bullet points to delineate concepts.

tnabtaf commented 6 years ago

@afgane Thanks for taking a look and confirming/correcting.

I'm still torn on whether or not to list GVL at the top. More thought....

For now, I'll investigate what it takes to get DataTables into the Hub.

tnabtaf commented 6 years ago

Thanks to an assist from @martenson, DataTables are now in the Use Directory.

A couple of things I'm planning to address next:

Combine Platform Columns

The current implementation has 5 platform columns: Server, Comm Cloud, Acad Cloud, Cont, and VM. Thinking about combining Comm Cloud and Acad Cloud, and Cont and VM.

I'm not even sure how useful it is to display this info in columns.

Scope

I'm no longer convinced that (outside of UseGalaxy) Scope should be a first level organizing characteristic. I think it's handy information to know and will likely leave it in the data and display it on the individual pages. However, I don't know that it deserves its own section in the directory, or to be listed as a separate column in the other searches. Might not be listed on index at all, or would be listed in a new ....

Tags / Keywords

Everyone who looks at this says

  1. Can I sort this? (now done with DataTables)
  2. Can we support search? (now done with DataTables)
  3. Can we add keywords and keyword search?

That last one would be incredibly useful, and it's also the hardest and most tedious to maintain.

The right thing to do would be to use ontology terms for methodologies, and for parts of the tree of life. The quick and dirty thing to do would be to just type stuff in.

If we used ontology terms we could link out to ontology browsers, say the Ontology Lookup Service. Could use EDAM and NCBITAXON. I'll look into it.

DataTables' search ability lets us search the keywords for free.
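
To illustrate why keywords become searchable "for free" once they are part of each row, here is a minimal sketch of a whole-row text search. It is a simplified model, not the DataTables implementation. The `keywords` field, the `matches` and `search` helpers, and the two example rows are assumptions made up for this sketch.

```python
# Hypothetical rows; names, summaries, and keywords are invented for illustration.
ROWS = [
    {"name": "Example A", "summary": "General purpose server",
     "keywords": "genomics, RNA-seq"},
    {"name": "Example B", "summary": "Domain server",
     "keywords": "proteomics, mass spectrometry"},
]


def matches(row, query):
    """True if every whitespace-separated term appears somewhere in the row.

    The comparison is case-insensitive and spans all fields, so a keywords
    field is searched the same way as the name or summary.
    """
    haystack = " ".join(str(value) for value in row.values()).lower()
    return all(term in haystack for term in query.lower().split())


def search(rows, query):
    """Return the names of the rows that match the query."""
    return [row["name"] for row in rows if matches(row, query)]


print(search(ROWS, "proteomics"))
```

With free-typed keywords this only finds exact substrings. Ontology terms would add consistent spelling, but the matching logic would stay the same.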

Still to come

And I still haven't added the backlog of VMs, containers, and cloud services, or done anything about CloudLaunch (but I will).

tnabtaf commented 6 years ago

Just sent this email to the public servers list, the committers list, and the team list:

Feedback Wanted: Migrating Public Servers list to resources directory

Dear Public Server Hosts, Galaxy Committers, and Galaxy Team,

We are in the process of migrating the public Galaxy servers list to a directory of resources for using Galaxy on public servers, clouds, containers, and VMs (and whatever else the future holds). We hope to make this transition during the week of October 22, and I'd like your feedback before then. Please see below.

Thanks,

Dave C

Call for help!

We welcome any and all feedback you may have on the new directory. This includes

1. Updates to the description of your particular resource(s).

Now is a good time to do this. In particular, if your resource has a short summary, consider expanding it to a couple lines of text. This will help people find your resource when they search.

Please send these updates directly to me (or update your resource description directly through GitHub).

2. Feedback on the directory page.

Should it be broken up into multiple pages? Should the directory look different? Should the lists be presented in a different order? Is the overall structure reasonable?

Please post feedback to the GitHub issue for this change (or email just me and I'll post it there.)

3. Feedback on the layout and fields in individual resource pages

For example, the AWS page, the CLIMB page, the IRProfiler page and the Galaxy-P page. Should these contain different/additional information? Is there a better format?

Please post feedback to the GitHub issue for this change (or email just me and I'll post it there.)

4. Suggestion on keywords

The current setup does not (really) support keywords, but we want it to. The GitHub issue discusses some options for supporting these. If you have any opinions or suggestions then please add them to the issue.

Unless there is an outcry that "we absolutely have to have this before we announce" then keyword support won't be in the version that gets announced this month.

Thanks again for your help.

Miscellaneous Blathering

Want to know more? Keep reading.

The goal of this update is to make it easy for people to find easy ways to use Galaxy. That used to just include publicly accessible servers, but I think it's now easy to use Galaxy on clouds, and with VMs and containers.

This update only partially achieves that goal. It doesn't provide a way to search all the information about the individual resources, nor tell you what tools/resources/genomes are actually on each resource. Adding keyword support will help this situation some.

Got ideas? Opinions? Please add them to the GitHub issue.

During the coming week, I'll create a PR that updates all links in the hub, and adds redirects to the old directory. Thanks for reading this far.

tnabtaf commented 6 years ago

Searching: Google custom search

One way to get better searching of everything on the individual resource pages is to use a Google custom search that only looks at pages in the Use directory. Could add a "Use Now!" tab to the hub's current search page and then just link to that from the Use directory page.

the-x-at commented 6 years ago

Ad "galaxyproject.org/use": One could put each of the tables and their associated sections into separate tabs. In my opinion, this would tidy up the page a bit.

tnabtaf commented 6 years ago

Select pulldowns in the All Resources table

David Kovalic suggests:

It seems to me that the first list contains all of the info on the new resource page and the subsequent lists are duplicate sub-sets of the first.

Any sense to make the first list filterable (i.e. a user can filter to just show the subset of public, academic cloud, commercial cloud or container resources) instead for duplicating these into separate lists?

We sort of have this now, sorting with the Cloud and Deployable columns. That's clunky, but it provides most of this functionality. Still, I looked into this, and I think [example 3 on this page](https://datatables.net/reference/api/columns().search()) shows how to do it. However, my JavaScript chops are insufficient to get that to work.
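
For reference, the logic behind a select pulldown is small: collect the distinct values of one column, then keep only the rows that contain the chosen value. The sketch below models that in Python. It is not the DataTables API, and the `platforms` field, the helper names, and the example rows are assumptions for illustration.

```python
# Hypothetical resources; each lists every platform it is available on.
RESOURCES = [
    {"name": "ARGalaxy", "platforms": ["Public Server"]},
    {"name": "Big Project",
     "platforms": ["Public Server", "Academic Cloud", "Container"]},
    {"name": "MyFaveG", "platforms": ["Container"]},
]


def pulldown_options(resources):
    """Collect the distinct platform values, sorted, to populate the pulldown."""
    return sorted({p for r in resources for p in r["platforms"]})


def filter_by_platform(resources, selected):
    """Return resources offering the selected platform.

    An empty selection means no filter, so the full list is shown.
    """
    if not selected:
        return list(resources)
    return [r for r in resources if selected in r["platforms"]]


print(pulldown_options(RESOURCES))
print([r["name"] for r in filter_by_platform(RESOURCES, "Container")])
```

Filtering one list this way would remove the need for the duplicated sub-lists, since each subset is derived from the same rows.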

Put each table/section in a separate tab

@the-x-at suggests making each table be a different tab in a single UI element. I quite like this idea, and while it is well beyond my JavaScript skills, I'll give this a try.

tnabtaf commented 6 years ago

A first version of "Put each table/section in a separate tab" is now done in the use-tabs branch.

I think it looks a lot better. The one drawback I see is that you can no longer directly link to one of the tables. I'm also thinking about making the All tab be the default.

Any feedback welcome (and any code fixes too - this was all PUG/JavaScript/Bootstrap, none of which I am proficient at).

dannon commented 6 years ago

@tnabtaf I can definitely make anchors link into the tables, if this is something you want.

tnabtaf commented 6 years ago

@dannon I always want them and I am continually frustrated when they aren't supported.

But, right now, I don't think I have anything that tries to link into the table. However, as soon as I type the announcement, I'll want this.

And the Travis build failed. :-(

tnabtaf commented 6 years ago

Merging use-tabs branch into master....

tnabtaf commented 6 years ago

Latest set of changes are in the redirect branch. Will keep accumulating them there until this update is deployed.

tnabtaf commented 6 years ago

Update will be merged/deployed on Monday using PR #449. And I think the update is ready to go, if anyone wants to look at it.

afgane commented 6 years ago

This looks much better now! One question/suggestion - why do Public Servers not have a direct link to the server from the list of servers?

tnabtaf commented 6 years ago

@afgane I don't know. But I'll add it to the public servers list and maybe the other lists too.

Slugger70 commented 6 years ago

@tnabtaf I really like the tabs. It makes it much more accessible.

the-x-at commented 6 years ago

@tnabtaf Thanks for jumping on this so quickly. Looks really great!

dannon commented 6 years ago

fixed in #449