NASA-IMPACT / veda-jupyterhub

VEDA JupyterHub technical planning and documentation

Improve discoverability and sharing of notebooks created in VEDA Hub #4

Open batpad opened 6 months ago

batpad commented 6 months ago

We should have a good way for users to publish and share notebooks that they create in VEDA Hub. Users should also be able to find existing notebooks by searching on the datasets they use or on themes.

Creating this as a broad ticket for now - there are likely a few different implementation pathways here, depending on the exact use cases we want to fulfill, as well as the level of "publicness" we want for these shared notebooks.

@wildintellect it would be great to understand from you how this is being done currently (or not), and what we hope to accomplish with this feature, so that it can inform the approach we take to building it out.

Likely what we want is some UI integration in JupyterLab that lets users easily push notebooks to a git repository and add some metadata, plus a UI that consumes the metadata from the git repository and provides an interface to search and run the notebooks.

This will take a bit of research and thought to spec more precisely, but ticketing to capture the broad sentiment / idea that we do want an easy way for users to share and discover notebooks related to datasets on VEDA.

wildintellect commented 6 months ago

Doesn't exist. It currently relies on users self-sharing, often via GitHub. MAAP has the same issue around algorithms, which are more formal. Note: not all users will want all notebooks shared, so this should be opt-in.

Some prior art to consider: https://wholetale.org/

j08lue commented 6 months ago

All we have is veda-docs mounted into JupyterHub containers by default, but that is very very bare-bones. We do not even link to the docs anywhere.

So, the connectivity between docs and JupyterHub could definitely be improved - even without new convenience features for adding or editing notebooks, just by integrating the two better, like

  1. Users meet the catalog of existing tutorials (internal or public) when they log into Jupyter Lab (not just the folder in their home dir)
  2. Users can find a link to the docs / tutorials in a prominent place, including contribution instructions.

I think once these tutorial / docs repos are more prominent, people will want to contribute more to them, and then we may need to make that more convenient.

wildintellect commented 6 months ago

@j08lue I don't think this ticket was talking about Docs. I think it's talking about a vibrant community library where notebooks of any kind can be shared. It's more akin to public gists, or putting notebooks in a reproducibility archive like Zenodo or the Open Science Framework, where others might search and discover them.

We also don't want contributions to the VEDA Docs beyond VEDAHub specific nuances and references to external docs like Openscapes, unless we plan to follow the Planetary Computer model :thinking: .

colliand commented 5 months ago

Interesting! 2i2c has an award from NASA that might be a nice way to explore some of these ideas. We could:

  1. Set up a JupyterHub service with ephemeral home directories and the same software toolchain used on the VEDA Hub.
  2. Build some compelling notebooks that showcase how VEDA allows humanity to understand the Earth System using NASA Data.
  3. Generate nbgitpuller links that point to the ephemeral hub using the repo that contains the compelling notebooks.
  4. Share the nbgitpuller links in various channels.
  5. (Bonus?! Optional?!) Build a MyST site around the compelling notebooks with an interactive launcher that launches readers into interactive sessions with the same content using the ephemeral hub.

The ephemeral hub could be set up with small compute resources and made available to the public. Alternatively, the hub could perhaps be set up with large compute resources behind a login gate controlled by VEDA.
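
As a rough illustration of steps 3 and 4, an nbgitpuller link can be assembled from the hub address and the repository URL. The sketch below is a minimal one, assuming the usual `git-pull` endpoint with `repo`, `branch`, and `urlpath` query parameters. The function name `nbgitpuller_link`, the hub address, and the repository URL are hypothetical placeholders, not real VEDA or 2i2c endpoints.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch="main", notebook_path=""):
    """Build a link that pulls `repo_url` into the user's home dir and opens a notebook.

    Hypothetical helper: hub_url and repo_url are placeholders supplied by the caller.
    """
    # nbgitpuller clones into a directory named after the repository.
    repo_dir = repo_url.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[: -len(".git")]

    query = urlencode(
        {
            "repo": repo_url,
            "branch": branch,
            # Open the pulled notebook (or the repo folder) in JupyterLab.
            "urlpath": f"lab/tree/{repo_dir}/{notebook_path}",
        }
    )
    # /hub/user-redirect/ sends each visitor to their own server on the hub.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(
        nbgitpuller_link(
            "https://hub.example.org",
            "https://github.com/example-org/example-notebooks",
            notebook_path="intro.ipynb",
        )
    )
```

Links built this way only work on a hub that has nbgitpuller installed, and the `branch` default of `main` is an assumption about the target repository.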

wildintellect commented 5 months ago

That's still a curated set of notebooks. I think the original intent of this ticket was user-to-user sharing that does not require platform-based curation.

choldgraf commented 5 months ago

Just noting that I think this use-case might be well-served by functionality like what we're discussing in this MyST issue:

This is currently at the level of the MyST document engine (more at https://mystmd.org) but if it could be brought into JupyterLab via JupyterLab-MyST maybe (?) it could be rendered in JupyterLab as well.

Other thoughts for inspiration:

It's similar to the Jupyter Book gallery (if you could add a "filter by dataset" feature)

https://executablebooks.org/en/latest/gallery

This one is very simple, you just add an entry to this YML file and it pops up there. Very low-complexity :-)

Or the Pangeo gallery (if you removed the hard dependence on BinderHub):

https://gallery.pangeo.io/

You could give users a metadata specification to follow (e.g., topic: climate or dataset: CMIP6), and tell them that if they put their notebooks in a specific folder, it'll be scraped for that metadata and used to populate a "list / gallery of VEDA notebooks" in the docs.
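
A minimal sketch of that scraping step, assuming the metadata lives under a custom key in each notebook's top-level `metadata` object. The key name `veda_gallery` and the helper names are hypothetical; only the `topic` and `dataset` fields come from the example above. Notebooks without the key are skipped, which keeps the listing opt-in.

```python
import json
from pathlib import Path


def collect_gallery_entries(folder):
    """Scan `folder` for notebooks that carry a (hypothetical) "veda_gallery" metadata block."""
    entries = []
    for nb_path in sorted(Path(folder).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb_path.parts:
            continue  # ignore Jupyter autosave copies
        try:
            notebook = json.loads(nb_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid notebook JSON
        if not isinstance(notebook, dict):
            continue
        meta = notebook.get("metadata", {}).get("veda_gallery")
        if not isinstance(meta, dict):
            continue  # no metadata block: the author has not opted in
        entries.append(
            {
                "path": nb_path.relative_to(folder).as_posix(),
                "topic": str(meta.get("topic", "")),
                "dataset": str(meta.get("dataset", "")),
            }
        )
    return entries


def filter_by_dataset(entries, dataset):
    """Return the entries whose dataset matches, ignoring case."""
    return [e for e in entries if e["dataset"].lower() == dataset.lower()]
```

The resulting list of dictionaries could then feed whatever renders the gallery page; that rendering step is left out here.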

Or for a "hacky but possible right now" approach, you could follow an example like this dynamic "list of running hubs" in the 2i2c infrastructure docs:

https://infrastructure.2i2c.org/reference/hubs/

Which renders a CSV file that is generated at build time from a Python script, and then uses a little JavaScript tool called DataTables to turn it into a dynamic table that you can filter (here's the relevant JS):

https://github.com/2i2c-org/infrastructure/blob/ae899f59c42625839c57055cde88fde0dd1ec883/docs/reference/hubs.md?plain=1#L39-L63
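
For the build-time half of that approach, a generic sketch (not the 2i2c script itself) might write one CSV row per notebook, which the docs page would then load into a filterable table. The function name `write_gallery_csv` and the column names are assumptions chosen for illustration.

```python
import csv
from pathlib import Path


def write_gallery_csv(rows, out_path, columns=("title", "dataset", "topic", "url")):
    """Write notebook records to a CSV file for the docs build to pick up.

    `rows` is an iterable of dicts; missing columns are written as empty strings
    and keys outside `columns` are dropped.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        # Sort by title so the generated file is stable between builds.
        for row in sorted(rows, key=lambda r: str(r.get("title", "")).lower()):
            writer.writerow({c: row.get(c, "") for c in columns})
    return out_path
```

Running a script like this before the docs build keeps the table in sync with the notebooks without any server-side component.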