Open jules32 opened 9 months ago
Some comments that might be helpful re the "fledging" step. I don't use the Openscapes 2i2c hub very often except when helping with teaching for Openscapes. But I use the Docker images for the Openscapes hub all the time. If I want to teach the Earthdata Cloud Book content, then I want to use the Openscapes image. It saves me so much work to do that and I can use those images to fire up a development environment in many different ways.
So as you think about "fledging" people, I would think about the importance of the (Docker) images (corn and py-rocket). Having those, which you need anyway, is a huge resource for those who are "fledging". Those images (and the files to make them) are provider agnostic. For example, coiled.io is certainly not required when one wants to fledge from Openscapes. I think Carl's work on using one image (or the files to create the image) as the base for spinning up compute platforms in different ways is a great example.
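To make that concrete: because the image is provider agnostic, the same image that backs the hub can also be pulled and run locally. Here is a minimal Docker Compose sketch, assuming a placeholder image name (substitute whichever published Openscapes image you actually use) and assuming the image ships JupyterLab with the usual jovyan home directory:

```yaml
# docker-compose.yml -- run the hub image locally; image name is a placeholder
services:
  jupyter:
    image: openscapes/python:latest    # placeholder; use the actual published tag
    ports:
      - "8888:8888"                    # JupyterLab on http://localhost:8888
    volumes:
      - ./:/home/jovyan/work           # mount the current project into the container
    command: jupyter lab --ip=0.0.0.0 --no-browser
```

The same image reference can just as easily feed a devcontainer, Binder, Coiled, or another hub, which is what makes the image (and the files that build it) the real fledging asset.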
But maintaining images with the idea that others will use them does require a bit more effort. It doesn't mean that you create the image as a "product" like the rocker project does. But it does mean a little more effort at clean image code (files) and some documentation. Kind of like code that one writes to "get the job done" without expectation that anyone else will look at it versus code that you plan for others to look at and re-use.
The py-rocket image is actually quite unique: it gives you both a Python geospatial and an R geospatial environment. It is perhaps a bit of a big image, but it is quite powerful to be able to jump back and forth between those environments or combine them in Quarto.
/cc @cboettig as well, particularly in reference to the images that @eeholmes is talking about. I've been working with him a little on improving the python support in upstream rocker (https://github.com/rocker-org/rocker-versioned2/pull/718 and friends). Would be useful to see what more we can do to improve that :)
A few areas I'd like to explore
There isn't really a stated policy on how the home directory is to be used, and due to technical limitations it's really hard to actually limit the amount of space people use (although it is possible to monitor it via Grafana). I'd love for an explicit, written-down home directory usage policy to exist, and to find ways to implement it as well.
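One low-effort way to back such a policy with data (beyond Grafana) would be a periodic job that reports per-user usage on the shared home volume. A rough sketch, assuming a Kubernetes cluster and a hypothetical PVC name `home-nfs` for the shared home directories:

```yaml
# home-usage-report.yaml -- weekly du report over the shared home volume (sketch)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: home-usage-report
spec:
  schedule: "0 6 * * 1"               # every Monday at 06:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: alpine:3.19
              # print the 50 largest home directories to the job log
              command: ["sh", "-c", "du -sh /home/* | sort -rh | head -n 50"]
              volumeMounts:
                - name: home
                  mountPath: /home
                  readOnly: true
          volumes:
            - name: home
              persistentVolumeClaim:
                claimName: home-nfs   # hypothetical claim backing user home dirs
```

The output could then feed whatever escalation steps the policy ends up specifying (notify users over quota, then archive, then delete).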
https://github.com/berkeley-dsep-infra/datahub/blob/staging/docs/policy/storage-retention.rst is an example of one for the UC Berkeley JupyterHubs I used to maintain. It is tailored specifically to their use case, so I don't recommend adopting it as-is, but it can act as inspiration.
Switching from a GitHub OAuth app to a GitHub App (see the difference) requires some amount of engineering work, and will need to be prioritized appropriately.
This to me is the most exciting and important concept to develop further strategically. I don't have any active thoughts right now, but will think about it some more.
Another idea!
I would love to understand how people 'apply' to get access to the hub. This is something the Pangeo project has struggled with as well (https://discourse.pangeo.io/t/unable-to-sign-up-for-pangeo/3974), so it would be useful to at least describe this process better (I know a Google Sheet is involved somewhere) and see if we can build a community-based solution that works for multiple people.
I am managing one 2i2c hub and also multiple homegrown hubs.
This has been really hard and I am still pretty much unable to set up shared storage and persistent user storage the way I want. The whole PVC and PV setup, and configuring storage on the provider side, is complicated and mysterious. I managed to get help from Azure for setting up a shared drive, but that was 2 hours of 1:1 help. I haven't been able to get any help since.
I quickly discovered that storage costs were going to take up the lion's share of costs. Right now on Azure my 100 GB of default user persistent storage is $8 a month per user, even if they have almost nothing there. Meh. I am doing something wrong. But I think users don't actually need persistent cloud storage in the hub. I feel like we could brainstorm ideas that would achieve the effect of persistent storage without paying a cloud provider for it.
Ideas
I think here we need to experiment with lots of different ideas. Here are some of the ideas that I am experimenting with. I think hubs are great for building community or for intensive shared development (hackweeks, workshops, classes, teams working intensively on something). Devcontainers allow individuals to use environments developed by others and reduce the set-up barrier/wall. Note many people I work with do not have admin access on their computers. So barrier = brick wall. Installing things is just not going to happen except for a minority.
I quickly discovered that storage costs were going to take up the lion's share of costs.
@eeholmes I assume you were using the default PVC provisioner in zero-to-jupyterhub, which gives each user a persistent storage device regardless of whether they use it or not. This is super expensive and unsustainable. On AWS, we use EFS instead, and this is much cheaper - you only pay for what you use. On Azure, we use Azure Files, which is much, much better, as there's one shared pool for everyone rather than one disk per user. https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/421 is the long-running upstream issue about providing better documentation for this setup.
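For anyone reading along, the shape of that setup in zero-to-jupyterhub is roughly: create one shared NFS-style volume (EFS on AWS, Azure Files on Azure), expose it to the cluster as a ReadWriteMany PersistentVolumeClaim, and point every user's home at a sub-path of it. A rough sketch of the helm values, with a hypothetical claim name:

```yaml
# config.yaml for the jupyterhub helm chart (sketch; pvcName is a placeholder)
# Assumes a ReadWriteMany PVC named "home-nfs" already exists in the hub's
# namespace, backed by EFS (AWS) or Azure Files (Azure).
singleuser:
  storage:
    type: static                      # don't provision one disk per user
    static:
      pvcName: home-nfs               # the shared NFS/EFS/Azure Files claim
      subPath: "home/{username}"      # each user gets a folder on the shared volume
```

With this, cost scales with bytes actually stored rather than with the number of users.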
@yuvipanda Yeah, I am using the default provisioner and yes, I discovered that I don't want to use that. Fortunately NOAA is paying me to experiment and learn things, so I didn't have to pay for my mistake. Actually it's only $400 or so, but I clearly need to figure out a better way. I will look at the link to learn about the better set-up. Unfortunately it's been hard for me to get help from Azure support on this, but I can solve that. I need someone above me to tell Azure to pay attention to my requests.
Thanks all for these ideas. How does this sound for a starting agenda?
Purpose: Huddle about the current status of NASA Openscapes 2i2c policies and plan next steps based on specific community needs and what has worked in other hubs
Outcomes: Have shared goals going forward, some next steps planned
Process: 2-hour call that is 1 hr discussion towards a plan, 45 mins hacktime, 15 mins wrapup
Draft Agenda
A great hackday, thank you all! Here are the notes (published view) and the next steps:
Next hackday is March 11
Is there an agenda for today?
Yes! I've just started it; Google Doc linked from our calendar invite. Please add other topics!
Some of the conversation from today's Hackday:
Onboarding to 2i2c
@yuvipanda @jules32 is this meeting happening March 22 sometime?
Yes! April 22, 12-1pm. You should have an invite already. The agenda can be updates and follow-ups from what folks have been working on.
@jules32 okay, I'll update the board with that info: https://github.com/NASA-Openscapes/2i2cAccessPolicies/issues/7
Next tag up: April 22, 2024 12-1 PT
I've invited @batpad, the new JupyterHub team lead for NASA VEDA / GHG center to this as well.
Hi All, here's today's light agenda with Zoom info.
I propose we start off with “pitch your update/topic”, briefly < 3 mins so that everyone with an update/question has a chance to share out loud. Then we’ll decide as a group where to discuss/dig in further.
Please feel free to write notes in the Agenda ahead of time!
Tasha Snow (CryoCloud), Alexey Shiklomanov (NASA), Ramon Ramirez-Linan (Open Science Studio at SMCE), Tess Jaffe (NASA Fornax initiative, an astrophysics data center that is the DAAC equivalent), Wei Ji Leong (Development Seed, VEDA), Yuvi Panda (Jupyter and 2i2c), Julie Lowndes (NASA Openscapes)
Talking points:
Openscapes, NASA Openscapes project -
Our JupyterHub's purpose: build a community across NASA data centers to support a common set of tutorials and teaching approach for new learners. Teaching workshops early and often over 3+ years; and iterating on/improving support (tutorials, teaching, and tech).
Paste in chat: Earthdata Cloud Cookbook https://nasa-openscapes.github.io/earthdata-cloud-cookbook/; 2i2c Access Policies (stabilizing this spring, then will be in the Cookbook) https://github.com/NASA-Openscapes/2i2cAccessPolicies
What is the next date for this meeting? @ebolch @amfriesz I think you may want to attend
Were there notes taken? I missed as I had turned off my alarm for the 3-day holiday!
notes are here
Our next 6-weekly call is Monday, June 3, 3pm ET. It should already be on Bri and Aaron's calendars, but I'm happy to re-add and share the Zoom link.
Can we please add Erik Bolch (ebolch@contractor.usgs.gov)? I don't think I will be able to make it because I am serving on a NASA panel that day.
Done!
Today's topic will be around https://github.com/NASA-Openscapes/2i2cAccessPolicies/discussions/11;
Yuvi (zooming with Julie) has temporarily shut down the Hub (set cluster size to 0), which will reduce the standard cost.
When we receive credits: email support@2i2c.org and Slack tag Yuvi.
We will investigate: why the increase in cloud compute? Was it following the May 30 workshop? What workflows/policies do we need from here?
Hello! Today might be a smaller group, but we can still plan to meet with any updates people have. Andy Teucher is on vacation, but I can share his progress on monthly AWS usage reports and would love to get feedback.
Ending action items from today:
/cc @colliand for last action item :)
A small group today; Andy Teucher, Julie Lowndes, Luis Lopez, Mahsa Jami.
We screenshared and discussed some recent updates (more notes in our doc):
Notes doc
Hello! This is a short mini hackday for us to meet about 2i2c access policy for the NASA Openscapes Hub and beyond.
January 29 from 3-5 ET; initial time set with @erinmr @yuvipanda @betolink @jmunroe; and open to others; I'm happy to send a calendar invite if you want to attend.
Background: Erin and Yuvi have been working on this throughout the fall; this repository's README describes a lot of this work, which is also linked in the Earthdata Cloud Cookbook. We had many exciting pairwise conversations at AGU about ideas for moving forward. This hackday is a chance to come together to share and develop a plan to move forward in 2024.
Some starting topics to shape the agenda: