2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.

GESIS Collaboration - Dynamic image building in a JupyterHub #1382

Open choldgraf opened 2 years ago

choldgraf commented 2 years ago


We recently started a collaboration with GESIS with the goal of generalizing, improving, and sustaining the "persistent binderhub" deployment at notebooks.gesis.org.

Here are the major things we'd like to do as part of this collaboration:

Here's a link to the deliverables on the collaboration

Project roles


We'll use this issue to track progress on the collaboration, and to list issues where we have discussion and iteration on more focused parts of the collaboration. Below is a rough list of things to do.

### Updates
- [x] Announce this collaboration publicly in some way
- [ ] https://github.com/2i2c-org/meta/issues/254
- [ ] https://github.com/2i2c-org/infrastructure/issues/2120
- [ ] https://github.com/2i2c-org/infrastructure/issues/2119
- [ ] Have a community meeting for feedback and brainstorms about design and end-product goals
### Cloud cost reminders
- [ ] https://github.com/2i2c-org/meta/issues/547
- [ ] https://github.com/2i2c-org/meta/issues/546

Dedicated board

Ref: https://github.com/orgs/2i2c-org/projects/33.

choldgraf commented 2 years ago

ping to @bitnik @arnim and @mriduls in case any of them are interested in following along, collaborating, or joining in discussions and meetings.

arnim commented 2 years ago

Thank you @choldgraf ;)

sgibson91 commented 2 years ago

Just sharing some slides I presented to a Turing-based project on what I perceive to be the use cases and target audiences for JupyterHub and BinderHub, and on how this work on dynamic image building in JupyterHub might affect BinderHub as a separate project. I presented them so the project could understand how work on internal infrastructure would affect them and to help them make decisions. The slides only reflect my opinion after conversations with different folks.

arnim commented 2 years ago

@choldgraf should we do some kick-off?

choldgraf commented 2 years ago

Two quick updates:

Meeting / timing

@arnim yes I believe it's a good idea - we have been working to finish up deploying Pangeo's BinderHub before working on this, and I think we figured out a short-term path forward there last week, so hopefully we can then shift focus to this project. We will ping here once it's time.

Kernels as a service

In a recent conversation with @jlperla we noted that this work might be related to another feature that many have requested in the Jupyter ecosystem, which is something like "Binder / Jupyter kernels as a service for scalable computation". The idea is that you could define an environment via a Binder-like repository, build it into an image with repo2docker, and then scale computation using that repository's environment via some cloud mechanism. This wouldn't be as rapidly scalable as something like Dask Gateway, but could be a useful way to rapidly / interactively parallelize something in the cloud. Just noting that here in case we find a way to connect it with this work.
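A minimal sketch of the "build it into an image with repo2docker" step, assuming the `jupyter-repo2docker` CLI is installed. It only assembles the command line (using the `--no-run`, `--image-name`, and `--ref` options) and does not run it. The helper name `repo2docker_build_command`, the example repository, and the image name are illustrative assumptions, not part of any plan discussed here. How the resulting image would then be scaled out is left open.

```python
import shlex


def repo2docker_build_command(repo_url, image_name, ref=None):
    """Return the argv that builds (but does not run) an image from a repo.

    Hypothetical helper for illustration only.
    """
    cmd = ["jupyter-repo2docker", "--no-run", "--image-name", image_name]
    if ref is not None:
        # Pin the build to a specific branch, tag, or commit.
        cmd += ["--ref", ref]
    cmd.append(repo_url)
    return cmd


# Example repository and image name are placeholders.
cmd = repo2docker_build_command(
    "https://github.com/binder-examples/requirements",
    "example/kernel-env:latest",
    ref="HEAD",
)
print(shlex.join(cmd))
```

The printed command could be passed to `subprocess.run` on a machine with Docker available; the image would then need to be pushed to a registry before any cloud mechanism could start workers from it.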

arnim commented 2 years ago

The "kernels as a service for scalable computation" idea sounds interesting ;) All the best for the deployment of Pangeo's Hub. A while ago @rabernat also raised an extremely interesting question. Maybe we can indeed move one step closer to a generic backplane for all kinds of scientific infrastructure. Computational replicability is key in economics, geoscience, and the social sciences ;)

yuvipanda commented 2 years ago

Here's a document I wrote almost a year ago now that might help https://hackmd.io/OXhKs4xyQra0KgBglegSBQ

choldgraf commented 2 years ago

@sgibson91 and I discussed this one a little bit today as well, and we agree that it might be helpful to spend some of her time acting as community strategic lead w/ JupyterHub to help steward discussions and feedback amongst stakeholders for this work (with technical vision / ideas / etc coming from others in the JupyterHub ecosystem).

This might be a way to help make progress on this issue and also experiment with ways that the Community Strategic Lead can start improving team processes for discussion and feedback-gathering.

choldgraf commented 2 years ago

Update: Damian serving in a PM role

I spoke with @damianavila today and we agreed that this project will make more rapid progress if we can assign somebody to serve in a project manager style role to help us keep track of plans, conversations, timelines, and deliverables.

@damianavila said he would be willing to serve in this role and he will reduce his time on the PyData theme in order to grow the capacity to work here.

So our next steps are to:

arnim commented 2 years ago

This is great news. Thank you @damianavila for helping with this 🎉

damianavila commented 2 years ago

@arnim, how does your calendar look for a meeting next week? I can send a when2meet link for next week if you are available. 2i2c eng resources should be available to meet next week as well.

arnim commented 2 years ago

how does your calendar look for a meeting next week?

Do you have a preferred day? Usually around this time would work well for me :)

arnim commented 2 years ago

@damianavila Typically somewhere between 12:00 UTC and 23:59 UTC should work

damianavila commented 2 years ago

OK, I have created a when2meet event so we can find a slot that works for everyone involved in this conversation: https://www.when2meet.com/?16135851-mYuiB

@consideRatio @yuvipanda @sgibson91, I would appreciate it if you can join this meeting. Please check the link and drop your availability. Thanks!

arnim commented 2 years ago

@damianavila Should we say Thursday, 21st July, 2022 at 17:15 UTC (or your timezone)?

Video - Jitsi: https://meet.jit.si/DynamicImageBuilding

damianavila commented 2 years ago

@arnim, I was finalizing the survey of availability and I was going to propose Wed 20th at 17 UTC instead. Can you make it? Btw, I can offer a 2i2c zoom room in the invite I am going to send if Zoom is OK for you.

arnim commented 2 years ago

That's fine as well :)

damianavila commented 2 years ago

Can you tell me the email to use in the invite? I will post the details here as well in case you do not want to share your email here.

arnim commented 2 years ago

arnim dot bleier at gmail ...

damianavila commented 2 years ago

Thank you (some people use dedicated calendars for meeting invites, this is why I explicitly asked). Invitation sent! cc @2i2c-org/tech-team

damianavila commented 2 years ago

Details.

When: Wed 20th 17 UTC

Where: 2i2c zoom room

Agenda (dropping some points that we can change if we want/need to):

  1. Quick presentation
  2. State of the problem
  3. Exploration/design phase
  4. Implementation
  5. Provision
  6. Status updates

arnim commented 2 years ago

Can we send @MridulS an invite in case he wants to join?

damianavila commented 2 years ago

Sure, @MridulS do you have any email preference for the invite?

MridulS commented 2 years ago

seth dot mridul [at] gmail dot com

damianavila commented 2 years ago

Invitation sent!

arnim commented 2 years ago

@damianavila do we need a passcode 4 zoom?

damianavila commented 2 years ago

The passcode is available in the invitation and the meeting notes.

arnim commented 1 year ago

Hi @consideRatio,

you mentioned last time that you were working on a similar proposal. Do you think you would have some time to chat? I would love to understand this better :)

damianavila commented 1 year ago

@arnim, curious to understand if you are trying to get additional context about @consideRatio's previous experience with BinderHub-like and/or JHub-like deployments or if you want to understand better the general idea we talked about (proposed) in the meeting.

Btw, I am planning to get the team together this week to start working in depth on the technical proposal (and some experimentation to support that technical exploration). After a few discussions with the team members, I should be able to give you a sense of the timeline we are talking about.

Are you actively participating in the 2i2c Slack? Do you frequently check it? I would like to use some spaces there for the specific discussions. I would also like that place to be a medium where you can quickly reach out to me (and where I can ping you as well).

arnim commented 1 year ago

Are you actively participating in the 2i2c Slack?

Not at all, but if you send me an invite, I would come by from time to time :)

Generally speaking, I'm still trying to fully understand the proposal. I'm in summer school this week, but if you have some time @damianavila, maybe you can help me to understand it a bit better. It's probably just me being slow.

damianavila commented 1 year ago

Not at all, but if you send me an invite, I would come by from time to time :)

Sure thing, I will send you an invite soon.

Generally speaking, I'm still trying to fully understand the proposal. I'm in summer school this week, but if you have some time @damianavila, maybe you can help me to understand it a bit better. It's probably just me being slow.

No problem at all! Given that you are in summer school this week, maybe we can meet next one? Let's coordinate via Slack after you join the 2i2c space.

arnim commented 1 year ago

Thank you @damianavila :+1:

damianavila commented 1 year ago

@arnim, I sent you an invite to our Slack space.

arnim commented 1 year ago

THX - just sent my hello message :)

damianavila commented 1 year ago

Some updates:

We are starting with an exploratory phase consisting of:

  1. Series of technical meetings
  2. MVP to validate the ideas coming from those meetings
  3. MVP presentation in front of stakeholders (feedback loop)
  4. Iteration

Next step:

damianavila commented 1 year ago

After the first meeting, we have some next steps briefly described here: https://github.com/2i2c-org/infrastructure/issues/1577#issuecomment-1205891914

arnim commented 1 year ago

Hi @damianavila, I thought that you would ping me first when you have some time to chat. I'm not sure we agreed to build on tljh in our first meeting.

sgibson91 commented 1 year ago

We are only deploying tljh-repo2docker as a means to perform user experience research. That research will inform what we should build (but won't be the sole source of guidance), most likely for the z2jh helm chart.

arnim commented 1 year ago

I'm also certainly interested in trying tljh-r2d and have to confess to my shame that I haven't done so yet :) Already subscribed to 2i2c-org/infrastructure/issues/1596

damianavila commented 1 year ago

Hi @damianavila, I thought that you would ping me first when you have some time to chat. I'm not sure we agreed to build on tljh in our first meeting.

For future readers, there was a conversation with @arnim in the specific binderhub-jupyterhub 2i2c Slack channel that further clarifies the context and the explanation @sgibson91 shared above: https://2i2c.slack.com/archives/C03RLNFM43F/p1659687870338259.

arnim commented 1 year ago

Hi 👋 I'm back in the office this month. Should we have a small meeting, maybe next week? Let me know if there is anything I can do.

arnim commented 1 year ago

We should also consider whether the proposed new architecture allows for features such as dynamic repository credentials (jupyterhub/binderhub/pull/1169), should they land in BHub, or whether this would be better integrated on the JHub side.

damianavila commented 1 year ago

I think https://infrastructure.2i2c.org/en/latest/howto/features/github.html might help with this ⬆️. There is a blog post from @yuvipanda at https://blog.jupyter.org/securely-pushing-to-github-from-a-jupyterhub-3ee42dfdc54f.

arnim commented 1 year ago

Added a short link to the Jupyter community forum post on the Persistent BinderHub. The link includes a brief description of the deliverables for the collaboration, so that readers can form an idea of what to expect.

choldgraf commented 1 year ago

Hey all - I've added a "project roles" section to the top comment here and tagged folks that I think are dedicating their time to the project. Please edit if I got any of that wrong!

damianavila commented 11 months ago

The last available update of the plan lives at https://github.com/2i2c-org/binderhub-service/issues/27.