2i2c-org / team-compass

Organizational strategy, structure, policy, and practices across 2i2c.
https://compass.2i2c.org

[Discuss] Provide community guidance and deeper support for some communities #297

Closed choldgraf closed 2 years ago

choldgraf commented 2 years ago

Description

We are defining a support steward role (https://github.com/2i2c-org/team-compass/issues/187) to handle all support requests for our hubs. This roughly breaks down to:

Recent conversations have made it clear that some communities might need more support resources than the "community representative" model described above. In particular, what should we do if a community does not have the resources to define a "Community Representative" that can be the "middle layer" for support requests?

Value / benefit

This would allow us to interact with larger or more complex communities, and also offer clearer guidance about what we do and do not offer as part of the hub service. It might also be an opportunity for us to offer more value to those communities than we currently offer.

Implementation details

We discussed a few ideas for this, here are some that came to mind:

In both cases, we'd need to define a sustainability mechanism for these roles. Both would require more dedicated resources, and would thus be more expensive than the typical hub offerings we have described so far.

Finally, we already have two communities like this:

Perhaps we can use these communities as testing grounds for what a support process could look like. In particular, Pangeo has already awarded us funds beyond what we'd typically charge for managing JupyterHub infrastructure. While part of those funds is meant to go towards development, perhaps a more high-touch support role could be covered by those funds as well.

Tasks to complete

Updates

rabernat commented 2 years ago

Thanks for opening the issue Chris. It's an important discussion.

I just want to clarify a little bit on this point...

what should we do if a community does not have the resources to define a "Community Representative"

With Pangeo, it's really not the case that we don't have resources. We have significant grant funding and a large community of contributors (both volunteer and paid by various projects) to draw upon. For me, it's more a question of "what is the boundary between Pangeo and 2i2c?" When we initially conceived this collaboration, I imagined that Pangeo and 2i2c would become closely enmeshed with each other. As things have evolved, what has emerged is more of a client relationship, with me as the de facto liaison in my role as PI on the grants that are funding this hub.

A perfectly fine outcome for me would be for us to designate a community representative outside of 2i2c. There are many people who could fit this role. Recognizing how overextended everyone at 2i2c currently is, that kind of sounds to me like the best choice.

choldgraf commented 2 years ago

Just a quick follow-up to @rabernat's point - something I am trying to figure out here is where to draw the line between development and support. In my mind, the "enmeshing" that Ryan describes above would be some combination of:

When I describe it like that, it does sound to me like a "Community Representative as a service", but I hadn't considered this role in the context of support (e.g., responding to outages and minor requests to update packages and such). To me it feels like there's a large benefit to having a team of people acting as support for Pangeo's hubs, since we avoid single points of failure and can spread the load a bit. However, it's unclear what that team's role should be in the context of this more "Community Representative" style team member...

choldgraf commented 2 years ago

Notes from meeting

@sgibson91 and I had a conversation about this today, and we brainstormed a few ideas around the best way to serve a community like Pangeo. It seems like we are missing a role in the story of working with communities like Pangeo. This role would be responsible for things like:

and at a more meta-level (AKA, for 2i2c in general, not just Pangeo):

Notes from conversation with @kirstiejane

I also had a conversation with Kirstie Whitaker. She agreed that the kind of role I've described above is quite important. She referred me to a new job title at the Turing Institute that might be relevant here, called the Research Application Manager.

Here's the job description of a recent RAM posting: https://cezanneondemand.intervieweb.it/uploads/153/annunci/ResearchApplicationManagerJD.pdf

The basic idea is that this position is a combination of "Product Manager" and "Stakeholder Manager". Their job is to ensure that the research work going on at Turing is having impact through applications that are carried out by stakeholders in the ecosystem.

One possibility is to define something similar for the open source infrastructure that we build (e.g., an Open Source Application Manager). The goal of the position would be to ensure that 2i2c is maximizing the impact of its open infrastructure by bringing best practices into the communities that we serve, by sharing knowledge about how best to use the tools that 2i2c provides, and by bringing experiences from those communities back into our development cycles and workflows. The position is sort of like a translator that bridges the SRE/development world and the communities we serve.

One option: a team-based approach to Pangeo's collaboration

With all of that in-mind, one potential path forward could be for us to define a team approach to the Pangeo collaboration, rather than a single individual who does all of this work themselves. For example, we could imagine a breakdown of roles like this:

note: I know the above is a lot of raw data! None of this is final, it is meant to spur discussion and brainstorming

rabernat commented 2 years ago

In recent discussions with Chris, I realized that "Pangeo" is probably a pretty anomalous model to use as a "community." The reason is that Pangeo the community does not actually have any money. The entities that hold money that could potentially go to 2i2c are:

The current Pangeo Hub is not actually funded by "the community" at large. It is funded by a specific NSF award to Columbia University, for which I am the PI.

More generally, the concept of an open science community like Pangeo is really pretty new. It would be wise to base our short-term strategy around the existing reality of how science is organized / funded today, rather than some vision for how science should be organized in the hypothetical future.

So now I will offer a different perspective, as a PI of another large new NSF-funded project...


Columbia was recently awarded $25M from NSF for a project called Learning the Earth with AI and Physics. My role is director of data and computation. We need a cloud-based, Pangeo-style JupyterHub for both research and education. Is this "part of Pangeo"? In my head, yes. But on a more concrete level, it is funded by a different grant and has a different set of users and stakeholders.

I proposed the idea of working with 2i2c for our cloud hub using the alpha pricing model Chris shared. People were generally supportive, but the main question they asked was whether 2i2c provides training / onboarding for researchers / educators on how to use the hub. This would not be language-specific (we have both Python and R users), but would be more about cloud-specific topics, recognizing that most researchers have not used cloud computing before. Topics like:

We would not need any training in how to do data science. If 2i2c could offer this sort of minimal but important training / onboarding, it would be much more attractive to this group.

sgibson91 commented 2 years ago
  • Logging on / off
  • Selecting environments (profile-list options)
  • Using nbgitpuller
  • Navigating jupyterlab

I think there are enough resources on all of the above things out there already to collate a short intro guide.

  • Dealing with files / data in the cloud

This worries me. My mind explodes every time I watch you do a demo, so I feel under-qualified to onboard / train anyone in that.

Which leads to the other question: if 2i2c agrees it can offer this level of onboarding / training, who will give it? Is it the engineering team, or is it some other as-yet-undefined (and likely unstaffed!) role/team?

rabernat commented 2 years ago

We could remove cloud object storage from the scope here if that makes things easier.

if 2i2c agrees it can offer this level of onboarding / training, who will give it?

In our case, it may be enough to develop the training modules, and then hand them off to a TA or other support role within Columbia. Videos and self-paced training could also work.

choldgraf commented 2 years ago

I've had a few more conversations about this, and I'm coming around to the idea that this issue is also related to https://github.com/2i2c-org/meta/issues/256. I spoke with a few people in product management roles, and they often described their work as primarily understanding the needs of the user community around a tool in order to guide development, etc. Many of them accomplished this by having much more high-touch connections with those communities, through things like:

On the development side these people would then help represent the community's interests in ideas for new features, prioritization, etc.

This takes me back to the "Infrastructure Application Manager"-style role described in this comment.

Proposal for next steps

I don't know exactly what those two bullet points would look like, but does that seem like a reasonable plan to pursue? If so then I will put this on my plate for our next cycle to try and clarify things further...

sgibson91 commented 2 years ago

I am +1 on this plan @choldgraf

choldgraf commented 2 years ago

I had another conversation with @damianavila about this today, and wanted to write down a quick idea while it was fresh in my head.

We discussed that a high-touch collaboration like Pangeo is really a combination of four services:

I thought this was a nice way to disentangle a few different services that we're providing, and to identify where we can use a team, where we can use individuals, and where there might be different skillsets needed to do things most effectively. For example, I think the final bullet point is more akin to the "Product / impact manager" that is described above.

rabernat commented 2 years ago

I like this enumeration of different services. However, I feel the need to keep pointing out that "Pangeo" / "Pangeo community" is not really an entity that can ship money to 2i2c. It's a very loosely organized and heterogeneous collection of individuals from different institutions and projects. Pangeo literally does not exist from a legal or financial point of view. So from a sustainability / scalability point of view, it doesn't make sense to target "Pangeo" as a customer or client.

The 2i2c customer or client is a discrete funded project, academic department, or research lab with money to spend. In this context, "Pangeo" represents a set of configurations, tools, and practices that the client wants to use. In many cases, the client may even want 2i2c to teach them "how to Pangeo." Right now I am trying to funnel many different projects towards 2i2c as a "Pangeo provider" (perhaps as an alternative to using the MS Planetary Computer or deploying their own stuff via QHub).

I just think it's important that we keep this distinction clear to avoid setting up a business model that caters to a non-existent customer.

damianavila commented 2 years ago

This is a really important distinction... what would the business model be for 2i2c to interact with and get funds from collaborations like Pangeo? Should 2i2c try to get funds directly from consumers of that collaboration? That would mean we need a composite model where we are not only serving communities but also individuals?

rabernat commented 2 years ago

Should 2i2c try to get funds directly from consumers of that collaboration?

In general, 2i2c should try to get funds from the people who have money to spend. If we are targeting academic research, that means "PIs": the people who write the grants and make decisions about how awarded funds are spent. For education, the decision makers may be different: individual instructors, department chairs, university IT managers, etc.

I'm not sure I would characterize these people as "consumers" of Pangeo. Many of them are aware of Pangeo and are asking, "can I get Pangeo for my project / lab / class / department?" In past years I would get emails from these people saying, "can I pay you to run a Pangeo hub for us?" I had to say no because I had no way to provide such services. An explicit goal of Pangeo partnering with 2i2c was to develop a turnkey, scalable model that could be responsive to such inquiries.

That would mean we need a composite model where we are not only serving communities but also individuals?

We need to distinguish between the 2i2c "customer" (the person who decides to spend money with 2i2c, i.e. the PI) and the 2i2c user, who actually logs in to the hubs. In some cases, the PI may never even log in to the hub. They just care about their users getting the resources they need to do their work. The PI will want detailed reports on usage and cost breakdowns. They will also want training / onboarding to make sure the money spent on the hub has its maximum impact.

The "community" concept is amorphous and without clear boundaries. Open Science Communities are something we are trying to will into existence--they don't really exist yet. To allow communities to grow spontaneously, I think it's very important that membership in a "community" not be tied to access to a particular hub. Access to hubs is determined by who pays for the hub and whether the user is formally affiliated with that project. Participation in a community should be open to anyone. That's why I think it's best for our hubs to be very generic, such that workflows can be run from any hub.

choldgraf commented 2 years ago

I'm going to close this one, as it led to the creation of these issues: