2i2c-org / team-compass

Organizational strategy, structure, policy, and practices across 2i2c.
https://compass.2i2c.org

Research and discuss challenges around access/security policies of institutional clouds #184

Open choldgraf opened 3 years ago

choldgraf commented 3 years ago

Summary

In the Pangeo cloud deployment (#136 and https://github.com/2i2c-org/pilot-hubs/issues/482), we are running into a lot of headaches because we are running the cloud infrastructure on a project that is controlled by Columbia University, rather than our own project.

This is causing a lot of extra work because:

  1. We need to create Columbia accounts for any people that wish to access the infrastructure
  2. We must abide by Columbia's institutional policies regarding deployments on the cloud

Both of these things suggest that running infrastructure in this way is not a sustainable approach. It takes too much special-casing for each institution. While it may be worth it for Pangeo because of the scope of the collaboration, it won't be worth it for most organizations (or it will be prohibitively expensive for them).

What can we do?

We should research and understand our options for avoiding this complexity in the future. The easiest approach seems to be investigating whether university grants can pay for infrastructure that runs on 2i2c projects, rather than having 2i2c access grant-funded infrastructure on the university's cloud project. Perhaps @rabernat could brainstorm this with us a bit as well.

Actions

sgibson91 commented 3 years ago

Some notes from a chat I had with Arielle Bennett, project manager of the Tools, Practices and Systems programme at the Turing:

Sarah's take:

rabernat commented 3 years ago

Folks, so sorry for the headaches this is causing!

What we could try to do is move the entire cloud budget from Columbia to 2i2c. It would require rewriting our subaward agreement, but we have to do that anyway.

damianavila commented 3 years ago

> Folks, so sorry for the headaches this is causing!

No need to apologize, IMHO, this is part of the process we need to go through to "define" our service.

> What we could try to do is move the entire cloud budget from Columbia to 2i2c. It would require rewriting our subaward agreement, but we have to do that anyway.

@sgibson91 and @yuvipanda, since you have been more closely involved in this deployment, what are your thoughts on that proposal?

sgibson91 commented 3 years ago

> @sgibson91 and @yuvipanda, since you have been more closely involved in this deployment, what are your thoughts on that proposal?

I think if rewriting the subaward agreement mitigates https://github.com/2i2c-org/team-compass/issues/136 and https://github.com/2i2c-org/pilot-hubs/issues/575, then it's worth it.

The work on private nodes benefits all our hubs by making the nodes more secure without impacting the Jupyter front-end (most likely, I haven't tested it with a hub yet!). And I believe the work on dynamic backends for Terraform could be generalised further to any storage space on any cloud (as opposed to any bucket in GCP), if that's something we end up needing.

choldgraf commented 3 years ago

Just to echo @damianavila - I think this is a really important learning experience for understanding where the pain points will be in working with universities. For example, we really need to find a way to serve university stakeholders without having to create email accounts for each person who does work there. But I worry that this won't be possible if the university wants us to use their cloud accounts (e.g., because they have institutionally-negotiated cloud rates). This kind of thing definitely won't be unique to Columbia, so we should have a good understanding of the challenges here.