mrocklin closed this issue 1 year ago
I was chatting with folks from TileDB today. They said that their current approach is to ask users for AWS access keys and credentials and store them. Docs here
This seemed informal to me, but apparently their users are happy with the experience, and it's also fairly accessible for folks who are less technical.
@marcosmoyano it might be worth thinking about how we might ask for AWS credentials, store them securely, and then use them when creating aiobotocore sessions. This seems similar to the multi-region work.
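A minimal sketch of what that credential handoff could look like, assuming a recent aiobotocore and a hypothetical `stored_creds` dict already decrypted from wherever we keep the customer's keys:

```python
import asyncio
from aiobotocore.session import get_session

async def verify_customer_access(stored_creds, region="us-east-1"):
    # stored_creds is a hypothetical dict pulled from our (encrypted) credential store
    session = get_session()
    async with session.create_client(
        "ecs",
        region_name=region,
        aws_access_key_id=stored_creds["access_key_id"],
        aws_secret_access_key=stored_creds["secret_access_key"],
    ) as ecs:
        # e.g. list clusters in the customer's account to confirm the keys work
        resp = await ecs.list_clusters()
        return resp["clusterArns"]

# asyncio.run(verify_customer_access(stored_creds))
```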
One other thing to note:
For telemetry I think that the scheduler reaches out to Coiled, so if the scheduler has outbound network access I would guess that this is ok.
For proxying though, yeah, that makes sense. I guess this raises the question: are companies comfortable having publicly accessible network addresses, given that they're secured through TLS?
> it might be worth thinking about how we might ask for AWS credentials, store them securely, and then use them when creating aiobotocore sessions. This seems similar to the multi-region work.
On the surface, this seems pretty straightforward. I do share Rami's concern about proxying.
> For telemetry I think that the scheduler reaches out to Coiled, so if the scheduler has outbound network access I would guess that this is ok.
Yep, basically fine, although if the scheduler <-> Coiled communication is happening over the public Internet rather than within our VPC we might want to do a little more with custom TLS certs to ensure that communication is secure.
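For what it's worth, distributed already has the plumbing for mutual TLS via its `Security` object, so the client <-> scheduler leg could look roughly like the sketch below (the cert paths and scheduler address are placeholders); the scheduler <-> Coiled control-plane leg would need similar cert handling:

```python
from distributed import Client
from distributed.security import Security

# Placeholder paths: in practice Coiled would mint per-cluster certificates
security = Security(
    tls_ca_file="ca.pem",
    tls_client_cert="client-cert.pem",
    tls_client_key="client-key.pem",
    require_encryption=True,
)

# Connect to a scheduler exposed on a public address, encrypted end to end
client = Client("tls://scheduler.example.com:8786", security=security)
```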
> For proxying though, yeah, that makes sense. I guess this raises the question: are companies comfortable having publicly accessible network addresses, given that they're secured through TLS?
Not sure what you mean about TLS? If we're not able to proxy, they'd no longer be served on our domain, so we'd need alternative arrangements for TLS. The other part is that Coiled's access control (e.g. notebooks / dashboards only visible by creator) wouldn't work without our proxying.
We could do things with DNS and / or install a proxy microservice into the user's account to handle the proxying / auth for us -- certainly not insurmountable -- just saying it's another aspect that needs thought.
> If we're not able to proxy, they'd no longer be served on our domain, so we'd need alternative arrangements for TLS
I'm suggesting that we continue to proxy, but that we open up the network on the scheduler machine. Hopefully this is ok because communications are secured through TLS.
> install a proxy microservice into the user's account to handle the proxying / auth for us -- certainly not insurmountable -- just saying it's another aspect that needs thought.
Yeah, I need to learn more here to understand the options.
I'm sorry if this is a naive question, but why wouldn't we support SAML and/or OAuth so our customers could grant access to any user who should have it - and it would be up to them to set that up? They could specify who the Coiled "admin" is on their end, who can set all this up and manage the telemetry, and then the regular "users" who can run jobs but not do anything else.
@scott-coiled forgive me if I'm misunderstanding you, but I think you're thinking of things the wrong way round. For us to act on behalf of a customer to run compute in their account it's not on us to grant permissions, but to support whichever method(s) the cloud platforms use?
Yeah, I was originally thinking I was, but then I decided I wasn't. If we supported SAML/OAuth, then the customer could add Coiled as an application they access with their IdP and set the rules for who can access it and what rights they have in that application. If the customer is using AWS and leverages Cognito, for example, then they would set up access to Coiled via Cognito. Customers might also be using third-party IdPs like Azure, ADFS, Ping, Okta, etc.
Does this make sense, or do I really have this backwards?
@scott-coiled I still think you're approaching this wrong. Correct me if I'm wrong, but what you're suggesting is:
Is that accurate?
@necaris I'm not thinking AD - it doesn't support federation. I'm thinking ADFS, or Azure, or an IdP like Okta or Ping. Those systems usually "ingest" user data from AD or LDAP, and then the rules about what users can access is managed via the IdP.
@scott-coiled Sure, AD was just an example, but I'm glad to know I'd understood you correctly. Unfortunately that's a slightly orthogonal question to the one we're asking here.
Currently, assuming Coiled has a customer FooCorp with an employee Alice:
My understanding of this issue is that we want:
In this formulation, it seems to me you're asking about how Alice signs in to Coiled, and I'm concerned about how we get / manage / correctly deploy FooCorp's credentials, and how we can do so not just for AWS, but for other cloud providers (most notably Azure) as well.
@necaris - ok, I think we are actually thinking about the same problem. But just to be sure, let me add a few thoughts:
I see two different types of corporate use cases. In Use Case #1, FooCorp is fine running everything in Coiled's infrastructure. In that case, we have total control over user authentication, so we can support GitHub, Google, etc. All is good, nothing to see here.
In Use Case #2 (which I think will be much more typical), FooCorp wants to manage access to Coiled using their own credentials and existing AuthN framework. In this scenario, it would be reasonable to ask FooCorp to add a new user (Coiled) that has certain permissions/access. Short of that, we'd be required to use all the existing user credentials for the Data Scientists and Engineers at FooCorp (Alice++).
My thinking here is that we should support the "native" AuthN for the cloud platforms we support, since that may be what's in use in some cases. But larger companies will have an even more sophisticated AuthN strategy, and this is where I was saying supporting SAML/OAuth would be valuable. Basically, I envision us having documentation something like this: https://docs.jamf.com/jamf-connect/1.19.2/administrator-guide/Integrating_with_an_Identity_Provider.html.
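As a rough illustration of the OAuth/OIDC half of that (not a commitment to any particular library), verifying an ID token issued by a customer's IdP could look something like this with PyJWT; the JWKS URL, audience, and signing algorithm are all placeholder assumptions:

```python
import jwt  # PyJWT
from jwt import PyJWKClient

# Placeholder endpoint published by the customer's IdP (Okta, Ping, ADFS, Cognito, ...)
JWKS_URL = "https://idp.foocorp.example/.well-known/jwks.json"

def verify_id_token(token: str) -> dict:
    """Validate an ID token from the customer's IdP and return its claims."""
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],   # assumed signing algorithm
        audience="coiled",      # placeholder client/audience identifier
    )

# claims = verify_id_token(token_from_login_redirect)
# claims.get("groups") could then drive who is a Coiled "admin" vs. a regular "user"
```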
Does this make sense?
We now launch in user accounts!
Currently the deployment of beta.coiled.io launches resources in Coiled Inc.'s AWS account. When companies come to us asking for more security we say "Sure, we'll deploy the Coiled infrastructure in your account. Here is a tarball, a terraform script, and @necaris".
However, we could also launch resources in the user's account if they gave us sufficient permissions to do so. In principle the user could construct an IAM role that lets Coiled create ECS tasks, log groups, and so on, and we would use that role on the user's behalf whenever they wanted to launch resources. We could do the same with Azure/ACI whenever that comes online.
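A rough sketch of that role-assumption flow (the role ARN and external ID are placeholders the customer would hand us when they create the role):

```python
import boto3

def customer_session(role_arn, external_id, session_name="coiled-provisioning"):
    """Assume a customer-created IAM role and return a boto3 session scoped to it."""
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=role_arn,            # e.g. arn:aws:iam::<customer-account>:role/CoiledAccess
        RoleSessionName=session_name,
        ExternalId=external_id,      # guards against the confused-deputy problem
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# ecs = customer_session(role_arn, external_id).client("ecs")
# ecs.run_task(...)  # create ECS tasks, log groups, etc. in the customer's account
```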
However, there are some open questions here and it would be good to get some feedback from advanced users: