awslabs / kubeflow-manifests

KubeFlow on AWS
https://awslabs.github.io/kubeflow-manifests/
Apache License 2.0
164 stars 119 forks

Managing Profiles/users and contributors at scale in Kubeflow #492

Open surajkota opened 1 year ago

surajkota commented 1 year ago

Is your feature request related to a problem? Please describe. Customers authenticate users to the Kubeflow platform using Cognito, or Cognito integrated with their IdP (Okta, AD, etc.). These users (data scientists, ML engineers) then need a profile/namespace in Kubeflow to work on their ML tasks. Customers need ways to manage profiles, and the collaborators on those profiles, in Kubeflow. This issue is for discussing and proposing solutions for managing user profiles.

Describe the solution you'd like A few approaches to consider (in no particular order):

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Consider a person leaving the company.

Nice to have: Customers who have multiple deployments (dev/prod) should be able to define their configuration in one place and deploy that same configuration to each environment.

jbgerth commented 1 year ago

Hey, thanks for reaching out to the community. My team and I are also very interested in how to tackle this issue.

In our setup, we have a deployment with about 50 users and multiple projects. Additionally, we have multiple environments, i.e. dev and prod for the users, as well as sandbox accounts for infrastructure testing.

For authentication, we are using Cognito plus Azure AD. Granting access to Kubeflow is done via AD groups. The difficult part for us was mapping the users from AD/Cognito to Kubeflow profiles. In our first attempt, we added the profiles manually in Terraform on user request. This works fine for our number of users, but it means that we always need someone to approve changes. Therefore, we looked for something more automated.

In our current version, we integrated Azure AD with Lambda. Each change to one of the pre-selected Azure enterprise applications triggers an event in Lambda via the SCIM interface. This event is then processed and synced to our Terraform stack on GitHub. From there, a GitHub Actions pipeline syncs the Profiles to the desired Kubeflow environments.
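To make the mapping step more concrete, here is a rough sketch of how SCIM user records could be turned into Kubeflow Profile manifests. It is not our actual Lambda code. The helper names (`profile_name`, `build_profile`, `desired_profiles`) are made up for illustration, and it assumes that each SCIM user carries an email address in `userName` plus an `active` flag, and that the Profile API version `kubeflow.org/v1` is the one installed in the cluster.

```python
import re


def profile_name(email: str) -> str:
    """Derive a namespace-safe name (lowercase, digits, hyphens) from an email.

    Hypothetical naming scheme: only the local part of the address is used,
    so two users with the same local part in different domains would collide.
    """
    local = email.split("@", 1)[0].lower()
    name = re.sub(r"[^a-z0-9-]+", "-", local).strip("-")
    # Kubernetes namespace names are limited to 63 characters.
    return name[:63].rstrip("-")


def build_profile(email: str) -> dict:
    """Build a Kubeflow Profile manifest owned by the given user."""
    return {
        "apiVersion": "kubeflow.org/v1",  # assumption: adjust to the installed CRD version
        "kind": "Profile",
        "metadata": {"name": profile_name(email)},
        "spec": {"owner": {"kind": "User", "name": email}},
    }


def desired_profiles(scim_users: list[dict]) -> list[dict]:
    """Return manifests for active users only.

    Deactivated users (for example, someone who left the company) drop out
    of the desired state, so a later sync step can remove their profile.
    """
    return [
        build_profile(user["userName"])
        for user in scim_users
        if user.get("active", True)
    ]


if __name__ == "__main__":
    users = [
        {"userName": "Jane.Doe@example.com", "active": True},
        {"userName": "left.company@example.com", "active": False},
    ]
    for manifest in desired_profiles(users):
        print(manifest["metadata"]["name"], manifest["spec"]["owner"]["name"])
```

The output of a function like `desired_profiles` could then be written into whatever the pipeline treats as its source of truth (in our case, the Terraform stack), so the environments are reconciled from one list.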

We decided against the Cognito trigger for Lambda because that would make the initial user experience worse. The trigger would only fire once the user logs in for the first time. The user would therefore have no profile during the first login, and we feared having a lot of confused users.