aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0
2.4k stars 253 forks source link

Feature: start / stop Sagemaker notebook instances #1635

Open codekow opened 1 year ago

codekow commented 1 year ago

Is your feature request related to a problem?

There is no way to start or stop Sagemaker notebook instances via CR.

Describe the solution you'd like

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: NotebookInstance
spec:
  state: [started,stopped]

Like:

aws sagemaker start-notebook-instance

Describe alternatives you've considered

Run a k8s job to run aws cli

dmarcus-wire commented 1 year ago

I have the same issue

jaypipes commented 1 year ago

Calling StartNotebookInstance does not actually start a stopped notebook instance. What it does is create a brand new notebook instance (with the latest ML libraries) and connects to the previously existing ML storage volume (in EFS). Amazon SageMaker's APIs are awkward and don't allow the user to fully control their storage volume or supply the EFS volume ID when calling CreateNotebookInstance , which would be a more elegant and RESTful way of doing this instead of the Start/Stop mess.

Note that there is already a Status.notebookInstanceStatus field. Maybe call this new field Spec.Started and make it a boolean field (ala "Enabled") instead of calling it Spec.State and having started and stopped be the values?

jljaco commented 1 year ago

Note that there is already a Status.notebookInstanceStatus field. Maybe call this new field Spec.Started and make it a boolean field (ala "Enabled") instead of calling it Spec.State and having started and stopped be the values?

I like this solution, with one caveat: let's be sure to limit this approach of controlling the latent state of the notebook via a boolean flag to only controlling whether a notebook can be considered to be running or not; let's not try to also control for other states like Pending, Stopping, Failed, Deleting, and Updating.

Also, consider calling this flag something like Spec.Running to avoid ambiguity with the Started value of NotebookInstanceState

surajkota commented 1 year ago

Starting and stopping a notebook instance is similar to start/stop of an EC2 instance and was considered as a data plane operation. Such operations have not been supported in ACK controllers before. Are we considering changing this tenant?

Happy to discuss this in more detail so we look into a generic solution for all services and maybe other API operations like retry

@codekow @dmarcus-wire Can this be implemented in the higher level application or service?

Please let us know if this is a blocking feature and if there specific timelines you are looking to use this. Thank you

jaypipes commented 1 year ago

@surakota While that is true for most things, because SageMaker doesn't have a way of updating notebook instances, I figured we could take out an update code path using start/stop.

Thoughts?

Best, -jay

On Tue, Jan 24, 2023, 3:25 PM Suraj Kota @.***> wrote:

Starting and stopping a notebook instance is similar to start/stop of an EC2 instance and was considered as a data plane operation. Such operations have not been supported in ACK controllers before. Are we considering changing this tenant?

Happy to discuss this in more detail so we look into a generic solution for all services and maybe other API operations like retry

@codekow https://github.com/codekow @dmarcus-wire https://github.com/dmarcus-wire Can this be implemented in the higher level application or service?

Please let us know if this is a blocking feature and if there specific timelines you are looking to use this. Thank you

— Reply to this email directly, view it on GitHub https://github.com/aws-controllers-k8s/community/issues/1635#issuecomment-1402591912, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACAWP6ZOB3JO4IDL5YJYZ3WUA3E3ANCNFSM6AAAAAAT7HK7DE . You are receiving this because you commented.Message ID: @.***>

codekow commented 1 year ago

@surajkota The idea is to declaratively set the state of a NotebookInstance.

If the aws cli can change the state, the assumption is that the k8s controller could do it also.

ack-bot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

codekow commented 1 year ago

/bump

ack-bot commented 10 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot commented 4 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale