Open SpicySyntax opened 4 years ago
I also have a SageMaker notebook stuck in Pending status for more than 3 hours now. I don't have any copy of the code inside the instance. Do you know how I can get the code?
This might be related to #207, where the root cause was that the notebook's EC2 instance wasn't available (an ml.p2.xlarge in their case). However, the Pending status resolved after an undisclosed amount of time for them.
It happened to me too; the only advice is to wait and watch. I was also searching for a solution, and after 3 minutes it resolved itself in my case. First, check the region: is it the correct one? Only the correct region gives you access to and visibility of the notebook. Sometimes the response time is longer due to network issues. For example, in the SageMaker logs each step happens within seconds, but you are informed more than a minute later. Why? It is just response time. Please check again; it may have resolved by now. Thank you.
Just had our ml.t2.instance start after being stuck in the Pending state for just under 2 hours. It seems that this should be a very easy problem to mitigate; I hope SageMaker releases a feature to force a stop when in this state.
Did anyone find the secret on this one? Having this problem with sagemaker studio apps
Having the same "Stopping" behaviour on a SageMaker notebook instance for almost 20 hours now :-( Any way to revive it?
Hi, did anyone ever resolve this? I have the same issue. This happens intermittently. I have a try/except/else/finally block, where the finally clause contains:
finally:
    def get_notebook_name():
        log_path = '/opt/ml/metadata/resource-metadata.json'
        with open(log_path, 'r') as logs:
            _logs = json.load(logs)
        return _logs['ResourceName']
    client = boto3.client('sagemaker')
    client.stop_notebook_instance(NotebookInstanceName=get_notebook_name())
Most of the time, the notebook gets stopped. But sometimes, at random, it doesn't. When I look at the logs, they say:
An error occurred (ValidationException) when calling the StopNotebookInstance operation: Status (Pending) not in ([InService])
I'm unable to find a resolution for this.
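The ValidationException quoted above indicates that StopNotebookInstance was rejected because the instance was still Pending rather than InService. One possible workaround, sketched here under assumptions, is to poll `describe_notebook_instance` and call stop only once the status reads InService. The helper name `stop_when_in_service` and its parameters are hypothetical; `client` is assumed to be the object returned by `boto3.client('sagemaker')`. This does not help if the instance never leaves Pending.

```python
import time


def stop_when_in_service(client, name, poll_seconds=30, max_attempts=40,
                         sleep=time.sleep):
    """Wait while the notebook is Pending, then stop it once InService.

    `client` is assumed to be boto3.client('sagemaker'). Returns the last
    status observed, so the caller can tell whether a stop was issued.
    """
    status = None
    for _ in range(max_attempts):
        response = client.describe_notebook_instance(NotebookInstanceName=name)
        status = response["NotebookInstanceStatus"]
        if status == "InService":
            # Stop is only accepted in this state, per the error message above.
            client.stop_notebook_instance(NotebookInstanceName=name)
            return status
        if status != "Pending":
            # Stopping, Stopped, Failed, etc.: nothing for this helper to do.
            return status
        sleep(poll_seconds)
    # Still Pending after max_attempts polls.
    return status
```

Inside the finally clause, this could replace the direct stop call, for example `stop_when_in_service(boto3.client('sagemaker'), get_notebook_name())`. As far as I know, boto3 also provides a `notebook_instance_in_service` waiter that could handle the polling part instead of the hand-written loop.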
I am working with AWS SageMaker to automate model training in regulated and semi-regulated environments. After making some changes today, I broke the instances (I needed to fix IAM roles, which I have since fixed). However, my notebook instances got stuck in the Pending and Stopping phases respectively. I have waited a few hours and nothing has changed.
I have tried all of the commands below:
Unfortunately I am not able to get these to change state.
For Stopping State:
For Pending State:
How can I stop and delete these instances so I can deploy the fixed versions?
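For reference, a minimal sketch of the stop-then-delete sequence, assuming (as the error messages suggest) that stop is accepted only in InService and delete only in Stopped or Failed. The function name `stop_and_delete` and its parameters are hypothetical, and `client` is assumed to be `boto3.client('sagemaker')`. It cannot force an instance out of Pending or Stopping; it only waits for those states to pass and then issues the next valid call.

```python
import time


def stop_and_delete(client, name, poll_seconds=30, max_polls=120,
                    sleep=time.sleep):
    """Poll the notebook status and issue stop/delete when each is allowed.

    Returns the list of actions taken, ending in "delete" on success or
    "timeout" if the instance stayed in a transitional state too long.
    """
    actions = []
    for _ in range(max_polls):
        response = client.describe_notebook_instance(NotebookInstanceName=name)
        status = response["NotebookInstanceStatus"]
        if status == "InService" and "stop" not in actions:
            # Assumption: stop is only valid from InService.
            client.stop_notebook_instance(NotebookInstanceName=name)
            actions.append("stop")
        elif status in ("Stopped", "Failed"):
            # Assumption: delete is only valid from Stopped or Failed.
            client.delete_notebook_instance(NotebookInstanceName=name)
            actions.append("delete")
            return actions
        # Pending, Stopping, Updating, Deleting: no call is valid, so wait.
        sleep(poll_seconds)
    actions.append("timeout")
    return actions
```

If the result ends in "timeout", the instance never left its transitional state within the polling window, which matches the stuck behaviour described in this thread; in that case the sketch offers nothing beyond what waiting already does.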