MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Cannot reboot my compute instance in an "Unusable" state #117352

Closed OrianeEmpower closed 11 months ago

OrianeEmpower commented 11 months ago

Hey all,

I use Machine Learning studio. My compute instance ran out of disk space and went into an "Unusable" state, so I used the terminal of the compute instance to clear space and ran "sudo reboot". I probably hadn't cleared enough space, because it restarted, eventually failed, and went into the unusable state again. I went back to the terminal and cleared more space, but before I could run "sudo reboot" I had to reload the page, and now I can no longer access the terminal. I get the following message when I try to open it: "Current terminal is encountering some issues, please switch compute or restart your current compute and retry."

Is there another way to reboot the compute instance?

My main concern is data that I stored in local files on the instance and haven't backed up. If there is a way to recover that data without rebooting the compute instance, I'd be interested in that as well.

Thanks a lot for your help!

Naveenommi-MSFT commented 11 months ago

@OrianeEmpower Could you add a link to the documentation you are following for these steps? This would help us redirect the issue to the appropriate team. Thanks!

OrianeEmpower commented 11 months ago

Hello! I haven't actually followed any specific documentation, because the instructions are shown directly in the UI. On the "Compute" page, where I can see that my compute instance is unusable, there are instructions on how to clear space and reboot the compute instance. They redirected me to the "Notebooks" page, where I usually have access to the terminal of my compute instance. From there, I can delete files and reboot the compute instance following the instructions.
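
For readers following the same "clear space, then reboot" instructions, here is a minimal sketch of how one might check whether enough space has really been freed before rebooting. It uses only the Python standard library and is meant to be run from the compute instance terminal while it is still reachable. The helper names `disk_report` and `largest_files` are hypothetical and do not come from the UI instructions or from any Azure tooling.

```python
import os
import shutil


def disk_report(path="/"):
    """Return (total, used, free) in GiB for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(v / gib, 2) for v in (usage.total, usage.used, usage.free))


def largest_files(root, top=10):
    """Walk `root` and return the `top` largest files as (size_bytes, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    # Largest first; ties are broken by path, which is harmless here.
    return sorted(sizes, reverse=True)[:top]
```

Calling `disk_report("/")` before and after deleting files shows whether the free space actually changed, and `largest_files(os.path.expanduser("~"))` points at the files most worth removing or copying elsewhere first.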

Naveenommi-MSFT commented 11 months ago

@OrianeEmpower

If you are unable to access the terminal through the UI, there are a few other options you can try to reboot the compute instance:

Regarding your data, if you are unable to access the terminal or reboot the compute instance, it may be difficult to retrieve your data. However, you can try contacting Microsoft support for assistance. They may be able to help you recover your data or provide other options for accessing it.

In the future, it's always a good idea to back up your data regularly to avoid data loss in case of unexpected issues like this.

Note: MicrosoftDocs GitHub issues are mainly intended for problems with the documentation itself. For troubleshooting questions and issue discussions, I would recommend creating a thread on the forums (Microsoft Q&A or Stack Overflow) or opening a support ticket. Once you post your issue on the forums, it will be visible across the community, which is a better-suited audience for this type of issue.

Naveenommi-MSFT commented 11 months ago

@OrianeEmpower We are going to close this thread. If there are any further questions regarding the documentation, please tag me in your reply and we will be happy to continue the conversation.

Please-close

OrianeEmpower commented 11 months ago

@Naveenommi-MSFT thanks a lot for your reply! Sorry for not posting this in the right place; I'll do it right next time. Since I started here, I'll continue on this thread for this issue. I tried both solutions but still cannot reboot my compute instance because:

Restart compute

ml_client.compute.begin_restart('oriane-56GB').wait()

But it resulted in an error: "message": "RestartCompute is not allowed when ComputeInstance disk is full."
This happens even though I did free up space on the disk of my compute instance. If that's not what you had in mind, could you please link me to any documentation that might help?
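
For anyone retrying the same call, a minimal sketch of wrapping the restart so that the "disk is full" refusal is reported separately from other failures. `try_restart` is a hypothetical helper, not part of the Azure SDK. The only SDK call it relies on is the `ml_client.compute.begin_restart(...)` call quoted above, followed by `.wait()` on the returned poller. Catching a generic `Exception` and matching on the message text are simplifying assumptions.

```python
# Assumed setup (not executed here); requires the azure-ai-ml and azure-identity packages:
#   from azure.ai.ml import MLClient
#   from azure.identity import DefaultAzureCredential
#   ml_client = MLClient(DefaultAzureCredential(),
#                        "<subscription-id>", "<resource-group>", "<workspace>")


def try_restart(ml_client, compute_name):
    """Request a restart and wait for it; return (ok, message).

    `ml_client` is any object exposing `compute.begin_restart(name)` whose
    result has a `wait()` method, as in the call quoted in this thread.
    """
    try:
        poller = ml_client.compute.begin_restart(compute_name)
        poller.wait()  # block until the long-running operation finishes
    except Exception as exc:  # assumption: service errors surface as exceptions
        text = str(exc)
        if "disk is full" in text:
            return False, "restart refused: service still reports a full disk"
        return False, f"restart failed: {text}"
    return True, "restart requested and completed"
```

With the client from the commented setup, `try_restart(ml_client, "oriane-56GB")` returns a flag and a short message instead of raising, which makes it easier to retry after freeing more space.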

Thanks a lot!

Naveenommi-MSFT commented 10 months ago

@OrianeEmpower

Thank you for your feedback! Please note that this GitHub forum is dedicated to docs-related issues. Since this issue isn't directly related to improving our docs, and to gain a better understanding of your issue, I'd recommend working more closely with our support team via an Azure support request. Alternatively, you can post your issue on our Q&A forum so that our community and MVPs can help you troubleshoot it or find potential workarounds.