johnseed opened this issue 1 year ago
@zewditu Can you take a look at this?
Hitting this in the current main as well. Marking it as a release blocker.
I verified Azure training against the previous release and it isn't working either. Perhaps there's a breaking change on the Azure AutoML service side.
@LittleLittleCloud, @johnseed I got a successful run today, are you able to try it again?
It's still not working, the same error persists.
@johnseed it will be resolved in our next release.
I can confirm that the old version 16.13.9.2235601 does not have this issue, but that version is unable to select V100 or T4 compute, and due to the large amount of data the training times out. These accumulated issues have completely blocked my progress, causing not only financial loss from failed training runs but also impacting my project, which relies on this. I am currently under tremendous pressure.
@johnseed Really sorry to hear that. We are going to release early next week, and that release will include the fix for this issue.
Also @zewditu, is it the case that @johnseed can overcome this issue by creating a training compute in another region?
@johnseed is your region perhaps not westus2? You can use resources created in the westus2 region to unblock yourself until our next release.
I don't know what it has to do with westus2, as all my compute resources are in eastus. Regardless, I managed to use the T4 GPU, and I hope it works.
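For anyone blocked in the meantime, here is a minimal sketch of the workaround discussed above (compute in a specific region with a T4 GPU), assuming the Azure CLI with the `ml` extension (v2) is installed; the resource group, workspace, and cluster names are hypothetical, and Standard_NC4as_T4_v3 is just one example of a T4-backed VM size:

```bash
# Sketch only: names are placeholders; adjust region, sizes, and quotas to your subscription.
# Create a resource group and an Azure ML workspace in westus2.
az group create --name rg-mb-westus2 --location westus2
az ml workspace create --name mb-ws-westus2 --resource-group rg-mb-westus2 --location westus2

# Create a small T4 GPU compute cluster inside that workspace.
az ml compute create --name t4-cluster \
  --resource-group rg-mb-westus2 --workspace-name mb-ws-westus2 \
  --type AmlCompute --size Standard_NC4as_T4_v3 \
  --min-instances 0 --max-instances 1
```

Model Builder's Azure training can then be pointed at that workspace and compute cluster.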
@johnseed We just released a newer version with the fix for this issue; please let us know if it resolves the problem.
Thank you for the update! I'll try the new version. I'll let you know if it resolves the issue.
Can't say it's perfect, but it's working now, thank you very much!
@johnseed what makes you say "Can't say it's perfect, but it's working now"? Let us know.
Hi, I am having the same problem today
@jpcintegral can you share the Model Builder version you are using? And does the Azure training fail with the same error information on the Azure portal as well?
System Information
Describe the bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The job in Azure ML Studio should have succeeded, and Model Builder should then consume the model.
Additional context
The older version of Model Builder seemed fine, so I rolled back to 16.13.9.2235601 and the issue disappeared.