Kenza-AI / sagify

LLMs and Machine Learning done easily
https://kenza-ai.github.io/sagify/
MIT License

CreateTrainingJob operation: Invalid MaxWaitTimeInSeconds #111

Closed: abaspinar closed this 4 years ago

abaspinar commented 4 years ago

Hi, thanks for the project.

While testing sagify cloud train, I noticed that when the --use-spot-instances flag is not set, the command fails with the following message:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: Invalid MaxWaitTimeInSeconds. It is only supported when EnableManagedSpotTraining is set to true

As far as I understand, setting train_max_wait=3600 here causes the problem.

Can we set it to None by default instead, or am I missing something?

Using Sagify 0.20.4.
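
For illustration, here is a minimal sketch of the behaviour being asked for: send MaxWaitTimeInSeconds only when managed spot training is enabled. It is not Sagify's actual code. The helper name spot_training_params and its arguments are hypothetical, and the dictionary keys are assumed to follow the request shape of the SageMaker CreateTrainingJob operation named in the error above.

```python
from typing import Any, Dict, Optional


def spot_training_params(use_spot_instances: bool,
                         max_run_seconds: int = 3600,
                         max_wait_seconds: Optional[int] = None) -> Dict[str, Any]:
    """Build the spot-related part of a CreateTrainingJob request.

    Hypothetical helper for illustration only; not part of Sagify.
    """
    stopping_condition: Dict[str, int] = {"MaxRuntimeInSeconds": max_run_seconds}

    if use_spot_instances:
        # Only include the wait time when spot training is on; otherwise the
        # service rejects the request with the ValidationException above.
        # Fall back to the run time when no wait time is given (an assumption).
        stopping_condition["MaxWaitTimeInSeconds"] = (
            max_wait_seconds if max_wait_seconds is not None else max_run_seconds
        )

    return {
        "EnableManagedSpotTraining": use_spot_instances,
        "StoppingCondition": stopping_condition,
    }
```

With use_spot_instances=False the returned StoppingCondition has no MaxWaitTimeInSeconds key at all, which is the effect of defaulting the wait time to None.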

ilazakis commented 4 years ago

Hi @abaspinar 👋 thanks for reaching out. This was introduced last week by #110. Should be an easy fix @pm3310 ^^

pm3310 commented 4 years ago

Hi @abaspinar,

Good catch. It's fixed now (https://github.com/Kenza-AI/sagify/pull/114)! If you install sagify==0.20.5, everything should be fine :-)

Thanks,

abaspinar commented 4 years ago

Thanks a lot for the quick fix. Really appreciated!!