issues
search
GoogleCloudPlatform
/
llm-pipeline-examples
Apache License 2.0
107
stars
26
forks
source link
Add training cluster monitoring, that automatically restart cluster if training doesn't progress.
#5
Closed
kuo-lin
closed
1 year ago
abdallag
commented
1 year ago
Can you please merge from main to allow the updated checks to pass?
Can you please merge from main to allow the updated checks to pass?