issues
search
GoogleCloudPlatform
/
llm-pipeline-examples
Apache License 2.0
107
stars
26
forks
source link
Add training cluster monitoring, that automatically restart cluster if training doesn't progress.
#4
Closed
kuo-lin
closed
1 year ago