awslabs / benchmark-ai

Anubis (formerly known as Benchmark AI) measures the goodness of machine learning workloads
Apache License 2.0

Reverting #974 #1040

Closed: tejaschumbalkar closed this 4 years ago

tejaschumbalkar commented 4 years ago

Issue #, if available:

974

Description of changes:

Reverting #974 to support Horovod jobs.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

surajkota commented 4 years ago

I didn't understand the intent of this PR. Watcher is only supported for single-node and SM jobs, and maybe inference (need to check).

In short, this commit is still valid.

tejaschumbalkar commented 4 years ago

> I didn't understand the intent of this PR. Watcher is only supported for single-node and SM jobs, and maybe inference (need to check).
>
> In short, this commit is still valid.

A watcher for MPIJob is not yet implemented (#959), so we can keep #974 until then.