======================================================================
ERROR: test_non_distributed_runs (__main__.TestRun)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/app/trainer/task_test.py", line 34, in test_non_distributed_runs
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
SystemExit
----------------------------------------------------------------------
Ran 1 test in 21.829s
Currently, every run that reaches the end of main() ends in an ungraceful system exit: tf.app.run() passes the return value of main() to sys.exit(), which raises SystemExit even on success, as the traceback above shows.
This is problematic for various reasons, including jobs on Kubeflow restarting because the job appears to have failed when it has actually just finished.
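
As a possible workaround until the exit is handled more gracefully, a caller (for example a test like test_non_distributed_runs) could catch SystemExit and treat a None or 0 exit code as normal completion. The sketch below does not import TensorFlow: `run` is a hypothetical stand-in that only mimics the `sys.exit(main(...))` pattern visible in the traceback, and `main` and `run_gracefully` are illustrative names, not part of the trainer or of TensorFlow.

```python
import sys


def main(argv):
    # Hypothetical stand-in for the trainer's main(); returns None on success.
    return None


def run(main_fn, argv=None):
    # Stand-in mimicking the pattern shown in the traceback:
    # the return value of main() is passed straight to sys.exit(),
    # so SystemExit is raised even when main() succeeds.
    sys.exit(main_fn(argv if argv is not None else sys.argv[:1]))


def run_gracefully(main_fn, argv=None):
    """Call run() and treat an exit code of None or 0 as a clean finish."""
    try:
        run(main_fn, argv)
    except SystemExit as e:
        if e.code not in (None, 0):
            # A real failure: let the non-zero exit propagate.
            raise
    return 0
```

With this wrapper, a main() that returns None completes without an error being reported, while a non-zero return value still surfaces as SystemExit.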