I've been trying to use a Coiled Job to run some scheduler benchmarks (https://github.com/quasiben/dask-scheduler-performance/pull/130), but the job seems to be failing before it even starts. It turns out that a job that fails to start is very hard to debug: there's no way to get its logs, because it doesn't live long enough for you to find its ID from list_jobs.
There are many bigger improvements needed to make the Job/notebook UX more pleasant and productive, but I think these tweaks would at least make it possible for an advanced user to debug things themselves:
Have list_jobs also optionally return non-running jobs (toggle this with a kwarg?)
Have start_job return the ID of the new job
(nice-to-have) Add job_logs to the top-level coiled namespace, instead of requiring a coiled.Cloud instance
(nice-to-have) Add a print_job_logs method or similar, to conveniently print the logs line by line for readability
(longer-term nice-to-have) Add coiled job start, coiled job stop, coiled job list, and coiled job logs/coiled job watch to the CLI
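
To make the print_job_logs idea concrete, here is a rough sketch of what such a helper could look like. It is only an illustration: it does not call coiled at all, and I'm assuming the logs arrive either as one string or as a mapping of source name to log text, which may not match what job_logs actually returns. The name format_job_logs is made up for this example.

```python
from typing import List, Mapping, Union


def format_job_logs(logs: Union[str, Mapping[str, str]]) -> List[str]:
    """Split raw log text into individual lines.

    Assumption: `logs` is either a single string or a mapping of
    source name -> log text. The real return type of job_logs may differ.
    """
    if isinstance(logs, str):
        return logs.splitlines()
    lines = []
    for source, text in logs.items():
        # Prefix each line with its source so interleaved output stays readable.
        for line in text.splitlines():
            lines.append(f"[{source}] {line}")
    return lines


def print_job_logs(logs: Union[str, Mapping[str, str]]) -> None:
    """Hypothetical convenience wrapper: print logs one line at a time."""
    for line in format_job_logs(logs):
        print(line)
```

With start_job returning the job ID, the whole debugging flow could then be: start the job, pass the ID to job_logs, and hand the result to print_job_logs.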