OSC / ondemand

Supercomputing. Seamlessly. Open, Interactive HPC Via the Web
https://openondemand.org/
MIT License

Add an app error/success view #3287

Open multimeric opened 8 months ago

multimeric commented 8 months ago

As far as I can tell, OnDemand interactive apps can't customize what is shown once a job has finished. We have view.html.erb, which lets us customize how a running job appears, but once the job has ended it always appears as "completed" or "failed". It would be nice to have error.html and success.html, or even just some context variables inside view.html, so we could configure things a bit more dynamically.

My use case is a job whose result is of interest after it finishes. If the job fails, that might be the output.log; if it succeeds, there might be an output file that the app could link to so the user can get to the result easily.

johrstrom commented 8 months ago

Someone just added completed in #3269. As for success or failure, that's trickier because it's a matter of timing. In Slurm you need to run squeue at just the right moment, when the job has completed but is still visible to squeue. After a little while it disappears, and you then need sacct, which we don't have support for.
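To make the timing issue concrete, here is a minimal sketch of the "ask squeue first, fall back to sacct" lookup described above. It is not part of OnDemand (which, as noted, has no sacct support); the function names, the set of states treated as failures, and the outcome labels are all assumptions made for illustration. It also assumes Slurm accounting is enabled so that sacct has something to report.

```python
import subprocess
from typing import Callable, List, Optional

# Assumption for this sketch: these Slurm states count as "failed".
FAILED_STATES = {
    "FAILED", "TIMEOUT", "CANCELLED", "NODE_FAIL",
    "OUT_OF_MEMORY", "BOOT_FAIL", "DEADLINE", "PREEMPTED",
}


def parse_state(output: str) -> Optional[str]:
    """Return the first state word in squeue/sacct output, or None if empty."""
    for line in output.splitlines():
        line = line.strip()
        if line:
            # sacct can print e.g. "CANCELLED by 1234"; keep only the state.
            return line.split("|")[0].split()[0]
    return None


def classify(state: Optional[str]) -> str:
    """Map a Slurm state to one of: succeeded, failed, active, unknown."""
    if state is None:
        return "unknown"
    if state == "COMPLETED":
        return "succeeded"
    if state in FAILED_STATES:
        return "failed"
    return "active"  # PENDING, RUNNING, COMPLETING, ...


def run_command(cmd: List[str]) -> str:
    """Run a command and return its stdout, or "" if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout if result.returncode == 0 else ""


def job_outcome(job_id: str,
                runner: Callable[[List[str]], str] = run_command) -> str:
    """Ask squeue first; if the job has left the queue, fall back to sacct."""
    state = parse_state(runner(["squeue", "-h", "-j", job_id, "-o", "%T"]))
    if state is None:
        # -n: no header, -X: allocation only, -P: parsable output
        state = parse_state(
            runner(["sacct", "-n", "-X", "-P", "-j", job_id, "-o", "State"])
        )
    return classify(state)
```

The `runner` parameter exists only so the lookup can be exercised without a Slurm cluster; on a real system `job_outcome("12345")` would shell out to both tools.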

multimeric commented 8 months ago

Thanks, that PR should mostly resolve my use case.

Are you saying that jobs will only show up as failed while they're still in the Slurm queue, and will then just appear as completed once they've left the queue? I think that's still fine for me, as long as my completed.html.erb can distinguish between the job having failed or succeeded (while in the queue) or being unknown (afterwards).

johrstrom commented 8 months ago

> Are you saying that jobs will only show up as failed while they're still in the Slurm queue, and then afterwards will just appear as completed once they're out of the queue? I think that's still fine for me, as long as my completed.html.erb can distinguish between the job being failed or succeeded (while in the queue) or unknown (afterwards).

I don't think the job will ever show in OOD as failed. We don't capture exit status. When we show the completed status, we have no idea whether it succeeded or failed; all we know for sure is that it isn't running anymore.
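Since the exit status isn't captured, one possible workaround is for the job script itself to record its exit code in a file, and for whatever renders the result to read that file back. The sketch below shows only the reading side, classifying a job as succeeded, failed, or unknown. The marker file name `exit_status.txt` and the function name are hypothetical, and nothing here is an existing OnDemand feature; the job script would have to write the file on its own.

```python
from pathlib import Path


def read_outcome(job_dir: str, marker: str = "exit_status.txt") -> str:
    """Classify a finished job from a marker file written by the job script.

    Returns "succeeded" for exit code 0, "failed" for any other integer,
    and "unknown" if the (hypothetical) marker is missing or unreadable.
    """
    path = Path(job_dir) / marker
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return "unknown"  # job was killed early or never wrote the marker
    try:
        code = int(text)
    except ValueError:
        return "unknown"  # marker exists but does not hold an exit code
    return "succeeded" if code == 0 else "failed"
```

With a three-way result like this, a completed view could link to the output file on "succeeded", to output.log on "failed", and show a neutral message on "unknown".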