It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

resubmitted streaming jobs #584

Closed (arashbm closed 7 months ago)

arashbm commented 1 year ago

Let's say I submit a job with multiple tasks, and some of those tasks fail. If I then try to resubmit the failed tasks, the resubmitted job fails immediately.

$ hq submit --array=0-4 --log=output.log bash some_script.sh
$ # a while later...
$ hq job resubmit 1 --filter=failed,canceled

The server logs:

ERROR Stream connection ended with: Job 2 is not registered for streaming

The expected behaviour, to me at least, is that the new job simply appends to the original log.

Kobzol commented 1 year ago

Hi! Thanks for posting this issue. The combination of output streaming and resubmission is not currently implemented. However, as you can see, the error message does not make that clear.

As a short-term solution, we will improve the error message to let users know that this is not currently supported. In the longer term, we plan to implement support for it (we were basically waiting for the first use case that would need it, and it seems you're the first one :) ).

Kobzol commented 7 months ago

This is no longer relevant, because we have since removed resubmit completely.

spirali commented 7 months ago

Let me note that the mechanism now recommended for resubmitting does not suffer from this kind of error. However, since appending to the log is not implemented, resubmitting a task will overwrite the log file.
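
For readers landing here, a minimal sketch of how a manual resubmission could look, assuming the IDs of the failed tasks are already known and assuming `--array` accepts a comma-separated list of task IDs (check the HyperQueue documentation for your version). The helper `join_task_ids`, the task IDs 1, 3 and 4, and the log name `output-retry.log` are all hypothetical; the new job writes to a different log path so that the original log is not overwritten.

```shell
#!/usr/bin/env bash

# Hypothetical helper: join the task IDs given as arguments into a
# comma-separated list, e.g. "1 3 4" becomes "1,3,4".
join_task_ids() {
    local IFS=','
    echo "$*"
}

# Hypothetical usage (commented out so the sketch runs without a server).
# Assumes tasks 1, 3 and 4 of the original job failed, and that --array
# accepts a comma-separated list of IDs.
# hq submit --array="$(join_task_ids 1 3 4)" --log=output-retry.log bash some_script.sh
```

Submitting the failed task IDs as a fresh job keeps the original `output.log` intact, at the cost of having the results split across two log files.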