VUIIS / dax

Distributed Automation for XNAT
MIT License

Baxpr queue handling #365

Closed: baxpr closed this 2 years ago

baxpr commented 2 years ago

@bud42 can you look at this?

Instead of limiting launches based on the total number of ACCRE jobs, this limits them based on the number of pending ACCRE jobs. This might help avoid hitting XNAT with too many ACCRE job starts at once.

It would also remove the limit on total running jobs, unless we add that back separately. I'm not sure what effect that would have, but I think ACCRE's own scheduling will keep that under control if we just keep the pending list full.

If we install this, we would also need to lower the queue_limit setting in the instance dashboard to something like 20-50 instead of 200-500.
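A minimal sketch of the pending-job gate described above, assuming the pending count comes from SLURM's `squeue -h -u <user> -o "%i %T"` (job ID plus long-form state, one job per line). The function names `count_pending`, `query_squeue`, and `may_launch` are hypothetical and are not how dax actually queries ACCRE; parsing is kept separate from the subprocess call so it can be checked without a cluster.

```python
import subprocess


def count_pending(squeue_output: str) -> int:
    """Count PENDING jobs in `squeue -h -o "%i %T"` output (hypothetical helper)."""
    count = 0
    for line in squeue_output.splitlines():
        fields = line.split()
        # Each non-empty line looks like "<jobid> <STATE>", e.g. "12345 PENDING"
        if len(fields) >= 2 and fields[1] == "PENDING":
            count += 1
    return count


def query_squeue(user: str) -> str:
    """Run squeue for one user; only works on a host with SLURM installed."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i %T"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def may_launch(pending: int, pending_limit: int) -> bool:
    """Pending-based gate: allow a launch only while fewer than pending_limit jobs wait."""
    return pending < pending_limit


if __name__ == "__main__":
    sample = "101 RUNNING\n102 PENDING\n103 PENDING\n"
    print(count_pending(sample))  # 2
    print(may_launch(count_pending(sample), pending_limit=20))  # True
```

Because RUNNING jobs are ignored, this gate on its own places no cap on how many jobs run at once; that is left to the cluster scheduler.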

baxpr commented 2 years ago

btw, I have not tested this.

bud42 commented 2 years ago

Do jobs always enter the pending phase long enough for this to work? And if so, are we limiting the number of new jobs that launch each time dax runs? Is that what we want?

baxpr commented 2 years ago

Ah, I don't know. Probably not.

The goal would be to limit the number of pending jobs to ~50, or whatever number avoids too large a hit when ACCRE launches them all at once.

I think I'll add back a limit on total jobs as well.

baxpr commented 2 years ago

... in which case, what do you think about also adding a 2-second sleep between individual job launches, in case ACCRE is starting them immediately?
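A rough sketch of a launch loop that combines a pending-job limit, a total-job limit, and a fixed delay between submissions. Everything here is hypothetical: `launch_throttled`, the callback parameters, and the default values (50 pending, 500 total, 2 seconds, taken loosely from the numbers in this thread) are for illustration, not dax's actual launcher. The counts and the sleep function are injected so the logic can be exercised without ACCRE.

```python
import time
from typing import Callable, List


def launch_throttled(
    job_ids: List[str],
    launch: Callable[[str], None],
    count_pending: Callable[[], int],
    count_total: Callable[[], int],
    pending_limit: int = 50,
    total_limit: int = 500,
    delay_sec: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Launch jobs in order until a limit is reached; return the IDs launched."""
    launched: List[str] = []
    for job_id in job_ids:
        # Re-check both limits before every submission, since each launch changes them
        if count_pending() >= pending_limit or count_total() >= total_limit:
            break
        if launched:
            # Pause between consecutive submissions so a burst of job starts
            # does not hit XNAT all at once
            sleep(delay_sec)
        launch(job_id)
        launched.append(job_id)
    return launched


if __name__ == "__main__":
    queue: List[str] = []   # stand-in for the cluster queue
    sleeps: List[float] = []  # records delays instead of actually sleeping
    result = launch_throttled(
        ["a", "b", "c", "d"],
        launch=queue.append,
        count_pending=lambda: len(queue),  # pretend every launched job is pending
        count_total=lambda: len(queue),
        pending_limit=3,
        total_limit=10,
        sleep=sleeps.append,
    )
    print(result)  # ['a', 'b', 'c']
    print(sleeps)  # [2.0, 2.0]
```

The delay is skipped before the first launch, so a run that submits n jobs waits n-1 times.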

baxpr commented 2 years ago

So overall, dax will launch as many jobs as possible each hour subject to

All three of these settings can be exposed in the instance REDCap.

baxpr commented 2 years ago

We could also throttle by pending uploads, for that matter: no new launches until, e.g., pending uploads drop below 2000.
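A minimal sketch of the upload-backlog gate, assuming (hypothetically) that each pending upload appears as one entry in a local upload queue directory. The directory layout and the names `count_pending_uploads` and `uploads_allow_launch` are assumptions for illustration, not dax's real upload mechanism; the 2000 default mirrors the example threshold above.

```python
import os
import tempfile


def count_pending_uploads(upload_dir: str) -> int:
    """Count entries waiting in a hypothetical upload queue directory."""
    if not os.path.isdir(upload_dir):
        return 0  # no queue directory means nothing is waiting
    return len(os.listdir(upload_dir))


def uploads_allow_launch(upload_dir: str, upload_limit: int = 2000) -> bool:
    """Block new launches while the upload backlog is at or above upload_limit."""
    return count_pending_uploads(upload_dir) < upload_limit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("assr1", "assr2", "assr3"):
            os.mkdir(os.path.join(d, name))
        print(count_pending_uploads(d))  # 3
        print(uploads_allow_launch(d, upload_limit=3))  # False
        print(uploads_allow_launch(d))  # True
```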

baxpr commented 2 years ago

Testing a full dax manager run on ROGERSTEST (rogersbp@hickory).

baxpr commented 2 years ago

This requires new fields main_queuelimit_pending and main_limit_pendinguploads in the instance REDCap. We need to document the instances panel and provide an initial data dictionary for it, similar to the project settings info in docs/dax_manager.rst.
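A small sketch of turning the two new instance fields into integer limits, assuming the instance record has been exported from REDCap as a dict of strings (blank fields as empty strings). The field names come from this thread; the fallback values, the `parse_limits` name, and the choice to treat 0 as "no launches" while rejecting negatives are assumptions, not dax behavior.

```python
from typing import Dict

# Fallbacks used when a field is blank or missing; these values are illustrative only
DEFAULTS: Dict[str, int] = {
    "main_queuelimit_pending": 50,
    "main_limit_pendinguploads": 2000,
}


def parse_limits(record: Dict[str, str]) -> Dict[str, int]:
    """Convert an exported instance record (string values) into integer limits."""
    limits: Dict[str, int] = {}
    for field, default in DEFAULTS.items():
        raw = record.get(field, "").strip()
        if raw == "":
            limits[field] = default  # blank or missing: fall back to the default
            continue
        value = int(raw)  # raises ValueError on non-numeric text
        if value < 0:
            # 0 is allowed and effectively pauses launching; negatives are rejected
            raise ValueError(f"{field} must be >= 0, got {raw!r}")
        limits[field] = value
    return limits


if __name__ == "__main__":
    print(parse_limits({"main_queuelimit_pending": "20", "main_limit_pendinguploads": ""}))
    # {'main_queuelimit_pending': 20, 'main_limit_pendinguploads': 2000}
```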

baxpr commented 2 years ago

... I have not implemented a delay yet.

baxpr commented 2 years ago

Tested OK for a full build/launch/upload cycle on a single project with a few assessors. Next, test the thresholds.

baxpr commented 2 years ago

With the thresholds set to 1, only 1 job got launched. It would be helpful to report in the log why launching stopped.
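One way to report why launching stopped is to compute a reason string from the current counts and limits and log it when it is not empty. This is a sketch under assumptions: `stop_reason`, `check_and_log`, the check order, and the logger name `dax.launch` are hypothetical, and it uses only Python's standard `logging` module.

```python
import logging
from typing import Optional

logger = logging.getLogger("dax.launch")  # hypothetical logger name


def stop_reason(pending: int, total: int, uploads: int,
                pending_limit: int, total_limit: int,
                upload_limit: int) -> Optional[str]:
    """Return a human-readable reason to stop launching, or None to continue."""
    if pending >= pending_limit:
        return f"pending jobs ({pending}) reached limit ({pending_limit})"
    if total >= total_limit:
        return f"total jobs ({total}) reached limit ({total_limit})"
    if uploads >= upload_limit:
        return f"pending uploads ({uploads}) reached limit ({upload_limit})"
    return None


def check_and_log(**counts_and_limits: int) -> bool:
    """Return True if launching may continue; otherwise log the reason and return False."""
    reason = stop_reason(**counts_and_limits)
    if reason is not None:
        logger.info("Stopped launching: %s", reason)
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # With every threshold set to 1, the second launch attempt is refused and logged
    check_and_log(pending=1, total=1, uploads=0,
                  pending_limit=1, total_limit=1, upload_limit=1)
```

Only the first limit that is hit is reported, so the order of the checks decides which reason appears when several apply.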

baxpr commented 2 years ago

Launch delay is working. @bud42, this is ready for another look. I'm not sure how it will interact with #369, though.

baxpr commented 2 years ago

No, I meant to do that via Template; hang on.

bud42 commented 2 years ago

Looks great! Let's merge this prior to #369. Hopefully, git will work its magic!

baxpr commented 2 years ago

@bud42, see if that does it?