dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Only replicate elements from GQ if data can be injected into WMBS right away #8759

Open amaltaro opened 6 years ago

amaltaro commented 6 years ago

I was wondering about it and decided to start this thread to discuss this issue. Is/was there any good reason to have two layers of work acquisition (GQ to LQ, then LQ to WMBS)?

We always see agents with more workqueue elements acquired than they can handle. These elements then just sit on the agent waiting for resources (or, from time to time, waiting for the agent to behave) when they could have been pulled by another agent.

IMO, we should make it a single queue, such that anything acquired from the GQ immediately goes into WMBS preparation and then job creation. This way we could also get rid of the DataLocationUpdater in the WorkQueueManager thread, where it is hard to spot when a block becomes "storageless".
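To make the idea concrete, here is a rough sketch of what "only acquire what can be injected right away" could look like. It is not WMCore code: `Element`, `pull_injectable`, and the per-site `free_slots` map are hypothetical names made up for illustration, and it assumes each element carries a job estimate and a single site.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    """Hypothetical stand-in for a workqueue element (not the WMCore class)."""
    name: str
    site: str            # assumed: the site where the input data sits
    estimated_jobs: int  # assumed: job estimate made when the element was created


def pull_injectable(global_queue: List[Element],
                    free_slots: Dict[str, int]) -> List[Element]:
    """Acquire only elements whose estimated jobs fit in the free slots now."""
    remaining = dict(free_slots)  # work on a copy, leave the caller's map alone
    acquired = []
    for elem in global_queue:
        if elem.estimated_jobs <= remaining.get(elem.site, 0):
            remaining[elem.site] -= elem.estimated_jobs
            acquired.append(elem)
        # Anything that does not fit stays in the GQ for another agent.
    return acquired
```

Elements that do not fit are simply skipped, so they remain available to other agents instead of sitting idle in a local queue.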

ticoann commented 6 years ago

However, that doesn't prevent the agent from getting too much work. I think the reason we have two steps is that it takes a while to create the jobs, especially pulling the file information from DBS, etc.

DataLocationUpdater is needed in case the data location changes between the time the WQE is created and the time the jobs are created. It is not strictly necessary to have it, and it doesn't have much effect if the location changes after the jobs are created. (I think the location update has to happen right before submission if we want to minimize the effect of location changes.)
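A minimal sketch of the "update right before submission" idea, just to illustrate the point. This is not how DataLocationUpdater is implemented: `refresh_locations` and the `lookup` callable are hypothetical, and the sketch assumes locations can be queried per block.

```python
from typing import Callable, Dict, List, Set


def refresh_locations(pending: Dict[str, Set[str]],
                      lookup: Callable[[str], Set[str]]) -> List[str]:
    """Re-resolve block locations just before submission.

    pending maps a block name to the sites recorded when the element was
    created. lookup is a hypothetical callable returning the current sites
    of a block. pending is updated in place; the return value lists the
    blocks that currently have no location at all.
    """
    storageless = []
    for block in sorted(pending):
        current = set(lookup(block))
        pending[block] = current
        if not current:
            storageless.append(block)
    return storageless
```

Returning the blocks without any location makes the "storageless" case explicit, so it can be reported instead of going unnoticed.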

If the problem is getting too many jobs in the agent, I think we have to do a better job of estimating job counts and matching them to resources.

But I think we can get rid of workqueue_inbox and just have workqueue.

amaltaro commented 3 years ago

Ha, I just happened to find this issue. This is a good candidate issue if we decide to carry on with the WorkQueue refactoring.