dmwm / WMCore

Core workflow management components for CMS.

increase the number of tasks per workflow #10240

jordan-martins opened this issue 3 years ago

jordan-martins commented 3 years ago

Hi,

looking at the code in [1], you can see a limit of 10 on the number of tasks per workflow. The idea is to check whether this parameter can be increased.

[Description]: in UL campaigns each request configures a self task. A regular injection of one chained_request means 6 tasks, e.g. GEN, SIM, DIGI(Premix), HLT, RECO and MiniAOD. When doing POG requests, PdmV is generally asked to keep the same GEN-SIM and to produce different PU scenarios from this "source" (which is more or less the same situation as in relval submissions). In most cases, JME POG for example, there is a requirement of 5 PU scenarios per GEN-SIM. This means 5 chained_requests with a total of 22 tasks (6+4+4+4+4).
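To make the counting explicit, here is a minimal sketch (the totalTasks helper is invented for illustration; it is not WMCore code):

# Count tasks when several PU scenarios share one GEN-SIM: the first
# chained_request carries the full 6-task chain, and every additional
# PU scenario adds the 4 downstream tasks (DIGI, HLT, RECO, MiniAOD).
def totalTasks(numPUScenarios):
    fullChain = 6  # GEN, SIM, DIGI(Premix), HLT, RECO, MiniAOD
    puChain = 4    # DIGI, HLT, RECO, MiniAOD per extra scenario
    return fullChain + puChain * (numPUScenarios - 1)

print(totalTasks(5))  # 22, matching the JME POG example above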

[Why]: The current limit of 10 tasks per workflow means that we can inject at most 2 chained_requests (6+4). We think this is not very efficient, because in order to run the other PU scenarios (the other chained_requests) we would have to wait for the first 2 chained_requests to become done, and this can take a while depending on which sample we are dealing with.

[Evaluation]: Do you see (or foresee) a problem if we increase the number of tasks? If not, could this number be increased?

Thanks, Jordan

FYI: @amaltaro @todor-ivanov

[1] https://github.com/dmwm/WMCore/blob/7bfa8f00b8faac0a2f9eeeed2a416937e99acf5a/src/python/WMCore/WMSpec/StdSpecs/TaskChain.py#L674

amaltaro commented 3 years ago

Hi @jordan-martins, I have a few questions that will help me understand the use case you describe.

in UL campaigns each request configures a self task.

I'm not sure I understand this statement. What do you mean by a "self task" for an UL campaign?

[Description]: in UL campaigns each request configures a self task. A regular injection of one chained_request means 6 tasks, e.g. GEN, SIM, DIGI(Premix), HLT, RECO and MiniAOD. When doing POG requests, PdmV is generally asked to keep the same GEN-SIM and to produce different PU scenarios from this "source" (which is more or less the same situation as in relval submissions). In most cases, JME POG for example, there is a requirement of 5 PU scenarios per GEN-SIM. This means 5 chained_requests with a total of 22 tasks (6+4+4+4+4).

I assume each of these PU scenarios requires a different PU dataset, right? If we were to decompose those into single workflows, we could have something like:

  1. baseline: produces the GEN-SIM (which gets transferred to the sites hosting the 5 pileup datasets to be used; or is made available in a couple of locations and read remotely via AAA)
  2. 5 workflows, where each of them has 4 tasks (DIGI, HLT, RECO, MiniAOD) and requires 1 pileup dataset
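For illustration, the split could look like this (all names and fields below are hypothetical, chosen only to show the shape of the decomposition; they are not actual request parameters):

# One baseline GEN-SIM request plus five 4-task requests, one per
# pileup scenario; every field here is made up for the sketch.
baseline = {"RequestName": "GENSIM_baseline", "Tasks": ["GEN", "SIM"]}

puScenarios = ["PU_scenario_%d" % i for i in range(1, 6)]
derived = [{"RequestName": "DigiReco_%s" % pu,
            "InputDataset": "<output of GENSIM_baseline>",
            "Pileup": pu,  # hypothetical field: which PU dataset to mix in
            "Tasks": ["DIGI", "HLT", "RECO", "MiniAOD"]}
           for pu in puScenarios]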

[Evaluation]: Do you see (or foresee) a problem if we increase the number of tasks? If not, could this number be increased?

I'm afraid we cannot increase it much further. This is because of limitations in the relational database (MariaDB): an index key prefix cannot be larger than 3072 bytes (which, depending on the character encoding, is likely fewer than 3072 characters). The task column is currently set to VARCHAR(1250).
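As a back-of-the-envelope check of that encoding caveat (a sketch of the arithmetic only; which character set the WMCore schema actually uses is not asserted here):

# InnoDB caps an index key prefix at 3072 bytes; how many *characters*
# fit under that cap depends on the column's character set.
INDEX_PREFIX_LIMIT_BYTES = 3072
for charset, bytesPerChar in [("latin1", 1), ("utf8", 3), ("utf8mb4", 4)]:
    print("%-8s -> at most %4d indexable characters"
          % (charset, INDEX_PREFIX_LIMIT_BYTES // bytesPerChar))
# latin1   -> at most 3072 indexable characters
# utf8     -> at most 1024 indexable characters
# utf8mb4  -> at most  768 indexable characters
# i.e. a fully indexed VARCHAR(1250) only fits with a 1- or 2-byte charset.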

In addition to that, it isn't clear to me whether it's more beneficial to run 1 such large TaskChain request or 5 or 6 smaller requests (which can have different requirements for input/secondary data).

jordan-martins commented 3 years ago

Hi @amaltaro, thanks!

By self task, I meant that we inject as a Task Chain type and this configures one task per request, e.g. one for GEN, one for SIM and so on.

For the PU scenarios, there are only three possible ways in terms of the input PU dataset:

  1. NoPU: no input PU dataset
  2. Premix: a Neutrino sample as the input PU dataset
  3. FlatPU: a MinBias sample as the input PU dataset

So, ideally we would be looking at an example like 5 chained_requests, where we would have 1 NoPU (no input dataset), 1 Premix (1 Neutrino as PU input dataset) and 3 FlatPU (1 MinBias as PU input dataset). And we would only need 1 SIM (not GEN) as the source for those chained_requests.

The gain being proposed is the possibility of processing everything from a single McM injection. Also, if we inject all at once, there would be no need to save the SIM, since it would not be needed for later processing, which saves space at sites.

Nevertheless, I see that there is a hard constraint that argues against pursuing this, right?

Thanks, Jordan

vlimant commented 3 years ago

my 2 cents: changing the limitation to be on the maximum number of tasks in sequence within the workflow is not a hard one.

klannon commented 3 years ago

Following up on the discussion that just took place during O&C week: as mentioned earlier in this thread, the limitation of 10 is not arbitrary; there is an underlying limitation in our DB. Trying to push past 10 would require some careful investigation and potentially a non-trivial redesign of the code. This is an undertaking that goes beyond what we're prepared to take on in Q2. (For reference, our Q2 development plan can be found here, and you will see that it includes at least two items, separating RelVal quotas and implementing GPU parameters for workflows, specifically requested by you. We are trying to be as responsive as possible to your requests within the limits of our person-power!)

To move this forward, I propose we have a meeting in the coming weeks during which we discuss candidate development requests from PPD for Q3 2021. Ideally, you would send us a prioritized list of the features you would like, so that we can take a little time to think about the items before we discuss them. In the meeting we can then focus on better understanding the needs, as well as on discussing feasibility and potential alternatives. We will then use this as input to our Q3 planning exercise, which we'll present (as we did for Q2) near the start of Q3 in a Weekly O&C General Meeting.

@jordan-martins, do you want to bring this to the attention of the relevant people from the PPD side?

srimanob commented 3 years ago

An issue in the past was that organizing this kind of mixing is not so efficient, i.e. we also slow down the NoPU sample, which could otherwise be used first. So, technically it is good if we can organize things this way, but we should also weigh the performance cost if what we get is slower delivery (in exchange for saved disk space, i.e. no SIM is kept).

jordan-martins commented 3 years ago

Hi @klannon, thanks a lot for this overview from the dev team side. What dates/time slots would you foresee for this meeting? Thanks, Jordan

klannon commented 3 years ago

@jordan-martins: If it works for you, may I propose the week of 3 May? I assume that this week, being CMS week, won't work, and the following week I have to administer a big exam in the class I teach. Would that work? Once we agree on the week, we can find a good day and time.

justinasr commented 3 years ago

Hello,

After some poking around WMCore I would like to share my findings; if they do not solve this issue, maybe they will be useful to the next person who looks into it.

I used Alan's comment as the starting point. The 1250-character limit is set on the task column when the wmbs_workflow table is created, here [1]. There are other 1250 values in that file, but I believe this is the important one?

I was interested in what these task values actually look like, so I added a print statement here [2] to print the workflow's name and task, along with their lengths, whenever a workflow is created. The output looked like this:

...
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/Skims/LogCollectForSkims (98)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask5 (81)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask5/myTask5MergewriteSkim1 (104)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask5/myTask5MergewriteSkim1/myTask5writeSkim1MergeLogCollect (137)
...

I was using a slightly modified version of this [3] test while poking around. It seems that task is a path-like value built from the workflow name and the task names, and its length depends on how far "down the taskchain" the task is. A simplified example would be "/WorkflowName/GEN/SIM/DIGI/RECO/MiniAOD/NanoAOD"; of course it has some intermediate "Merge" parts, but you get the idea. So the 1250-character limit restricts how "deep" the workflow can be, not how "wide". In the test run above I tried a TaskChain with 20 tasks: DIGI, RECO and then 16 tasks that used RECO as input. Part of the output:

...
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO (73)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask5 (81)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask6 (81)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask7 (81)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask8 (81)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask9 (81)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask10 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask11 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask12 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask13 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask14 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask15 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask16 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask17 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask18 (82)
Workflow name YankingTheChain (15), task /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteRECO/myTask19 (82)
...

As you can see, the TaskChain got up to 16 "wide", but never went beyond 3 "deep", and the longest tasks were 140 characters long (something like /YankingTheChain/DigiHLT/DigiHLTMergewriteRAWDIGI/Reco/RecoMergewriteALCA/ALCAReco/ALCARecoMergewriteALCA2/ALCARecowriteALCA2MergeLogCollect). I tried running with 1000 tasks in the TaskChain and (even though it took some time...) the database never threw any errors, because all tasks stayed 3 "deep".
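To make the depth-vs-width point concrete, here is a toy sketch of how such path-like task values grow (the taskPath helper is invented for illustration; it is not the actual WMCore code that builds these values):

# The stored value is the workflow name plus the chain of task names,
# so only extra *levels* lengthen the string; extra siblings do not.
def taskPath(workflow, chain):
    return "/" + "/".join([workflow] + chain)

deep = taskPath("YankingTheChain",
                ["DigiHLT", "Reco", "myTask5", "myTask5MergewriteSkim1"])
wide = [taskPath("YankingTheChain", ["DigiHLT", "Reco", "myTask%d" % i])
        for i in range(5, 20)]

print(len(deep))                  # grows with every additional level
print(max(len(p) for p in wide))  # stays flat however many siblings exist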

The original issue is about exactly this: 10 "deep" is plenty deep for PdmV, I think we have 6 "deep" at most? On the other hand, we want to get more "wide". Not to such extremes as 1000 tasks in a TaskChain, but it should be safe to raise the limit to 20 or 30 tasks per TaskChain. However, I still support checking for "10 tasks", but in a different way: checking the maximum "depth". My proposal is to enhance the [4] check with something like this:

# Limit the absolute number of tasks
if numTasks > 30:
    msg = "Workflow exceeds the maximum allowed number of tasks. "
    msg += "Limited to up to 30 tasks, found %s tasks." % numTasks
    self.raiseValidationException(msg)

# Limit the depth of tasks: walk each task's InputTask chain back to a
# top-level task; input tasks are assumed to be defined before the tasks
# that reference them (hence the j < i search below)
for i in range(1, numTasks + 1):
    taskNumber = "Task%s" % i
    task = schema[taskNumber]
    depth = 1
    while task.get('InputTask'):
        # Find the input task by name among the tasks defined so far
        for j in range(1, i):
            inputTaskNumber = "Task%s" % j
            if task['InputTask'] == schema[inputTaskNumber]['TaskName']:
                task = schema[inputTaskNumber]
                break
        else:
            msg = "Cannot find %s's input task %s" % (task['TaskName'], task['InputTask'])
            self.raiseValidationException(msg)

        depth += 1

    if depth > 10:
        msg = "Workflow exceeds the maximum allowed number of subsequent tasks. "
        msg += "Limited to up to 10 tasks, found %s tasks." % depth
        self.raiseValidationException(msg)

Most likely this can be implemented in a nicer way and/or in a more convenient place, but you get the idea.
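For example, one tidier shape for the same two checks is a name-to-task map plus a memoized depth computation. This is an untested sketch that reuses schema, numTasks and raiseValidationException from the snippet above; the method name and the maxTasks/maxDepth parameters are invented:

def validateTaskStructure(self, schema, numTasks, maxTasks=30, maxDepth=10):
    # Check 1: absolute number of tasks (the "width" is otherwise unbounded)
    if numTasks > maxTasks:
        msg = "Workflow exceeds the maximum allowed number of tasks. "
        msg += "Limited to up to %s tasks, found %s tasks." % (maxTasks, numTasks)
        self.raiseValidationException(msg)

    # Map task names to their schema entries for direct InputTask lookups
    tasksByName = {schema["Task%s" % i]['TaskName']: schema["Task%s" % i]
                   for i in range(1, numTasks + 1)}
    depthCache = {}

    def taskDepth(name):
        # A task with no InputTask sits at depth 1; otherwise it is one
        # level deeper than its input task. Memoized for shared ancestors.
        if name not in depthCache:
            parent = tasksByName[name].get('InputTask')
            if not parent:
                depthCache[name] = 1
            elif parent not in tasksByName:
                self.raiseValidationException(
                    "Cannot find %s's input task %s" % (name, parent))
            else:
                depthCache[name] = taskDepth(parent) + 1
        return depthCache[name]

    # Check 2: maximum depth along any InputTask chain
    for name in tasksByName:
        if taskDepth(name) > maxDepth:
            msg = "Workflow exceeds the maximum allowed number of subsequent tasks. "
            msg += "Limited to a depth of %s, found %s." % (maxDepth, taskDepth(name))
            self.raiseValidationException(msg)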

EDIT: I would like someone with more WMCore knowledge to say whether this makes sense and I should make a PR, or whether there are some flaws in my proposal.

[1] https://github.com/dmwm/WMCore/blob/2cb47779ae8de99299724660be9d01f5e2c1547d/src/python/WMCore/WMBS/CreateWMBSBase.py#L173

[2] https://github.com/dmwm/WMCore/blob/2cb47779ae8de99299724660be9d01f5e2c1547d/src/python/WMCore/WMBS/Workflow.py#L40-L50

[3] https://github.com/dmwm/WMCore/blob/2cb47779ae8de99299724660be9d01f5e2c1547d/test/python/WMCore_t/WMSpec_t/StdSpecs_t/TaskChain_t.py#L754

[4] https://github.com/dmwm/WMCore/blob/2cb47779ae8de99299724660be9d01f5e2c1547d/src/python/WMCore/WMSpec/StdSpecs/TaskChain.py#L677-L680

jordan-martins commented 3 years ago

This is very nice, Justinas!!! In a generic way:

amaltaro commented 3 years ago

@justinasr Hi Justinas, sorry for the belated feedback on your great investigation. I still have to carefully read and think about what you're reporting here, but at first look, having two different validations for TaskChain depth and width looks reasonable to me. We still need to touch base on the absolute number of tasks in a TaskChain, though. If you or @jordan-martins know what we should expect this year and in Run 3, that would be a great start.

The WMCore team is still focused on the python3 migration, which has come to a critical phase. I believe someone in WMCore should be able to work on this feature in ~2 weeks from now, thus targeting a production deployment in July (with some luck, still in June). Sorry for this delay; the python3 migration/dependencies are more convoluted than we expected!

amaltaro commented 3 years ago

@jordan-martins @srimanob @justinasr Hi, I just wanted to check whether this feature request/change is still needed on your side? Note that such a request configuration won't be eligible for StepChain conversion (due to constraints of the spec regarding the output module + datatier combination).

Please let me know if you have any concerns or any other updates on this issue. Otherwise, I will start working on it in the next day or so.

haozturk commented 3 years ago

I just wanted to recall our experience w/ taskchains this summer. Here are the two greatest risks:

  1. Having workflows w/ too many tasks will definitely increase our disk usage in the unmerged area. If production is dominated by these kinds of workflows, we basically cannot handle it. It would be a very big crisis.
  2. Taskchains are very slow compared to stepchains, so this will delay the delivery of the outputs, and these requests will also occupy production resources for longer.

In short, if this feature is going to be implemented, it should be used very carefully. It depends on the volume of the injection: such requests w/ a small volume can be handled, but o/w it's a serious issue for production.

jordan-martins commented 3 years ago

Hi @amaltaro, can we put this on hold, please? The main reason is that this previously made sense for the 20UL campaigns, where we could benefit from this scenario. Now, given the concurrent scenario, I do not see a reason why we should split GEN from SIM, for example. In that sense, I would like to wait a little for the next big PAGs MC campaign preparation, to get a clearer picture of the design. I would foresee a standard procedure, e.g. ([GS+DIGI-RECO+Mini] + Nano).

I would say that we can move on to the GPU development, since it is becoming urgent for us.

Thanks a lot, Jordan. FYI @justinasr @haozturk

amaltaro commented 3 years ago

@jordan-martins Hi Jordan, that sounds good to me. I'm lowering this issue's priority to Medium then, and will wait for your feedback before we start working on it.