cms-PdmV / cmsPdmV

CERN CMS McM repository
4 stars 10 forks source link

only add reqmgr after the existing ones #1145

Closed vlimant closed 1 month ago

vlimant commented 1 month ago

as part of #1144 (https://github.com/cms-PdmV/cmsPdmV/issues/1144#issuecomment-2388611886) Request.get_stats https://github.com/cms-PdmV/cmsPdmV/blob/018739d475bc33509f0b64be877445f4fb103d14/mcm/json_layer/request.py#L1867 is not checking (apparently) that only workflows after the one existing in McM record should be added ; to avoid adding workflows from previous submissions of the request.

vlimant commented 1 month ago

meaning one does need a first loop on all_reqmgr_name_list and check the earliest time of status new from request in mcm_reqmgr_name_list. in a second loop of all_reqmgr_name_list, reject any entry that has a time of new earlier than the one above.

lmoureaux commented 1 month ago

We want to keep track of all workflows (there is a UI element for that), so even old ones should be added.

The list of requests is sorted by workflow name after collecting it:

https://github.com/cms-PdmV/cmsPdmV/blob/018739d475bc33509f0b64be877445f4fb103d14/mcm/json_layer/request.py#L1970

Because of the workflow name format, this corresponds to sorting by date and time: pdmvserv_task_GEN-Run3Summer23BPixwmLHEGS-00456__v1_T_240221_003456_7128 (I'm not saying that this is robust; the proper way would be to record the wf name when submitting it; but this is McM).

Do we understand what broke this logic? The request given in the other issue has somehow been fixed, so I can no longer use it to understand the issue.

ggonzr commented 1 month ago

I patched manually that request by including the rejected-archived status in the Stats2 document (details here). To avoid introducing inconsistencies between Stats2 and ReqMgr2, I was planning just to include an extra check to skip these workflows on the McM side. My understanding is it is not possible to move these workflows from normal-archived to rejected-archived in ReqMgr2 which would be the best approach in my opinion. Details for the patch are available here.

vlimant commented 1 month ago

sorting by name/date is one thing. it does not prevent an old workflow from a past submission to be added back in the list. And this has to be avoided, always and ever.

So, sorting by name/date, pick the "date of status new" of the current first reqmgr_name, then reject anything that is earlier than that one. That's the fix for the long run.

For the ones that have had old reqmgr workflow added, we need a) the fix above in place b) patch the mcm data to remove old by hand c) get these inspected again to close.

Your patch @ggonzr might function, but the real solution is to respect workflows chronology as describe above.

lmoureaux commented 1 month ago

it does not prevent an old workflow from a past submission to be added back in the list. And this has to be avoided, always and ever.

Why?

vlimant commented 1 month ago

because they are not relevant anymore to the request.

vlimant commented 1 month ago

past submission are past submission, and could in legitimacy in normal-archived or whichever other status. once we inject a request, and register the reqmgr workflow name that was just created by the submission, nothing prior to this request matters. anything after might.

lmoureaux commented 1 month ago

I think it may be important to see that a request has already been submitted 5 times before trying again. Or to check whether the errors we get after a "fix" and resubmission are the same as the ones in the previous workflow.

vlimant commented 1 month ago

you have the wrong picture here. when we submit something in mcm, whichever happened before is irrelevant. at this point you'll have to trust my experience as submitter and computing operation and put this in.

ggonzr commented 1 month ago

Regarding a patch for taking only the most recent workflow on the McM side. Is every McM request/chained request mapped to a single ReqMgr2 request of type StepChain or it could be translated to several?

vlimant commented 1 month ago

I did not say tale the most recent one. the mapping is not one to one (since there could be many things happening in ops). I described

So, sorting by name/date, pick the "date of status new" of the current first reqmgr_name, then reject anything that is earlier than that one. That's the fix for the long run.

meaning that the time reference is the time of the workflow injected from mcm. anything before is irrelevant, everything after is