CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io

Changing all_errors.json #22

Closed paorozo closed 6 years ago

paorozo commented 7 years ago

We need to find a way to make this JSON file useful: https://cmst2.web.cern.ch/cmst2/unified/all_errors.json

The structure is OK when we are dealing with workflows without ACDCs. But in most cases the workflows have several ACDCs or even recoveries, so the structure of this JSON could be:

    "PrepID": {
        "Workflow/task": { //Exit codes }
        "Workflow/task": { //Exit codes }
    }
    "PrepID": { ...

We need to take care of ONLY the newest workflows, that is, the most recent ACDCs or recoveries created.
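
As a rough illustration of the proposed layout, here is a minimal Python sketch that groups flat error records into a PrepID → Workflow/task → exit code nesting. The input record shape `(prep_id, workflow_task, exit_code, count)` and the function name `group_errors` are assumptions made for this example; the actual layout of all_errors.json is not described in this thread.

```python
from collections import defaultdict


def group_errors(records):
    """Nest flat error records as PrepID -> Workflow/task -> exit code -> count.

    `records` is an iterable of (prep_id, workflow_task, exit_code, count)
    tuples. This input shape is hypothetical, chosen only for illustration.
    """
    grouped = defaultdict(lambda: defaultdict(dict))
    for prep_id, workflow_task, exit_code, count in records:
        codes = grouped[prep_id][workflow_task]
        # Sum counts when the same exit code is reported more than once.
        codes[exit_code] = codes.get(exit_code, 0) + count
    # Convert to plain dicts so the result serializes cleanly with json.dumps.
    return {prep: dict(tasks) for prep, tasks in grouped.items()}
```

Exit codes are kept as strings in the usage below because JSON object keys are always strings.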

For example one of our views would be:

image1

dabercro commented 7 years ago

Do you know how Unified creates the file? I couldn't quite figure it out. (Or maybe I should have asked @vlimant directly...)

At one point, I was trying to duplicate the contents of all_errors.json using WMStats' jobdetail API, but I hadn't finished figuring out where to get the list of workflows needed. For that I was trying the reqmgr2 API (for example: https://cmsweb.cern.ch/reqmgr2/data/request?status=running-closed&detail=false). I got stuck though when I couldn't figure out which status or statuses matched the all_errors.json. Do you know what that might be? If not, is there a better way to get the list?

I think the best approach would be to remake all_errors.json with the structure we'd like. If you are not sure how the file is made, though, I can always just parse and restructure it for now.

vlimant commented 7 years ago

That file is created by recoveror => showError.parse_all, for all workflows in assistance manual. @areinsvo I think we do have to find an alternative way to get the right content; it is hard for me to give a good answer on how without having the current bigger picture.

paorozo commented 7 years ago

I think we can:

  1. Get the workflows by prepid using https://cmsweb.cern.ch/reqmgr2/data/request?prep_id=PREPID&detail=true
  2. Organize them by creation time.
  3. Get the failures from https://cmsweb.cern.ch/wmstatsserver/data/jobdetail/WORKFLOW-NAME
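
The three steps above could be sketched in Python roughly as follows. The two URL patterns are the ones quoted in the list. Everything else is an assumption for illustration: the helper names are hypothetical, and the ordering relies on workflow names ending in a `_YYMMDD_HHMMSS_<number>` stamp, as in `pdmvserv_PPD-PhaseISpring17wmLHEGENOnly-00012_00005_v1__170503_201729_3916`. No HTTP calls are made here, since cmsweb requires authentication that is outside the scope of this sketch.

```python
import re
from datetime import datetime
from urllib.parse import quote

REQMGR_URL = "https://cmsweb.cern.ch/reqmgr2/data/request"
JOBDETAIL_URL = "https://cmsweb.cern.ch/wmstatsserver/data/jobdetail"

# Assumption: workflow names end in _YYMMDD_HHMMSS_<number>.
_STAMP = re.compile(r"_(\d{6})_(\d{6})_\d+$")


def request_url(prep_id):
    """Step 1: URL listing the workflows for a PrepID."""
    return f"{REQMGR_URL}?prep_id={quote(prep_id)}&detail=true"


def creation_time(workflow):
    """Step 2 helper: parse the timestamp embedded in a workflow name."""
    match = _STAMP.search(workflow)
    if match is None:
        # Names without a recognizable stamp sort as the oldest.
        return datetime.min
    return datetime.strptime("".join(match.groups()), "%y%m%d%H%M%S")


def newest_first(workflows):
    """Step 2: order workflow names from most to least recent."""
    return sorted(workflows, key=creation_time, reverse=True)


def jobdetail_url(workflow):
    """Step 3: URL returning the failures of one workflow."""
    return f"{JOBDETAIL_URL}/{quote(workflow)}"
```

If the reqmgr2 response carries an explicit creation date for each request, sorting on that field would be more robust than parsing names.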

paorozo commented 7 years ago

  1. And of course, I forgot the unreported errors!

dabercro commented 7 years ago

@prozober #44 will group tasks under workflows and workflows under Prep ID. All but the Prep ID level are hidden by default.

paorozo commented 7 years ago

The new view is awesome! It gives us a great view of the failure clusters. The next step, I think, is adding all the workflows related to a single PrepID. For example, PPD-PhaseISpring17wmLHEGENOnly-00012:

dabercro commented 7 years ago

@prozober Whenever I try this request API (for example, I'm looking at https://cmsweb.cern.ch/reqmgr2/data/request?prep_id=PPD-PhaseISpring17wmLHEGENOnly-00012&detail=true), I get workflows that look like

I never see anything like the prozober_ACDC# format in the all_errors.json, so I don't know how to get errors for those. Should I be worried about those, or only other workflows that have names like the first one in the list there? (For example, here, I eventually also see "pdmvserv_PPD-PhaseISpring17wmLHEGENOnly-00012_00005_v1__170503_201729_3916")

dabercro commented 7 years ago

Another current example is https://cmsweb.cern.ch/reqmgr2/data/request?prep_id=ReReco-Run2016D-DoubleMuon-18Apr2017-0001&detail=true

I only see the first workflow in my history... None of the ACDCs.

paorozo commented 7 years ago

https://vocms049.cern.ch/unified/all_errors.json only contains the errors from the original request. That's why I was proposing to change all_errors.json first. Either we can:

The only thing the tool needs to know is which Prep_ids are in manual-assistance. @vlimant, does that sound reasonable?

vlimant commented 7 years ago

Hi, I think the tool should not rely on the full content of all_errors.json, as it is very costly to construct and might be discontinued. What you need is the list of workflows in "assistance" (which you can get from that JSON for now, and later directly from the unified db once the tool is on vocms049). You then need to perform a series of queries to reqmgr2 to find the related ACDCs. There might be an existing query ( @amaltaro ) to get the chain of the request tree originating from a given request (the way the "cascade" status change works).

amaltaro commented 7 years ago

AFAIK there is no API for retrieving all the ACDCs a parent request has. But you can search for prepid and request_type=Resubmission, which should give you the same results.

vlimant commented 7 years ago

@amaltaro modulo checks on OriginalRequestName or something, to make sure the ACDCs belong to the right workflow
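
A minimal Python sketch of that filter, applied to request data that has already been fetched by PrepID. It assumes the data has been flattened into a dict mapping each workflow name to its fields, and that the fields are called `RequestType` and `OriginalRequestName`; both the shape and the field names are assumptions to check against a real reqmgr2 response, and `related_acdcs` is a hypothetical helper name.

```python
def related_acdcs(requests, original_name):
    """Return the names of Resubmission requests tied to `original_name`.

    `requests` maps workflow name -> dict of request fields. The field
    names used below are assumed, not confirmed against reqmgr2.
    """
    return [
        name
        for name, info in requests.items()
        if info.get("RequestType") == "Resubmission"
        and info.get("OriginalRequestName") == original_name
    ]
```

For example, given a parent named `wf_original`, the call `related_acdcs(requests, "wf_original")` keeps only the resubmissions whose `OriginalRequestName` matches it and drops resubmissions of other workflows that share the same PrepID.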

paorozo commented 7 years ago

dabercro commented 7 years ago

I think #46 is what you want for the globalerror page at least. It does slow down the page a bit at first though while a cache is populated with the jobdetail.