CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io

Adjusting parameters for seeworkflows #55

Open paorozo opened 7 years ago

paorozo commented 7 years ago

Could we please include the presence of the primary and secondary input datasets? Something like this: https://vocms049.cern.ch/unified/report/fabozzi_Run2016C-v2-MuOnia-18Apr2017_8028_170519_182404_8657. It is useful for making decisions, for example, where we should run an ACDC or recovery workflow.

dabercro commented 7 years ago

Sure, I can add a check through PhEDEx. My guess, though, is that a lot of failures occur when something went wrong during a transfer and PhEDEx didn't notice, which leads to missing files. MIT is working on this in https://github.com/SmartDataProjects/ConsistencyCheck and I can try to include results from there.

paorozo commented 7 years ago

I also want to include the percentage of progress for output datasets, similar to the assistance-manual summary at https://vocms049.cern.ch/unified/assistance.html#assistance-manual and maybe @vlimant can guide us.

Next, I think it is necessary to include a link to the related JIRA tickets. The query should have this format: https://its.cern.ch/jira/issues/?jql=text~ReReco-Run2016B-MuonEG-18Apr2017_ver2-0014%20AND%20project%20=%20CMSCOMPPR, where ReReco-Run2016B-MuonEG-18Apr2017_ver2-0014 is the prepID and CMSCOMPPR is the project.
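For illustration, a minimal sketch of how such a link could be built from a prepID. The function name `jira_search_url` and its defaults are hypothetical, not taken from WorkflowWebTools; only the URL format comes from the example above.

```python
from urllib.parse import quote

# Base of the JIRA search page, taken from the example URL in this thread.
JIRA_SEARCH = "https://its.cern.ch/jira/issues/"


def jira_search_url(prep_id, project="CMSCOMPPR"):
    """Return a JIRA search URL for tickets mentioning prep_id in project.

    Hypothetical helper: the name and signature are assumptions.
    """
    jql = "text~{0} AND project = {1}".format(prep_id, project)
    # quote() percent-encodes the spaces as %20; '~' and '=' are kept
    # literal so the result matches the format requested above.
    return "{0}?jql={1}".format(JIRA_SEARCH, quote(jql, safe="~="))


if __name__ == "__main__":
    print(jira_search_url("ReReco-Run2016B-MuonEG-18Apr2017_ver2-0014"))
```

Note that this does no escaping of JQL special characters inside the prepID itself; that would need checking against real prepIDs.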

dabercro commented 7 years ago

Okay. My guess is the first thing would be easier to do once we move machines.

The link to JIRA is in #60 which I'll merge soon.

paorozo commented 7 years ago

Dan, something weird is happening. The report at https://vocms0113.cern.ch:80/seeworkflow/?workflow=mcremone_ACDC0_task_TOP-PhaseIITDRSpring17DR-00005__v1_T_170725_104805_4763 is empty, but the workflow still got several errors: https://cmsweb.cern.ch/wmstatsserver/data/jobdetail/mcremone_ACDC0_task_TOP-PhaseIITDRSpring17DR-00005__v1_T_170725_104805_4763

Could you please take a look?

dabercro commented 7 years ago

So the file-cached errors were empty. This cache is not updated by the "clear cache" link. That only changes the cache in memory.

I think the problem is that I'm looking for ACDC information before it finishes running. The reports are then downloaded too early. Is there a way to check if an ACDC is finished running? If not, I can probably just prevent the cache from saving files that are empty JSONs.
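The "don't save empty JSONs" idea could look roughly like the sketch below. The function name `save_to_file_cache`, the one-file-per-key layout, and the `.json` suffix are all assumptions made for illustration; the actual cache code in the tool may be organized differently.

```python
import json
import os


def save_to_file_cache(cache_dir, key, data):
    """Write data to <cache_dir>/<key>.json unless it is empty.

    Returns True if a file was written, False if the data was skipped.
    Hypothetical helper: the real cache layout is an assumption here.
    """
    # An empty dict or list usually means the report was fetched too
    # early, so skipping it lets the next request try the server again.
    if not data:
        return False

    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, key + ".json"), "w") as handle:
        json.dump(data, handle)
    return True
```

The trade-off is that a workflow that genuinely has no errors is never cached and gets re-queried on every request.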

dabercro commented 7 years ago

Or I can just delete the file cache at the same time. It's not that much slower to make the calls again.

paorozo commented 7 years ago

It seems the details for a specific workflow/exit code are not working as they should.

e.g. https://vocms0113.cern.ch:80/explainerror?errorcode=71104&workflowstep=/pdmvserv_task_B2G-RunIISummer15wmLHEGS-01272__v1_T_170722_002539_3158/B2G-RunIISummer15wmLHEGS-01272_0/B2G-RunIISummer16DR80Premix-02154_0

It says "No info for this error code", but I can see the details here: https://cmsweb.cern.ch/wmstatsserver/data/jobdetail/pdmvserv_task_B2G-RunIISummer15wmLHEGS-01272__v1_T_170722_002539_3158

@dabercro, could you please take a look?

dabercro commented 7 years ago

https://github.com/CMSCompOps/OpsSpace/pull/55 fixes it. The problem was that 71104 was listed under a 'submitfailed' header, while I was only looking under 'jobfailed'.
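As a rough sketch of that kind of fix (not the code in the linked pull request): check every failure header instead of only one. The input shape, a dict mapping header names to dicts keyed by exit code, is an assumption for illustration, as are the names `FAILURE_HEADERS` and `find_error_info`.

```python
# Headers under which failed jobs may be reported; the two names come
# from the comment above, everything else here is hypothetical.
FAILURE_HEADERS = ("jobfailed", "submitfailed")


def find_error_info(task_detail, error_code):
    """Collect details for error_code from every failure header present.

    task_detail is assumed to look like
    {"jobfailed": {"<code>": ...}, "submitfailed": {"<code>": ...}}.
    Returns a dict mapping each header that has the code to its details.
    """
    found = {}
    for header in FAILURE_HEADERS:
        info = task_detail.get(header, {}).get(str(error_code))
        if info is not None:
            found[header] = info
    return found
```

With this shape, a code that appears only under 'submitfailed' is still found, and an empty result means no header reported it.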

paorozo commented 7 years ago

Hi @dabercro, I am trying to create an ACDC for the task "DataProcessing/DataProcessingMergeAODoutput". The related workflow is https://vocms0113.cern.ch:80/seeworkflow/?workflow=fabozzi_Run2016C-v2-BTagCSV-07Aug17_8029_170831_200030_7925 .

We already took action on that workflow, but one of the ACDCs had problems, so I am selecting the "Only do this task" option in our tool. The tool creates the action we need, but it also checks the parameters for the other tasks, and in this case some of those tasks have no sites available for assignment. So the tool is asking me to select an alternative site for ACDCs that we are not going to create. Could you please take a look?

paorozo commented 7 years ago

In fact, the action for the task "DataProcessing" was created. I selected T2_CH_CERN just to test.

{"fabozzi_Run2016C-v2-BTagCSV-07Aug17_8029_170831_200030_7925": {"Action": "acdc", "Reasons": [], "ACDCs": ["mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211227_6015", "mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211212_4094", "mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211219_8621", "mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211235_4518"], "user": "prozober", "Parameters": {"DataProcessing/DataProcessingMergeAODoutput": {"xrootd": "enabled", "sites": ["T2_CH_CERN", "T2_UK_SGrid_RALPP"], "memory": ""}, "DataProcessing": {"sites": ["T2_CH_CERN"], "memory": ""}}}}

dabercro commented 7 years ago

Okay, this problem sounds familiar, but that was for a task that didn't exist in the past, right? I should be able to fix it, but if you want me to remove this task now so you can submit it correctly, I can do that more quickly.

paorozo commented 7 years ago

Exactly, the problem was related to the LogCollect tasks; we do not ACDC those tasks. I already removed the action, thanks :)

dabercro commented 7 years ago

Found the bug. #72 fixes it.

dabercro commented 7 years ago

To bring this issue back on track, I can use the dataset presence to tell the operator to enable xrootd.
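One way that hint could be computed, as a sketch only: it assumes the presence information is available as a dict mapping site names to the percentage of the input dataset hosted there. That shape, the function name `should_enable_xrootd`, and the 100% threshold are assumptions, not something the tool provides today.

```python
def should_enable_xrootd(presence, sites, threshold=100.0):
    """Return True when no selected site hosts the full input dataset.

    presence: assumed dict of site name -> percent of the dataset on disk.
    sites: sites the operator selected for the ACDC.
    """
    # If at least one chosen site has the whole dataset locally, jobs can
    # read it there; otherwise suggest remote reads through xrootd.
    return not any(presence.get(site, 0.0) >= threshold for site in sites)
```

For example, with a dataset fully present only at T2_CH_CERN, selecting only T2_UK_SGrid_RALPP would trigger the suggestion, while including T2_CH_CERN would not.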