paorozo opened 7 years ago
Sure, I can add a check through PhEDEx. I suspect, though, that many failures happen when something goes wrong during a transfer and PhEDEx doesn't notice, which leaves files missing. MIT is working on this (https://github.com/SmartDataProjects/ConsistencyCheck), and I can try to include results from there.
I also want to include the completion percentage of the output datasets, similar to the assistance-manual summary: https://vocms049.cern.ch/unified/assistance.html#assistance-manual. Maybe @vlimant can guide us.
Next, I think we need a link to the related JIRA tickets. The query should have this format: https://its.cern.ch/jira/issues/?jql=text~ReReco-Run2016B-MuonEG-18Apr2017_ver2-0014%20AND%20project%20=%20CMSCOMPPR, where ReReco-Run2016B-MuonEG-18Apr2017_ver2-0014 is the prepID and CMSCOMPPR is the project.
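A minimal Python sketch of building that JIRA search link from a prepID, assuming the query keeps the format above. The function name `jira_search_url` is hypothetical, not something that exists in the tool:

```python
from urllib.parse import quote

# Base URL of the JIRA issue search, taken from the example link above
JIRA_SEARCH = "https://its.cern.ch/jira/issues/?jql="


def jira_search_url(prep_id: str, project: str = "CMSCOMPPR") -> str:
    """Build a JIRA search link for a workflow's prepID (hypothetical helper)."""
    # JQL: full-text match on the prepID, restricted to one project
    jql = f"text~{prep_id} AND project = {project}"
    # Percent-encode spaces; keep '~' and '=' unescaped, as in the example link
    return JIRA_SEARCH + quote(jql, safe="~=")


print(jira_search_url("ReReco-Run2016B-MuonEG-18Apr2017_ver2-0014"))
```

For that prepID this reproduces the example link exactly.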
Okay. My guess is the first thing would be easier to do once we move machines.
The JIRA link is in #60, which I'll merge soon.
Dan, something weird is happening. If I go to https://vocms0113.cern.ch:80/seeworkflow/?workflow=mcremone_ACDC0_task_TOP-PhaseIITDRSpring17DR-00005__v1_T_170725_104805_4763, the report is empty, but the workflow did get several errors: https://cmsweb.cern.ch/wmstatsserver/data/jobdetail/mcremone_ACDC0_task_TOP-PhaseIITDRSpring17DR-00005__v1_T_170725_104805_4763
Could you please take a look?
So the file-cached error reports were empty. The "clear cache" link doesn't touch this file cache; it only clears the cache in memory.
I think the problem is that I'm looking up the ACDC information before the ACDC has finished running, so the reports are downloaded too early. Is there a way to check whether an ACDC has finished running? If not, I can probably just stop the cache from saving files that contain empty JSON.
Or the "clear cache" link could delete the file cache at the same time. Making the calls again isn't much slower.
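A rough Python sketch of both fixes discussed above, assuming a cache with an in-memory layer backed by one JSON file per workflow. The class, its methods, and the file layout are hypothetical, not the tool's actual code: empty reports are never cached, and clearing drops both layers.

```python
import json
import os
import tempfile


class ReportCache:
    """Hypothetical two-level cache: an in-memory dict backed by JSON files."""

    def __init__(self, cache_dir=None):
        # Fall back to a fresh temporary directory if none is given
        self.cache_dir = cache_dir or tempfile.mkdtemp()
        os.makedirs(self.cache_dir, exist_ok=True)
        self.memory = {}

    def _path(self, key):
        return os.path.join(self.cache_dir, key + ".json")

    def get(self, key, fetch):
        """Return the report for key, calling fetch() when neither layer has it."""
        if key in self.memory:
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path) as handle:
                report = json.load(handle)
        else:
            report = fetch()
            # Skip empty reports: an ACDC that is still running may return
            # nothing yet, and caching that would hide the errors it gets later.
            if not report:
                return report
            with open(path, "w") as handle:
                json.dump(report, handle)
        self.memory[key] = report
        return report

    def clear(self):
        """Clear the memory layer and delete the cached files."""
        self.memory.clear()
        for name in os.listdir(self.cache_dir):
            if name.endswith(".json"):
                os.remove(os.path.join(self.cache_dir, name))
```

Skipping empty results costs one extra call per page view while an ACDC is still running, which seems acceptable given that repeating the calls isn't much slower.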
It seems the details page for a specific workflow/exit code is not working as it should.
It says "No info for this error code", but I can see the details here: https://cmsweb.cern.ch/wmstatsserver/data/jobdetail/pdmvserv_task_B2G-RunIISummer15wmLHEGS-01272__v1_T_170722_002539_3158
@dabercro, could you please take a look?
https://github.com/CMSCompOps/OpsSpace/pull/55 fixes it. The problem was that exit code 71104 was listed under a 'submitfailed' header, while I was only looking under 'jobfailed'.
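A Python sketch of the idea behind that fix, assuming a hypothetical summary layout of failure category -> exit code -> number of failed jobs (the real wmstats structure may differ, and the function names are made up): sum over every category instead of reading only 'jobfailed'.

```python
from collections import Counter


def errors_by_exit_code(summary):
    """Total failures per exit code across all categories (hypothetical layout).

    summary: {category: {exit_code: count}}, where categories are assumed to
    include e.g. 'jobfailed' and 'submitfailed'.
    """
    totals = Counter()
    for codes in summary.values():
        for exit_code, count in codes.items():
            totals[str(exit_code)] += count
    return dict(totals)


def categories_for(summary, exit_code):
    """List the categories in which a given exit code appears."""
    return sorted(
        category
        for category, codes in summary.items()
        if str(exit_code) in map(str, codes)
    )
```

With this approach, a code such as 71104 is found no matter which header it is reported under.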
Hi @dabercro, I am trying to create an ACDC for the task "DataProcessing/DataProcessingMergeAODoutput"; the related workflow is https://vocms0113.cern.ch:80/seeworkflow/?workflow=fabozzi_Run2016C-v2-BTagCSV-07Aug17_8029_170831_200030_7925.
We already took action on that workflow, but one of the ACDCs had problems, so I am selecting the "Only do this task" option in our tool. The tool creates the action we need, but it still checks the parameters of the other tasks, and in this case some of those tasks have no available sites for assignment. So the tool asks me to select an alternative site to run ACDCs that we are not going to create. Could you please take a look?
In fact, the action for the task "DataProcessing" was created. I selected T2_CH_CERN just to test.
{"fabozzi_Run2016C-v2-BTagCSV-07Aug17_8029_170831_200030_7925": {"Action": "acdc", "Reasons": [], "ACDCs": ["mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211227_6015", "mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211212_4094", "mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211219_8621", "mcremone_ACDC0_Run2016C-v2-BTagCSV-07Aug17_8029_170919_211235_4518"], "user": "prozober", "Parameters": {"DataProcessing/DataProcessingMergeAODoutput": {"xrootd": "enabled", "sites": ["T2_CH_CERN", "T2_UK_SGrid_RALPP"], "memory": ""}, "DataProcessing": {"sites": ["T2_CH_CERN"], "memory": ""}}}}
Okay, this problem sounds familiar, but that was for a task that didn't exist in the past, right? I should be able to fix it, but if you want me to remove this task now so you can submit it correctly, I can do that more quickly.
Exactly, the problem was related to the LogCollect tasks; we do not ACDC those. I already removed the action, thanks :)
Found the bug; #72 fixes it.
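A hedged Python sketch of the validation this thread points toward. It is not the code in #72. It assumes the action's "Parameters" mapping shown above (task name -> settings), and it assumes that LogCollect tasks can be recognized from the last component of the task path. The function name is hypothetical.

```python
def tasks_needing_sites(parameters, only_tasks=None):
    """Return the tasks whose ACDC would have no site to run at.

    parameters: the "Parameters" mapping of an action (task name -> settings).
    only_tasks: if given (e.g. with "Only do this task"), ignore all other tasks.
    """
    missing = []
    for task, settings in parameters.items():
        # Unselected tasks will not get an ACDC, so don't validate them
        if only_tasks is not None and task not in only_tasks:
            continue
        # LogCollect tasks are not ACDC'd (naming convention is an assumption)
        if "LogCollect" in task.split("/")[-1]:
            continue
        if not settings.get("sites"):
            missing.append(task)
    return missing
```

Only the returned tasks would need an alternative site, so the tool would prompt for a site only where an ACDC is actually going to be created.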
To bring this issue back on track, I can use the dataset presence to tell the operator to enable xrootd.
Could we please include the presence of the primary and secondary input datasets? Something like this: https://vocms049.cern.ch/unified/report/fabozzi_Run2016C-v2-MuOnia-18Apr2017_8028_170519_182404_8657. It helps with decisions such as where to run an ACDC or a recovery workflow.
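A minimal Python sketch of how presence information could drive the xrootd hint. It assumes a hypothetical mapping of input dataset (primary or secondary) -> site -> fraction present, and it uses an assumed rule: suggest xrootd for a dataset whenever some chosen site lacks a complete copy of it. Names and data are illustrative only.

```python
def datasets_needing_xrootd(presence, sites):
    """Return the input datasets that are not fully present at every chosen site.

    presence: {dataset: {site: fraction present, 0.0 to 1.0}} (hypothetical layout)
    sites: the sites an ACDC or recovery workflow would be assigned to
    """
    needs = []
    for dataset, by_site in presence.items():
        # Assumed rule: jobs at a site without a complete copy must read remotely
        if not all(by_site.get(site, 0.0) >= 1.0 for site in sites):
            needs.append(dataset)
    return needs


# Placeholder dataset names and fractions, for illustration only
presence = {
    "/Primary/Hypothetical-v1/AOD": {"T1_US_FNAL_Disk": 1.0, "T2_CH_CERN": 0.4},
    "/Pileup/Hypothetical-v1/GEN-SIM": {"T1_US_FNAL_Disk": 1.0, "T2_CH_CERN": 1.0},
}
print(datasets_needing_xrootd(presence, ["T1_US_FNAL_Disk", "T2_CH_CERN"]))
```

If the returned list is non-empty, the page could tell the operator to enable xrootd or to choose sites that hold complete copies.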