CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io

Testing Tool and Adjusting Parameters #6

Closed dabercro closed 7 years ago

dabercro commented 8 years ago

@vlimant @prozober @areinsvo @mcremone

Since I don't really know everything the workflow team or Unified needs, feel free to make any comments or pull requests. We can also track the testing progress here.

From Jean-Roch:

although we have a unified wired to testbed

https://cmst2.web.cern.ch/cmst2/unified-testbed/

it might be simpler to have it wired to production and run this in "commissioning mode".

I have the feeling, looking at the example actions, that there are too many parameters passed down. Many of them should not be needed for recovery & clone (proc version, sites, lfn, ...) since all of these can be taken from the original workflow. Let's tune this to what is actually needed.

We should modify the recoveror module to read the action json, and be able to operate it by hand. The way I see it for a fast integration is:

In order to view these actions from inside the CERN network, one can look at https://vocms0113.cern.ch:80/getaction. This shows actions submitted today. You can also pass a parameter "days" to look farther back. For example looking at https://vocms0113.cern.ch:80/getaction?days=20 will show some old testing actions.
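As a rough sketch of the "days" parameter described above, the URL can be built as follows. The helper name `getaction_url` is hypothetical and not part of WorkflowWebTools; only the host, the `/getaction` path, and the `days` parameter come from the comment.

```python
from urllib.parse import urlencode

# Endpoint quoted in the comment above (reachable only inside the CERN network)
BASE_URL = "https://vocms0113.cern.ch:80/getaction"


def getaction_url(days=None):
    """Build the /getaction URL; `days` looks further back when given."""
    if days is None:
        # No parameter: the server shows the actions submitted today
        return BASE_URL
    return BASE_URL + "?" + urlencode({"days": int(days)})


print(getaction_url(20))
```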

Changing Parameters

To make it easier for everyone to track and comment on parameters, they are generated with these variables here: https://github.com/CMSCompOps/WorkflowWebTools/blob/d167a94ff822d7a80d3350eeadc4efe014621f75/runserver/static/js/addreason.js#L145-L184 The variable params results in a "decrease" "same" "increase" table, texts and bools are just text and "true/false" fields, and the opts variable results in more general radio buttons.

The site list is generated here: https://github.com/CMSCompOps/WorkflowWebTools/blob/d167a94ff822d7a80d3350eeadc4efe014621f75/runserver/templates/workflowtables.html#L19-L27 and the form field is then made here: https://github.com/CMSCompOps/WorkflowWebTools/blob/d167a94ff822d7a80d3350eeadc4efe014621f75/runserver/static/js/addreason.js#L231-L236

areinsvo commented 7 years ago

The new ACDCs for the workflow pdmvserv_task_EXO-RunIISummer15GS-03460 have been created and hopefully assigned. @prozober, please let me know if something isn't right:

Assigned workflow: areinsvo_ACDC0_task_EXO-RunIISummer15GS-03460__v1_T_170307_151102_1973 to site: [u'T2_DE_RWTH'] and team production
Assigned workflow: areinsvo_ACDC0_task_EXO-RunIISummer15GS-03460__v1_T_170307_151112_2569 to site: [u'T0_CH_CERN', u'T2_US_Florida'] and team production

paorozo commented 7 years ago

We have a couple of problems:

If a site is draining, it shouldn't be "enabled" in the list we are using during the assignment. This would prevent ACDCs from getting stuck in acquired. What do you think @dabercro?

areinsvo commented 7 years ago

I found and fixed the problem with the task names. I will abort the two ACDCs mentioned above.

I think the normal recoveror checks which sites are ready and doesn't assign to sites in drain. I will add this to my code. However, if the operator only says to go to sites that are in drain, what should the script do? Probably refuse to create ACDCs and quit with an error message?

paorozo commented 7 years ago

I cannot find a case where we need to assign the workflow to a site in drain. I think disabling the draining sites in our assignment interface would be enough.

dabercro commented 7 years ago

Okay, that shouldn't take long. I'm thinking the sites in drain should be marked, so you know which they are (I'd make them red or something, it's more informative than them not being there at all) with a warning if no enabled sites are selected. How does that sound?
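A minimal sketch of the "warning if no enabled sites are selected" idea, assuming the interface knows which sites are in drain. The function name, the argument names, and the warning text are hypothetical, and the drain status used in the example is made up for illustration.

```python
def check_site_selection(selected_sites, draining_sites):
    """Split the selection into enabled and draining sites and build a warning."""
    draining = set(draining_sites)
    enabled = [site for site in selected_sites if site not in draining]
    in_drain = [site for site in selected_sites if site in draining]
    warning = None
    if not enabled:
        # Nothing selected that can currently run jobs
        warning = "No enabled sites selected; ACDCs may get stuck in acquired."
    return enabled, in_drain, warning


# Pretend T2_DE_RWTH is in drain (illustrative only)
enabled, in_drain, warning = check_site_selection(
    ["T2_DE_RWTH"], draining_sites=["T2_DE_RWTH"]
)
print(in_drain, warning)
```

Sites in drain are reported rather than removed, which matches the suggestion to mark them instead of hiding them.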

areinsvo commented 7 years ago

Doing it from the interface sounds good. As long as the site doesn't go into drain in the time between submitting the action and then running the script to actually create the ACDCs, that should work.

dabercro commented 7 years ago

Ah, that's a good point. Doing it from the interface should cut down on the occurrence, but I agree that you should probably handle it in your script too.

paorozo commented 7 years ago

Making the sites red in the interface is a good idea. How can we handle the script "exceptions" for the assignment? I mean, how can the script give some feedback to the interface?

areinsvo commented 7 years ago

@prozober as the operator, what would you like the script to do in that case? Assign it anyway, and you can catch it in the Unified critical page and deal with it, or fail with an error message, or option 3 I can't think of?

vlimant commented 7 years ago

For the sites in drain, the ACDC might require it the hard way (the data is only at that site), and therefore the site should be in the whitelist and used in assignment if the action is set. So assignment should go on (the use case would be that the operator knows that the site is going to come back soon, and it's OK to assign it already, leaving the agent to start submitting when the site comes back online).

http://dabercro.web.cern.ch/dabercro/unified/showlog/?search=critical&module=GQ&limit=100 is picking up the ACDCs that cannot run in these situations, BTW. So let's just not build more complication into this.

  1. globalerror displays the errors the same way as in the unified errorreport
  2. operator/AI makes a judgement call on what to do
  3. the action is enacted regardless of what it is

leaving the operator/AI to judge whether or not to act this way

dabercro commented 7 years ago

27 includes red sites in drain. There's no warning yet, but I think I'll make that a lower priority for now. I will update the server within the hour probably after some testing. (I should probably spend some time writing more tests to make that useful.)

areinsvo commented 7 years ago

@prozober, let me know when you've resubmitted the action the way it should be, and we can test the script again. I've changed it back so it doesn't worry about sites in drain and trusts the operator to handle it, per Jean-Roch's suggestion.

paorozo commented 7 years ago

Allie, I've already sent the action. Please, go ahead.

areinsvo commented 7 years ago

Third time's the charm: areinsvo_ACDC0_task_EXO-RunIISummer15GS-03460__v1_T_170308_163052_6405
areinsvo_ACDC0_task_EXO-RunIISummer15GS-03460__v1_T_170308_163108_3300

paorozo commented 7 years ago

For task EXO-RunIISummer15GS-03460_0/EXO-RunIISummer15GS-03460_0MergeRAWSIMoutput/EXO-RunIISummer16DR80Premix-07371_0, I enabled the xrootd option, so everything is OK. For EXO-RunIISummer15GS-03460_0 I didn't set any value (we need to take the value from its original request), but as you can see here: https://cmsweb.cern.ch/reqmgr2/fetch?rid=areinsvo_ACDC0_task_EXO-RunIISummer15GS-03460__v1_T_170308_163108_3300 we have "TrustSitelists": true

The ACDCs are running, so we'd better let them finish.

@dabercro, is there a way we can "uncheck" the radio button for xrootd, secondary and splitting options?

dabercro commented 7 years ago

@prozober https://github.com/dabercro/WorkflowWebTools/commit/35d40c9a484674ede8afcb8840326899e2ab4728 allows you to double click a button for xrootd, etc to make it false again. Since it's not a backend change, I was able to push it to the server already without a restart.

areinsvo commented 7 years ago

What does the json look like if it is false? Right now it is 'xrootd': 'enabled' when set to true. Safe to assume it is set to 'xrootd':'disabled' when false?

dabercro commented 7 years ago

Yeah, that's exactly what it'll be. Keep in mind that it might also not be set. This thread is getting long, but I put a comment a couple days ago of how I would read the dictionary. Here it is again:

use_xrootd = response[prepID]['Parameters'].get('xrootd')
if use_xrootd is None:
    # If not set, get the default value of the xrootd.
    # some_function_call is a placeholder (parameters to get recovery docs?)
    use_xrootd = some_function_call(params)

# More pythonic would be (though this always evaluates the fallback call):
# use_xrootd = response[prepID]['Parameters'].get('xrootd', some_function_call(params))

if use_xrootd == 'enabled':
    pass  # Using enabled option
elif use_xrootd == 'disabled':
    pass  # Using disabled option
else:
    # Error handling for weird value
    raise ValueError('Unexpected xrootd value: %r' % use_xrootd)

areinsvo commented 7 years ago

Oops, sorry about that. I did see it (and use it), but I clearly didn't remember all of the details. Thanks Dan!

paorozo commented 7 years ago

There are 61 wfs in manual-assistance https://vocms049.cern.ch/unified/assistance.html#assistance-manual but https://vocms049.cern.ch/unified/all_errors.json is empty. @vlimant, @areinsvo, could you please take a look?

vlimant commented 7 years ago

Yes, I disabled it by mistake. It should come back in the next cycles. In the end, we need to find a way to decouple this from Unified for building its content. The list of workflows should be enough. Maybe we have to plan how to do this.

dabercro commented 7 years ago

I already have something that gets errors in the same format from /wmstatsserver/data/jobdetail/ using the workflow name alone. I'll make a test branch that does this with the all_errors.json keys and compare the results with using the full file.

paorozo commented 7 years ago

Taking a quick look, almost all the workflows in assistance have reading issues. I think this is the moment to test the decision making using the clustering algorithm. Once the global error is populated, I will send a couple of ACDCs. @areinsvo I will let you know when I send the actions.

dabercro commented 7 years ago

@prozober To manually force the global errors to update, navigate to https://vocms0113.cern.ch:80/resetcache This is linked on the welcome page. I should make it accessible from the global errors page too.

vlimant commented 7 years ago

@dabercro we might consider moving these to vocms049 and integrating them into Unified, so that it has direct access to the db. "Setting the action" and "enacting" should stay separate anyway, IMO, so getting into 049 will not be an issue.

dabercro commented 7 years ago

That would probably be a good idea. When working on vocms0113 though, I had trouble getting mod_wsgi compiled for Python 2.7. Would you want to set that up on vocms049, or should we just use the built-in CherryPy server and only open the used port to CERN addresses?

paorozo commented 7 years ago

@areinsvo I've submitted an action for the workflow pdmvserv_task_HIG-PhaseIFall16wmLHEGS-00056__v1_T_170316_161618_7407, could you please take a look?

{u'HIG-PhaseIFall16wmLHEGS-00056_0/HIG-PhaseIFall16wmLHEGS-00056_0MergeLHEoutput': {'xrootd': u'enabled', 'sites': [u'T2_UK_London_Brunel', u'T2_UK_London_IC', u'T2_UK_SGrid_Bristol', u'T2_UK_SGrid_RALPP'], 'memory': u''}, u'HIG-PhaseIFall16wmLHEGS-00056_0/HIG-PhaseIFall16wmLHEGS-00056_0MergeRAWSIMoutput': {'xrootd': u'enabled', 'sites': [u'T2_UK_London_Brunel', u'T2_UK_London_IC', u'T2_UK_SGrid_Bristol', u'T2_UK_SGrid_RALPP'], 'memory': u''}, u'HIG-PhaseIFall16wmLHEGS-00056_0/HIG-PhaseIFall16wmLHEGS-00056_0MergeRAWSIMoutput/HIG-PhaseIFall16DR-00109_0': {'sites': u'T1_US_FNAL', 'memory': u''}, 'AllSteps': {'memory': u''}}
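Note that in the action above, `sites` is a list for some tasks and a bare string (`u'T1_US_FNAL'`) for another. A script reading the action may want to accept both shapes. Below is a minimal sketch; `normalize_sites` is a hypothetical helper, not something taken from the existing scripts.

```python
def normalize_sites(value):
    """Return the `sites` entry as a list, whatever shape it arrived in."""
    if value is None or value == "":
        # Not set: no explicit site list
        return []
    if isinstance(value, str):
        # A single site can arrive as a bare string
        return [value]
    return list(value)


print(normalize_sites("T1_US_FNAL"))
print(normalize_sites(["T2_UK_London_IC", "T2_UK_SGrid_Bristol"]))
```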

areinsvo commented 7 years ago

ACDCs created for three tasks:

areinsvo_ACDC0_task_HIG-PhaseIFall16wmLHEGS-00056__v1_T_170328_164213_3110
areinsvo_ACDC0_task_HIG-PhaseIFall16wmLHEGS-00056__v1_T_170328_164220_3110
areinsvo_ACDC0_task_HIG-PhaseIFall16wmLHEGS-00056__v1_T_170328_164228_1417

@prozober Let me know if anything needs to be changed.

paorozo commented 7 years ago

@areinsvo, I created by mistake the ACDCs for this workflow today, I am sorry. I aborted your ACDCs. Could you please take a look at the action I just sent for pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717. Thanks!

paorozo commented 7 years ago

BTW, Ali's ACDCs were correctly created and assigned. Let's see how task_TRK-PhaseIFall16GS-00017 runs.

areinsvo commented 7 years ago

I tried to run the script on the new action that you created, but it fails my check against creating partial ACDCs. According to the ACDC documents, there should be 9 tasks to recover (see below), but the action that was submitted only includes 6 tasks (numbers 3 - 8). Am I supposed to be ignoring tasks that include "CleanupUnmerged"?

  1. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0CleanupUnmergedRAWSIMoutput
  2. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_0/TRK-PhaseIFall16DR-00034_0CleanupUnmergedRAWSIMoutput
  3. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0
  4. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_0/TRK-PhaseIFall16DR-00034_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_1/TRK-PhaseIFall16DR-00034_1MergeAODSIMoutput
  5. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput
  6. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_0/TRK-PhaseIFall16DR-00034_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_1
  7. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_0
  8. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_0/TRK-PhaseIFall16DR-00034_0MergeRAWSIMoutput
  9. /pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717/TRK-PhaseIFall16GS-00017_0/TRK-PhaseIFall16GS-00017_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_0/TRK-PhaseIFall16DR-00034_0MergeRAWSIMoutput/TRK-PhaseIFall16DR-00034_1/TRK-PhaseIFall16DR-00034_1CleanupUnmergedAODSIMoutput
vlimant commented 7 years ago

Yes, cleanup should indeed be ignored. Isn't recoveror doing this by default?

areinsvo commented 7 years ago

Not explicitly, although I'm using WMErr rather than getSummary to get the list of tasks. Maybe cleanup jobs were already excluded from getSummary so it didn't matter before. It's an easy thing to add to my script, however.
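A sketch of how the cleanup exclusion and the partial-ACDC check could fit together, assuming cleanup tasks can be recognized by "CleanupUnmerged" in the last component of the task path, as in the list above. The function names are hypothetical, and the task names in the example are shortened and made up.

```python
def recoverable_tasks(full_task_names):
    """Drop cleanup tasks, identified by 'CleanupUnmerged' in the last path part."""
    return [
        name for name in full_task_names
        if "CleanupUnmerged" not in name.rsplit("/", 1)[-1]
    ]


def is_partial_action(full_task_names, action_tasks):
    """True when the task count in the action differs from what needs recovery."""
    return len(recoverable_tasks(full_task_names)) != len(action_tasks)


# Shortened, made-up task names for illustration
docs = [
    "/wf/step_0",
    "/wf/step_0/step_0CleanupUnmergedRAWSIMoutput",
    "/wf/step_0/step_0MergeRAWSIMoutput",
]
print(is_partial_action(docs, ["step_0", "step_0/step_0MergeRAWSIMoutput"]))
```

With the nine tasks listed above, three are cleanup tasks, so six remain, which matches the six tasks in the submitted action.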

ACDCs created for pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717:

areinsvo_ACDC0_task_TRK-PhaseIFall16GS-00017__v1_T_170328_174440_2926
areinsvo_ACDC0_task_TRK-PhaseIFall16GS-00017__v1_T_170328_174448_4410
areinsvo_ACDC0_task_TRK-PhaseIFall16GS-00017__v1_T_170328_174457_1952
areinsvo_ACDC0_task_TRK-PhaseIFall16GS-00017__v1_T_170328_174504_9699
areinsvo_ACDC0_task_TRK-PhaseIFall16GS-00017__v1_T_170328_174513_6263
areinsvo_ACDC0_task_TRK-PhaseIFall16GS-00017__v1_T_170328_174521_1144

paorozo commented 7 years ago

@dabercro, we have a problem with the sites checked by default for the assignment. e.g. https://vocms0113.cern.ch:80/seeworkflow/?workflow=pdmvserv_EGM-PhaseIFall16DR-00014_00022_v0__170324_201657_9223

Could you please take a look?

dabercro commented 7 years ago

The reason for that is not all the sites show up in the recovery docs. For example, if I compare with: https://cmsweb.cern.ch/couchdb/acdcserver/_design/ACDC/_view/byCollectionName?key=%22pdmvserv_EGM-PhaseIFall16DR-00014_00022_v0__170324_201657_9223%22&include_docs=true&reduce=false I see no T1_DE_KIT, which is listed as a site that needs recovering in the table, but not checked by default. The same thing for T1_UK_RAL (an enabled site)...

What is the preferred behavior? I thought we wanted to automate using the recovery docs. Maybe my recovery doc query is wrong?

paorozo commented 7 years ago

Please, forget my comment, I got completely confused. The sites by default are OK, this is the behavior we want. Sorry!

paorozo commented 7 years ago

If you do not mind, could you please delete the rows of exit codes with zero occurrences? Thanks!

dabercro commented 7 years ago

Okay, that should be easy.
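For reference, the filtering itself could look something like this, assuming the table is held as a mapping from exit code to per-site counts. That layout, the function name, and the numbers are all made up for illustration; the real table structure in WorkflowWebTools may differ.

```python
def drop_empty_rows(table):
    """Keep only exit codes that occurred at least once at some site."""
    return {
        code: counts for code, counts in table.items()
        if any(count > 0 for count in counts.values())
    }


# Made-up counts: exit code -> {site: occurrences}
table = {
    "134": {"T1_US_FNAL": 2, "T2_US_UCSD": 0},
    "99999": {"T1_US_FNAL": 0, "T2_US_UCSD": 0},
}
print(drop_empty_rows(table))
```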

Just a heads up, I'm working on the Auto/Manual/Ban site selection today. The backend is a little tricky, but I think I almost have it. I hope to update the server tomorrow evening.

paorozo commented 7 years ago

Thanks Dan and Ali. task_TRK-PhaseIFall16GS-00017 is running fine, just a couple of failures but they are not related to the ACDC creation and assignment. I would like to do another test; in this case, we will change the splitting parameter.

Workflow: pdmvserv_task_EXO-PhaseIFall16GS-00011__v1_T_170309_130412_5503

Action: {u'EXO-PhaseIFall16GS-00011_0/EXO-PhaseIFall16GS-00011_0MergeRAWSIMoutput/EXO-PhaseIFall16DR-00037_0/EXO-PhaseIFall16DR-00037_0MergeRAWSIMoutput/EXO-PhaseIFall16DR-00037_1': {'sites': u'T1_ES_PIC', 'memory': u''}, u'EXO-PhaseIFall16GS-00011_0/EXO-PhaseIFall16GS-00011_0MergeRAWSIMoutput/EXO-PhaseIFall16DR-00037_0': {'memory': u'', 'sites': u'T1_ES_PIC', 'splitting': u'2x'}, u'EXO-PhaseIFall16GS-00011_0/EXO-PhaseIFall16GS-00011_0MergeRAWSIMoutput/EXO-PhaseIFall16DR-00037_0/EXO-PhaseIFall16DR-00037_0MergeRAWSIMoutput/EXO-PhaseIFall16DR-00037_1/EXO-PhaseIFall16DR-00037_1MergeAODSIMoutput/EXO-PhaseIFall16MiniAOD-00036_0/EXO-PhaseIFall16MiniAOD-00036_0MergeMINIAODSIMoutput': {'sites': u'T1_ES_PIC', 'memory': u''}, 'AllSteps': {'memory': u''}}

@areinsvo , do we know how to modify the splitting to be 2x, 3x and max?

areinsvo commented 7 years ago

@prozober The script can handle 2x and 3x splitting, but can you clarify what is meant by max splitting?

I tried to run the test on the workflow you suggested, but it failed with the output "I should not be doing splitting for this type of request" because the RequestType is TaskChain and 'InputDataset' is not found in Task1. This bit of code was copied over from the recoveror.py Unified module. Is this check no longer appropriate? Or should it not have failed in this case?
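To make the check in question concrete, here is a sketch of the condition as described in this comment. It is not the actual recoveror.py code; the function name is hypothetical, and allowing all other request types is an assumption of this sketch, since the thread does not say how they are treated.

```python
def splitting_allowed(request):
    """Mirror the described check: refuse splitting for a TaskChain
    whose first task has no 'InputDataset'."""
    if request.get("RequestType") == "TaskChain":
        return "InputDataset" in request.get("Task1", {})
    # Other request types: allowed here by assumption
    return True


# A TaskChain starting from generation has no input dataset in Task1
request = {"RequestType": "TaskChain", "Task1": {"TaskName": "GS"}}
if not splitting_allowed(request):
    print("I should not be doing splitting for this type of request")
```

Written this way, the check applies to the whole request rather than to the individual task whose splitting is being changed, which is the part being questioned here.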

Only one ACDC was created (out of the 3 tasks): areinsvo_ACDC0_task_EXO-PhaseIFall16GS-00011__v1_T_170329_210649_7578 I assume this needs to be aborted until we get the script working to produce all three ACDCs at once?

areinsvo commented 7 years ago

The lone ACDC was aborted. areinsvo_ACDC0_task_EXO-PhaseIFall16GS-00011__v1_T_170329_210649_7578

paorozo commented 7 years ago

Well, bad news @areinsvo, the ACDCs for pdmvserv_task_TRK-PhaseIFall16GS-00017__v1_T_170310_150656_7717 are not okay, and I just realized it. The tasks you mentioned here https://github.com/CMSCompOps/WorkflowWebTools/issues/6#issuecomment-289809010 were the ones we needed to ACDC. But then I went through the six ACDCs we created, and they are related to only 3 tasks. So every task was ACDC'd twice.

These are the tasks related to each ACDC:

As you can see, the following tasks are missing:

I need to invalidate the duplicated files. Then I need to create the missing ACDCs through the scripts. It's better that I work on this in my late afternoon, so we can synchronize our actions.

areinsvo commented 7 years ago

Yes, I see the issue in my code. The problem came up when I tried to go from the task name provided in the action json to the full task name needed by req mgr. I will work on fixing this today. Early afternoon tomorrow, @prozober, if you want to resubmit the action for that workflow, I can run the script and we can work together to make sure it was fixed properly.
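One way to do this mapping is sketched below, assuming the keys in the action json are the full task path without the leading "/<workflow>/" (which is consistent with the examples in this thread). The function names are hypothetical. Matching on the exact full name, rather than on a suffix or substring, avoids ambiguity between nested task paths that share components; whether that was the actual cause of the duplicates is not stated here.

```python
def full_task_name(workflow, action_task):
    """Build the full task path from an action json key."""
    return "/{}/{}".format(workflow, action_task)


def resolve_tasks(workflow, action_tasks, known_full_names):
    """Map each action key to exactly one known full task name."""
    known = set(known_full_names)
    resolved = {}
    for task in action_tasks:
        if task == "AllSteps":
            # Assumed to hold shared parameters, not a task
            continue
        full = full_task_name(workflow, task)
        if full not in known:
            raise KeyError("No recovery task matches " + task)
        resolved[task] = full
    return resolved


# Shortened, made-up names for illustration
print(resolve_tasks("wf", ["a", "a/b", "AllSteps"], ["/wf/a", "/wf/a/b"]))
```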

I will add the option for 'max' splitting to the script.

@prozober, regarding the issues with splitting, you are right that only EXO-PhaseIFall16GS-00011_0/EXO-PhaseIFall16GS-00011_0MergeRAWSIMoutput/EXO-PhaseIFall16DR-00037_0 needed the splitting changed, and the ACDC for /**/EXO-PhaseIFall16MiniAOD-00036_0MergeMINIAODSIMoutput shouldn't have any problem. The script would have done that, but as soon as one of the ACDCs has problems, it quits and doesn't try to create the rest of the ACDCs.

@vlimant Any comment on whether splitting should be allowed in the case RequestType is TaskChain and 'InputDataset' is not found in Task1? This check was copied over from recoveror.py, but I'm not sure it is valid here.

areinsvo commented 7 years ago

Max splitting was added and the task names are now treated correctly. Ready for another test when you are @prozober

paorozo commented 7 years ago

We ran out of small and low priority workflows to test. I think we need to submit a couple of backfill workflows. @vlimant, in your opinion, what would be good candidates?

paorozo commented 7 years ago

Hi @areinsvo, we have three low priority workflows to test. I just sent the action for all of them.

{"pdmvserv_task_SMP-RunIISummer16DR80Premix-00203__v1_T_170412_214135_9645": {"Action": "recover", "Reasons": ["Just a test to see what it looks like."], "user": "prozober", "Parameters": {"SMP-RunIISummer16DR80Premix-00203_0/SMP-RunIISummer16DR80Premix-00203_1/SMP-RunIISummer16DR80Premix-00203_1MergeAODSIMoutput": {"sites": ["T2_UK_London_Brunel"], "memory": ""}, "SMP-RunIISummer16DR80Premix-00203_0/SMP-RunIISummer16DR80Premix-00203_1/SMP-RunIISummer16DR80Premix-00203_1MergeAODSIMoutput/SMP-RunIISummer16MiniAODv2-00205_0/SMP-RunIISummer16MiniAODv2-00205_0MergeMINIAODSIMoutput": {"sites": ["T2_UK_London_Brunel"], "memory": ""}}}, "pdmvserv_task_SMP-RunIISummer15wmLHEGS-00115__v1_T_170407_163921_622": {"Action": "recover", "Reasons": ["Just a test to see what it looks like."], "user": "prozober", "Parameters": {"SMP-RunIISummer15wmLHEGS-00115_0/SMP-RunIISummer15wmLHEGS-00115_0MergeRAWSIMoutput/SMP-RunIISummer16DR80Premix-00201_0": {"sites": ["T1_US_FNAL", "T2_US_UCSD"], "memory": ""}, "SMP-RunIISummer15wmLHEGS-00115_0/SMP-RunIISummer15wmLHEGS-00115_0MergeRAWSIMoutput/SMP-RunIISummer16DR80Premix-00201_0/SMP-RunIISummer16DR80Premix-00201_1": {"sites": "T1_US_FNAL", "memory": ""}, "SMP-RunIISummer15wmLHEGS-00115_0/SMP-RunIISummer15wmLHEGS-00115_0MergeRAWSIMoutput/SMP-RunIISummer16DR80Premix-00201_0/SMP-RunIISummer16DR80Premix-00201_1/SMP-RunIISummer16DR80Premix-00201_1MergeAODSIMoutput/SMP-RunIISummer16MiniAODv2-00203_0": {"sites": ["T1_UK_RAL", "T1_US_FNAL"], "memory": ""}}}, "pdmvserv_task_EXO-RunIISummer15GS-09915__v1_T_170410_123400_282": {"Action": "recover", "Reasons": ["Just a test to see what it looks like."], "user": "prozober", "Parameters": {"EXO-RunIISummer15GS-09915_0/EXO-RunIISummer15GS-09915_0MergeRAWSIMoutput/EXO-RunIISummer16DR80Premix-08938_0/EXO-RunIISummer16DR80Premix-08938_1/EXO-RunIISummer16DR80Premix-08938_1MergeAODSIMoutput/EXO-RunIISummer16MiniAODv2-08873_0": {"xrootd": "enabled", "sites": ["T2_UK_London_Brunel", "T2_UK_London_IC", 
"T2_UK_SGrid_Bristol"], "memory": ""}}}}

areinsvo commented 7 years ago

Hi @prozober ,

I'm confused. For the last two workflows, everything looks fine, but for [1], there are no errors listed in the WMErr document I use to make sure we aren't doing partial ACDCs. The script fails because the number of tasks with errors in WMErr doesn't match the number of tasks in the action json. Any idea why that might be happening? [1] pdmvserv_task_SMP-RunIISummer16DR80Premix-00203__v1_T_170412_214135_9645

The ACDCs for pdmvserv_task_EXO-RunIISummer15GS-09915__v1_T_170410_123400_282 and pdmvserv_task_SMP-RunIISummer15wmLHEGS-00115__v1_T_170407_163921_622 have been submitted.

paorozo commented 7 years ago

Maybe because the two involved tasks have unreported errors?

https://vocms049.cern.ch/unified/report/pdmvserv_task_SMP-RunIISummer16DR80Premix-00203__v1_T_170412_214135_9645 https://cmsweb.cern.ch/couchdb/acdcserver/_design/ACDC/_view/byCollectionName?key=%22pdmvserv_task_SMP-RunIISummer16DR80Premix-00203__v1_T_170412_214135_9645%22&include_docs=true&reduce=false

paorozo commented 7 years ago

I checked the two workflows left, and the ACDCs look nice. I will keep an eye on them.