dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Subscribe any new block we inject from WMAgent to DataOps? #6005

Closed hufnagel closed 8 years ago

hufnagel commented 9 years ago

Follow-up to #5945. Talked to Nicolo and this could work technically.

Basically, when the WMAgent creates a new block in PhEDEx, i.e. when file injections create a new block, immediately make a block-level subscription under DataOps for that block to the site we inject it at.

This would help with space accounting for DataOps. Not sure if it would break anything else.
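A minimal sketch of the proposed flow, under stated assumptions: `BlockSubscription`, `new_block_subscriptions`, the block names, and the site name `T2_XX_Example` are all hypothetical placeholders, not WMCore or PhEDEx APIs. It only models the idea of comparing the blocks known before and after an injection and emitting one block-level subscription per newly created block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSubscription:
    """Hypothetical record of a block-level subscription request."""
    block: str
    site: str
    group: str
    level: str = "block"


def new_block_subscriptions(blocks_before, blocks_after, site, group="DataOps"):
    """Return one subscription per block that the injection created."""
    # Blocks present after the injection but not before are the new ones.
    created = sorted(set(blocks_after) - set(blocks_before))
    return [BlockSubscription(block=b, site=site, group=group) for b in created]


# Placeholder block names; only the "#bbb" block is new after the injection.
before = {"/Prim/Proc/TIER#aaa"}
after = {"/Prim/Proc/TIER#aaa", "/Prim/Proc/TIER#bbb"}
subs = new_block_subscriptions(before, after, site="T2_XX_Example")
print(subs)
```

Whether the agent can cheaply know which blocks an injection created is exactly the open question in this thread, so the before/after comparison here is an assumption rather than a description of the agent.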

amaltaro commented 9 years ago

Please no, it would create an insane amount of mails/subscriptions. It looks much easier, for me at least, to allow a PhEDEx group to be set at data injection, since when we inject a block against a specific site, we are also saying this data is available there...

hufnagel commented 9 years ago

You can make auto-approving subscriptions without mails. All Tier0 subscriptions to CERN work like that. Absolutely no extra work for Ops, they just magically appear and are approved.

hufnagel commented 9 years ago

Data groups at injection time are an absolute no-go because injection is a file-level operation, while ownership is recorded at block level. You need separate calls for them.

amaltaro commented 9 years ago

Hm, anyway, I wonder if there would be any impact from handling big datasets with 2k different subscriptions; that looks ... not clean to me. But don't we have 2 separate calls already? We need to inject both a block and its files into TMDB.

hufnagel commented 9 years ago

You inject FILES (and this might or might not create a new block). You subscribe BLOCKS. Very different things.
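The distinction can be illustrated with a toy model. This is a sketch under assumptions: `ToyCatalog`, `inject_file`, and `subscribe_block` are hypothetical names for an in-memory stand-in, not the PhEDEx data service. The point it shows is that the file-level call has no place to carry a group, while ownership is recorded per block in a separate call.

```python
class ToyCatalog:
    """Toy in-memory stand-in for a transfer database (hypothetical)."""

    def __init__(self):
        self.files_by_block = {}  # block name -> list of file names
        self.owner_by_block = {}  # block name -> owning group

    def inject_file(self, block, lfn):
        """File-level call: add a file, creating the block if needed.

        Returns True when this call created a new block. There is
        deliberately no group argument here.
        """
        created = block not in self.files_by_block
        self.files_by_block.setdefault(block, []).append(lfn)
        return created

    def subscribe_block(self, block, group):
        """Block-level call: record the owning group for an existing block."""
        if block not in self.files_by_block:
            raise KeyError(f"unknown block: {block}")
        self.owner_by_block[block] = group


catalog = ToyCatalog()
# Two file injections into the same (placeholder) block.
first = catalog.inject_file("/Prim/Proc/TIER#aaa", "file1.root")
second = catalog.inject_file("/Prim/Proc/TIER#aaa", "file2.root")
# Only the injection that created the block triggers a subscription.
if first:
    catalog.subscribe_block("/Prim/Proc/TIER#aaa", "DataOps")
```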

Nicolo said the extra subscriptions should be manageable. They would be temporary anyway because most of the data DataOps produces is eventually moved somewhere else for aggregation purposes, right? That would remove the individual block-level subscriptions from the system.

amaltaro commented 9 years ago

How do the block names get defined then? When the agent injects a file, does it get back the name of the block the file was injected into? I'm thinking about how we then end up with the same block name in both DBS and PhEDEx...

Lots of things have changed in operations and I'm not sure anymore whether there is a final move subscription to aggregate data. @vlimant for sure knows that.

hufnagel commented 9 years ago

Not sure myself about your first question. The agent somehow gets the block name; not sure whether that's a return value from the file injection, whether it creates the name upfront, or whether it specifically checks for it. These three options come to mind. @ticoann, any comments on this?

vlimant commented 9 years ago

There is no "final move", for various practical reasons irrelevant to the question of whether DataOps should own a block that DataOps injects.

hufnagel commented 9 years ago

Without a final move this needs to be rethought. We cannot just leave blocks owned by DataOps scattered across the sites. At the very least all these subscriptions would need to be changed to AnalysisOps. DataOps has limited space; we need to clean up after ourselves.

vlimant commented 9 years ago

#5945: "There are no more moves; only transfers and deletes"

hufnagel commented 9 years ago

My statement stands: DataOps doesn't have enough space available to skip cleaning up the data it produces (where cleanup can mean deleting it or subscribing it to a different group). The DataOps space is designed to hold processing input samples and to temporarily store processing output, not to store processing output permanently.
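The two cleanup options named here can be sketched as follows. This is illustrative only: `clean_up`, the `keep` set, and the block names are hypothetical, and the group names are simply the ones used in this thread. Blocks owned by the producing group are either handed to another group or dropped; blocks owned by anyone else are left alone.

```python
def clean_up(owner_by_block, keep, from_group="DataOps", to_group="AnalysisOps"):
    """Return (new ownership map, sorted list of blocks to delete).

    Blocks owned by from_group are reassigned to to_group when listed
    in keep, and marked for deletion otherwise.
    """
    remaining = {}
    deleted = []
    for block, group in owner_by_block.items():
        if group != from_group:
            remaining[block] = group      # not ours, leave untouched
        elif block in keep:
            remaining[block] = to_group   # cleanup by changing the group
        else:
            deleted.append(block)         # cleanup by deletion
    return remaining, sorted(deleted)


# Placeholder ownership map for three blocks.
owners = {
    "/Prim/Proc/TIER#aaa": "DataOps",
    "/Prim/Proc/TIER#bbb": "DataOps",
    "/Prim/Proc/TIER#ccc": "AnalysisOps",
}
remaining, deleted = clean_up(owners, keep={"/Prim/Proc/TIER#aaa"})
```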

vlimant commented 9 years ago

agreed, hence the deletes.

ticoann commented 8 years ago

duplicate issue #5945

vlimant commented 7 years ago

May we please re-open this and discuss it further?