dmwm / AsyncStageout


enforce consistent eventPerLumi (yes/no) in published block #4542

Closed belforte closed 2 years ago

belforte commented 7 years ago

pasting here from hypernews:

On 28/04/17 08:37, Stefano Belforte wrote:
> 
> *** Discussion title: Data Management Development
> 
> xie xie !
> CRAB/ASO only uses insertBulkBlock.
> So DBS is safe.
> But should some non-consistent data pop up, I guess
> we want to ring a bell, not silently reject it.
> So for peace of mind we'll add a check on our side as well.
> Stefano
> 
> On 28/04/17 08:00, Seangchan Ryu wrote:
>>
>> *** Discussion title: Data Management Development
>>
>> Yuyi added this checking in DBS client api.
>> https://github.com/dmwm/DBS/pull/546
>>
>> If insertBulkBlock is used, this validation and conversion come with it.
>>
>> Seangchan
>>
>> On Apr 27, 2017, at 3:33 PM, Yuyi Guo <yuyi@fnal.gov> wrote:
>>
>> *** Discussion title: Data Management Development
>>
>> Just adding an example for what Eric said.
>> Before the block is sent to the DB, we should check whether there is an 'event_count' in the 'file_lumi_list'. If so, none of them may be None.
>>
>> Here is an example of current data format:
>> 'file_lumi_list': [{u'lumi_section_num': 27414, u'run_num': 1},
>> {u'lumi_section_num': 26422, u'run_num': 2},
>> {u'lumi_section_num': 29838, u'run_num': 3}]
>>
>> The new data with event/lumi will be something like:
>> 'file_lumi_list': [{u'lumi_section_num': 27414, u'run_num': 1, 'event_count': 10},
>> {u'lumi_section_num': 26422, u'run_num': 2, 'event_count': 11},
>> {u'lumi_section_num': 29838, u'run_num': 3, 'event_count': 12}]
>>
>> Of course, the current format is always correct.
>>
>> However, if we found below data:
>> 'file_lumi_list': [{u'lumi_section_num': 27414, u'run_num': 1, 'event_count': 10},
>> {u'lumi_section_num': 26422, u'run_num': 2, 'event_count': None},
>> {u'lumi_section_num': 29838, u'run_num': 3, 'event_count': 12}]
>>
>> Then we have to drop all the event_count entries, so the input data would be:
>> 'file_lumi_list': [{u'lumi_section_num': 27414, u'run_num': 1},
>> {u'lumi_section_num': 26422, u'run_num': 2},
>> {u'lumi_section_num': 29838, u'run_num': 3}]
>>
>> What I want to point out here is that if we don't have event_count, we will not supply DBS with event_count=None. We will just drop the event_count, as in the current format.
>>
>> Thanks,
>> Yuyi
>>
>> -----Original Message-----
>> From: Stefano Belforte [mailto:stefano.belforte@cern.ch]
>> Sent: Thursday, April 27, 2017 2:57 PM
>> To: hn-cms-crabDevelopment@cern.ch
>> Subject: Re: test against DBS on cmsweb-testbed
>>
>> *** Discussion title: CRAB Development
>>
>> OK. Sorry for being dense here, can you say more explicitly what should be
>> checked ?
>> Stefano
>>
>> On 27/04/17 18:09, Eric W Vaandering wrote:
>>
>> *** Discussion title: CRAB Development
>>
>> Right.
>>
>> The point is this "can never happen" which, from decades of
>> experience means "it might happen". We should check against it
>> somewhere even if it never happens and since it's not practical to check in DBS,
>> we'd like the data providers to agree in the contract to do this check.
>>
>> WMAgent is in a similar boat. We cannot find any reason this could ever
>> happen for a single block, but we are checking anyhow.
>>
>> Cheers,
>>
>> Eric
>>
>> (Switched HN forum to CRAB)
>>
>> On Apr 27, 2017, at 11:04 AM, Yuyi Guo <yuyi@fnal.gov> wrote:
>>
>> Hi, Stefano:
>>
>> There is a very, very small chance of getting mixed data in CRAB, but we cannot
>> guarantee that CMSSW will always give consistent data. In order to
>> protect our data, we ask the data producers to validate before uploading to
>> DBS.
>> Thanks,
>> Yuyi
>>
>> From: Stefano Belforte <stefano.belforte@cern.ch>
>> Date: Wednesday, April 26, 2017 at 2:58 PM
>> To: hn-cms-dmDevelopment <hn-cms-dmDevelopment@cern.ch>
>> Subject: Re: test against DBS on cmsweb-testbed
>>
>> *** Discussion title: Data Management Development
>>
>> On 26/04/17 19:28, Yuyi Guo wrote:
>> I understood that mixing data with event/lumi in a block is almost
>> impossible for crab or WMAgent, however, I would ask you guys to validate
>> the data to make sure there is no mixing data just before publish to DBS.
>>
>> this can not possibly happen due to the way CRAB works.
>> A task is always processed with the same version of the code, and
>> every task only writes new blocks.
>> The only way to mix things would be to try hard to do it on purpose.
>>
>> So there is nothing to validate !
>>
>> Stefano
>>
>> -------------------------------------------------------------
>> Visit this CMS message (to reply or unsubscribe) at:
>> https://hypernews.cern.ch/HyperNews/CMS/get/dmDevelopment/2014/1.html
>>
> 
belforte commented 7 years ago

And this is Eric's explanation, which was missing from the thread above:


What should be checked (in English) is that 

for an entire block 
        if any file is missing events per lumi, 
                the events per lumi should be dropped for ALL files in that block.
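
A minimal sketch of the rule above, not the actual ASO or DBS client code. It assumes the block's files are available as a list of dicts, each carrying a 'file_lumi_list' in the format shown in the thread. The function name `enforce_consistent_event_count` and the use of a logger to "ring a bell" are hypothetical choices made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def enforce_consistent_event_count(files):
    """Make 'event_count' all-or-nothing across one block.

    `files` is assumed to be a list of dicts, each with a 'file_lumi_list'
    list of lumi dicts (hypothetical layout, modelled on the thread).
    The lumi dicts are modified in place.
    Returns True if the block was mixed (some lumis had a count, some not).
    """
    lumis = [lumi for f in files for lumi in f.get('file_lumi_list', [])]
    with_count = sum(1 for lumi in lumis if lumi.get('event_count') is not None)

    if with_count == len(lumis):
        # Every lumi has a real count (or there are no lumis): nothing to do.
        return False

    mixed = with_count > 0
    if mixed:
        # Ring a bell instead of silently fixing the data.
        logger.warning(
            "Inconsistent event_count in block: %d of %d lumis have it; "
            "dropping event_count for all files", with_count, len(lumis))

    # Drop the key everywhere, so that event_count=None is never sent.
    for lumi in lumis:
        lumi.pop('event_count', None)
    return mixed
```

The check is done over the whole block rather than per file, since one file with a missing count is enough to drop the counts for every file in that block.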