dmwm / CRABServer


disable publication for non-VALID input datasets #7334

Open belforte opened 2 years ago

belforte commented 2 years ago

users can (partially) process datasets which are still in production (!?) via

config.Data.allowNonValidInputDataset = True

see e.g. https://cmsweb.cern.ch:8443/scheddmon/0197/rkansal/220705_155318:rkansal_crab_pfnano_v2_3_2017_HZJ_HToWW_M-125/debug/crabConfig.py which triggered the mail exchange [1] with Alan, Yuyi and Valentin

But it makes no sense to try to publish output in DBS, since parentage info is not available for PRODUCTION datasets and things will end up in an endless error loop inside Publisher
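The check that this issue asks for is simple in essence: only datasets whose DBS access type is VALID can have complete parentage, so publication should only be attempted for those. A minimal sketch of that decision (the function name and the record layout are illustrative, not actual CRABServer code; `dataset_access_type` is the field name used by DBS dataset listings):

```python
def should_publish(dataset_info, user_wants_publication):
    """Decide whether Publisher should attempt DBS publication of task output.

    dataset_info: dict describing the input dataset, containing at least
    'dataset_access_type' (e.g. 'VALID', 'PRODUCTION', 'INVALID', 'DEPRECATED').
    user_wants_publication: the tm_publication flag from the task configuration.
    """
    if not user_wants_publication:
        return False
    # Parentage information is only guaranteed for VALID datasets; publishing
    # output of a PRODUCTION input would loop forever inside Publisher.
    return dataset_info.get('dataset_access_type') == 'VALID'
```

With `config.Data.allowNonValidInputDataset = True` the task can still run; this check only turns off the publication step.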

[1]

Yuyi,
thanks for clarification, I already applied the required change to migration
server and it will not accept requests from clients if dataset is not in a VALID
state.
Valentin.

On  0, Yuyi Guo [<yuyi@fnal.gov>](mailto:yuyi@fnal.gov) wrote:
>    Thanks Alan for the explanation. I don't see a use case for  migrating
>    an incomplete block/dataset.
>
>
>    Valentin, This was the first I heard about this "error". No one should
>    touch a block/dataset in production status except for the data
>    processing group.
>
>
>    Cheers,
>
>    Yuyi
>
>
>    From: Alan Malta Rodrigues [<alan.malta@cern.ch>](mailto:alan.malta@cern.ch)
>    Date: Monday, July 11, 2022 at 5:59 AM
>    To: Valentin Y Kuznetsov [<vkuznet@protonmail.com>](mailto:vkuznet@protonmail.com), Yuyi Guo
>    [<yuyi@fnal.gov>](mailto:yuyi@fnal.gov)
>    Cc: Stefano Belforte [<stefano.belforte@gmail.com>](mailto:stefano.belforte@gmail.com), [klannon@nd.edu](mailto:klannon@nd.edu)
>    [<klannon@nd.edu>](mailto:klannon@nd.edu), Diego Ciangottini [<diego.ciangottini@cern.ch>](mailto:diego.ciangottini@cern.ch), Todor
>    T. Ivanov [<todor.trendafilov.ivanov@cern.ch>](mailto:todor.trendafilov.ivanov@cern.ch)
>    Subject: RE: weird migration use-case (missing block parentage for
>    existing dataset one)
>
>    Hi Valentin,
>    I can explain why there is no parent information for:
>    /HZJ_HToWW_M-125_TuneCP5_13TeV-powheg-jhugen727-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/MINIAODSIM#92f25318-1797-43ea-a01e-02fda4b18908
>    and the reason is that this dataset is under production right now,
>    meaning that there is an active
>    workflow (running-open status) still writing to it.
>    In addition to that, it's a StepChain workflow. Their parentage
>    information is only performed once the
>    workflow moves to "close-out" status (basically getting announced).
>    I guess one can say that migrating a growing dataset between DBS
>    instances isn't really a valid
>    use case, since the migration acts on a snapshot of the dataset...
>    Cheers,
>    Alan.
>    ________________________________________
>    From: Valentin Kuznetsov [[vkuznet@protonmail.com](mailto:vkuznet@protonmail.com)]
>    Sent: Sunday, July 10, 2022 6:22 PM
>    To: Yuyi Guo
>    Cc: Stefano Belforte; Alan Malta Rodrigues; [klannon@nd.edu](mailto:klannon@nd.edu); Diego
>    Ciangottini; Todor Trendafilov Ivanov
>    Subject: weird migration use-case (missing block parentage for existing
>    dataset one)
>    Yuyi,
>    during debugging process of new Go-based migration service [1] we found
>    one
>    weird use-case which I would like to understand.
>    The following block
>    /HZJ_HToWW_M-125_TuneCP5_13TeV-powheg-jhugen727-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/MINIAODSIM#92f25318-1797-43ea-a01e-02fda4b18908
>    has no parents in DBS, but its dataset
>    /HZJ_HToWW_M-125_TuneCP5_13TeV-powheg-jhugen727-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/MINIAODSIM
>    does have a parent dataset
>    /HZJ_HToWW_M-125_TuneCP5_13TeV-powheg-jhugen727-pythia8/RunIISummer20UL17RECO-106X_mc2017_realistic_v6-v1/AODSIM
>    How is this possible? Does this case represent some "failure" or
>    missing data in DBS, or is it a real use-case? According to the dataset
>    details [2] it was created
>    created
>    on 1647603603 UNIX time which translates into Mar 18th of 2022, see
>    ```
>    time.gmtime(1647603603)
>    time.struct_time(tm_year=2022, tm_mon=3, tm_mday=18, tm_hour=11,
>    tm_min=40, tm_sec=3, tm_wday=4, tm_yday=77, tm_isdst=0)
>    ```
>    The new DBS Go writer was put into production by May 17th (see slide 12
>    in [3]),
>    and it means that originally it was inserted into DBS using Python DBS
>    server.
>    Therefore, the logic of insertion comes from DBS Python server.
>    As such, I need to understand this use-case in order to make proper set
>    of
>    actions. Either we need to add block parent, or remove dataset parent
>    or adjust
>    logic of migration server to account for such use-case(s). But for that
>    it would
>    be very useful to understand this specific use-case and how we end-up
>    with it.
>    Thanks,
>    Valentin.
>    [1]
>    [1]https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmwm
>    _dbs2go_issues_53&d=DwIFAg&c=gRgGjJ3BkIsb5y6s49QqsA&r=8bursUuc0V63OwREQ
>    MBG2Q&m=Q66capO7HuiUOhppidUdfPob2CsDOdzAafuTwL7OnmQC2jPboPILFrVoIf2y_Tp
>    q&s=quEk3b3n0ImTb7XikH1vSDO-YvSt3uuVX8RZVpKmZ2Y&e=
>    [2]
>    [2]https://cmsweb.cern.ch/dbs/prod/global/DBSReader/datasets?dataset=/H
>    ZJ_HToWW_M-125_TuneCP5_13TeV-powheg-jhugen727-pythia8/RunIISummer20UL17
>    RECO-106X_mc2017_realistic_v6-v1/AODSIM&detail=true
>    [3]
>    [3]https://indico.cern.ch/event/1157140/contributions/4858857/attachmen
>    ts/2437408/4174867/220504%20-%20O%26C%20Weekly%20News.pdf
belforte commented 2 years ago

note to myself: publication for a task is controlled in the schedd via the classAd CRAB_Publish, which is set in DagmanCreator based on the tm_publication value. The DBS status of the input dataset is checked in DBSDataDiscovery. One easy way could be to override the value of tm_publication in the DB inside DBSDataDiscovery; need to check if we have an API for that. Drawback: the DB info will not match what's in the crab config, which may be puzzling for future debuggers. Less appealing is to check the dataset type again in DagmanCreator, since DBS queries do not belong there.

Maybe it is enough to override the task object content in DBSDataDiscovery without touching the DB? Maybe changing the DB value would be irrelevant anyhow?

TO BE TESTED

belforte commented 1 year ago

time to fix this, since it now happens and annoys us in the production server, see https://mattermost.web.cern.ch/cms-o-and-c/pl/98zp9hw893rtuceb731f8ins5e

rising priority

belforte commented 1 year ago

there is no API to override the value of tm_publication in the DB in https://github.com/dmwm/CRABServer/blob/master/src/python/CRABInterface/RESTTask.py . Let's first try a solution which does not require deploying a new REST server

belforte commented 1 year ago

the trick of overwriting the in-memory task object with

kwargs['task']['tm_publication']='F'

in here https://github.com/dmwm/CRABServer/blob/8c51e4a5de68531591e686ddeb47b5ab0fe33325/src/python/TaskWorker/Actions/DBSDataDiscovery.py#L33-L43 works. But the fact that the DB flag still says "publication on" leads to confusing crab status output, and overall things will look inconsistent. I could add a warning, but a cleaner solution would be better. Let's investigate adding an API to change the flag. It is a bit more work but should be straightforward.
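For the record, the in-memory override plus the warning discussed above could look roughly like this (a sketch only: the helper name is hypothetical, and the surrounding DBSDataDiscovery machinery and logger setup are assumed rather than reproduced):

```python
import logging

def disable_publication_in_memory(kwargs, logger):
    """Turn off publication for this TaskWorker run only.

    Overwrites the in-memory task object; the DB row keeps tm_publication='T',
    so 'crab status' will still show publication as enabled -- hence the
    explicit warning to leave a trace for future debuggers.
    """
    if kwargs['task'].get('tm_publication') == 'T':
        kwargs['task']['tm_publication'] = 'F'
        logger.warning(
            "Input dataset is not VALID: publication disabled for this task. "
            "Note: the task DB still reports publication as enabled."
        )
    return kwargs
```

The asymmetry between the in-memory value and the DB row is exactly the inconsistency the comment above worries about; the warning only mitigates it.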

belforte commented 1 year ago

@mapellidario @novicecpp in the spirit of what was said in the last meeting about "hand over to you issues which you can deal with and do not require extensive knowledge", is this additional API something one of you feels like doing? Adding a new API is a bit tedious, but it should be possible to proceed by slightly modifying an existing one which updates some other column. If there's interest I can walk you through the steps.
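The DB-side alternative ultimately boils down to a single-column update keyed on the task name. A self-contained illustration using sqlite (the column names tm_publication and tm_taskname come from this thread, but the table name, function, and sqlite backend are stand-ins for the real CRABServer schema and its Oracle bindings):

```python
import sqlite3

def set_publication_flag(conn, taskname, flag):
    """Persist the publication flag ('T' or 'F') so that crab status and
    the TaskWorker see a consistent value for the task."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE tasks SET tm_publication = ? WHERE tm_taskname = ?",
            (flag, taskname),
        )

# Demo with an in-memory DB standing in for the real task database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (tm_taskname TEXT PRIMARY KEY, tm_publication TEXT)")
conn.execute("INSERT INTO tasks VALUES ('example_task', 'T')")
set_publication_flag(conn, 'example_task', 'F')
```

Wrapping this update in a REST handler is the tedious part; the query itself is the whole of the new API's logic.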

belforte commented 1 year ago

hmm.... now those tasks which try impossible publications are not "harmful" anymore; they simply fail publication without reporting a reason to the user. We can lower priority

belforte commented 4 months ago

Short term quick solution: