Closed ticoann closed 6 years ago
Please provide your use cases and detailed requirements.
Use case: For StepChain workflows there is no easy way to get the parentage between files. https://github.com/dmwm/WMCore/wiki/StepChain-Parentage
This means we can't get the parentage from the FWJR and instead need to calculate it from the run/lumi relation. To do that, we first need to set the dataset parentage, which is known at the beginning of workflow creation. StepChain needs to be able to add the dataset parentage to DBS directly, without knowing the file parentage first.
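To illustrate what calculating file parentage from the run/lumi relation could look like, here is a minimal sketch. It assumes each file is described by a set of (run, lumi) pairs, and that a child file's parents are the files of the parent dataset that share at least one pair with it. The function name and the input layout are hypothetical and are not taken from WMCore or DBS.

```python
from collections import defaultdict


def file_parentage_from_lumis(child_files, parent_files):
    """Map each child LFN to the sorted parent LFNs sharing a (run, lumi).

    Both arguments are dicts of {lfn: set of (run, lumi) tuples}.
    This layout is an assumption made for illustration only.
    """
    # Index parent files by every (run, lumi) pair they contain.
    lumi_to_parents = defaultdict(set)
    for lfn, lumis in parent_files.items():
        for run_lumi in lumis:
            lumi_to_parents[run_lumi].add(lfn)

    # A child file's parents are all parent files overlapping its lumis.
    parentage = {}
    for lfn, lumis in child_files.items():
        parents = set()
        for run_lumi in lumis:
            parents |= lumi_to_parents.get(run_lumi, set())
        parentage[lfn] = sorted(parents)
    return parentage


# Hypothetical file names and lumi sections.
parents = {
    "/store/parent_1.root": {(1, 1), (1, 2)},
    "/store/parent_2.root": {(1, 3)},
}
children = {
    "/store/child_1.root": {(1, 2), (1, 3)},
    "/store/child_2.root": {(1, 1)},
}
result = file_parentage_from_lumis(children, parents)
```

The lookup only makes sense once the parent dataset is known, which is why the dataset parentage has to be in DBS before the file parentage can be resolved.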
Requirement: An API is needed that takes pairs of child and parent datasets; the relation can be many-to-many. The API can simply take one child dataset and one parent dataset as parameters. If the relation already exists, the API should either ignore the insert and report success, or indicate the insertion failure by an exception or some other form.
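The two acceptable behaviours for an already existing relation can be sketched as follows. This is an in-memory stand-in, not DBS code: the class, method names, the `strict` flag and the dataset names are all hypothetical.

```python
class DatasetParentageStore:
    """In-memory stand-in for a dataset parentage table (hypothetical)."""

    def __init__(self):
        self._pairs = set()

    def insert(self, child_dataset, parent_dataset, strict=False):
        """Insert one child/parent pair.

        Returns True if the pair was new. For an existing pair it either
        returns False (insert ignored, still a success) or, with
        strict=True, raises ValueError to signal the failed insert.
        """
        pair = (child_dataset, parent_dataset)
        if pair in self._pairs:
            if strict:
                raise ValueError("parentage already exists: %s -> %s" % pair)
            return False
        self._pairs.add(pair)
        return True

    def parents_of(self, child_dataset):
        return sorted(p for c, p in self._pairs if c == child_dataset)


store = DatasetParentageStore()
# Many-to-many is covered by calling insert once per pair.
store.insert("/Prim/Era-Step2-v1/AODSIM", "/Prim/Era-Step1-v1/GEN-SIM")
store.insert("/Prim/Era-Step2-v1/AODSIM", "/Prim/Era-Step1-v1/GEN-SIM-RAW")
repeated = store.insert("/Prim/Era-Step2-v1/AODSIM", "/Prim/Era-Step1-v1/GEN-SIM")
```

Either variant lets a caller retry safely; the only difference is whether the duplicate is reported silently or loudly.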
@yuyiguo, I updated the title as we discussed
As @ticoann and I discussed yesterday, there is no need for a new API for this use case. The DBS API insertBulkBlock can be modified to handle it. The current insertBulkBlock accepts a list of file parentages and fills the DBS file, block and dataset parentages from the bottom up. The updated API will accept either file parentages or dataset parentages, but not both. This way the dataset parentage is filled while the files are uploaded to DBS, which saves one DBS call and keeps data integrity.
Hmm, title is not updated. I will update that now.
The reason for using insertBulkBlock instead of a separate API is
@yuyiguo, it seems there is already a ds_parent_list parameter in the API. Can I just use that?
After talking to Yuyi, we will change the code in WMCore to use dataset_parent_list instead of ds_parent_list.
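On the client side, the either/or rule for insertBulkBlock could be enforced roughly as below before the block is uploaded. This is only a sketch: the helper name is hypothetical, the block dump is reduced to a placeholder dict, and the assumption that `dataset_parent_list` holds plain parent dataset names is mine, not something confirmed in this thread.

```python
def add_parentage(block_dump, file_parent_list=None, dataset_parent_list=None):
    """Return a copy of block_dump with exactly one kind of parentage set.

    Hypothetical helper: file parentage and dataset parentage are
    mutually exclusive, so passing both is rejected up front.
    """
    if file_parent_list and dataset_parent_list:
        raise ValueError(
            "provide file_parent_list or dataset_parent_list, not both")
    payload = dict(block_dump)  # shallow copy, the input is left untouched
    payload["file_parent_list"] = list(file_parent_list or [])
    payload["dataset_parent_list"] = list(dataset_parent_list or [])
    return payload


# Placeholder block dump; a real one carries many more keys.
block_dump = {"block": {"block_name": "/Prim/Era-Step2-v1/AODSIM#abc"}}

# StepChain case: only the parent dataset is known at this point.
payload = add_parentage(
    block_dump,
    dataset_parent_list=["/Prim/Era-Step1-v1/GEN-SIM"],  # assumed format
)
```

The resulting `payload` is what would then be handed to insertBulkBlock in a single call.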
Add "dataset_parent_list" as parameter
related to https://github.com/dmwm/WMCore/issues/8590