dmwm / DBS

CMS Dataset Bookkeeping Service
Apache License 2.0

modify DBS api insertBulkBlock to take "dataset_parent_list" parameter for setting the parentage between datasets #566

Closed ticoann closed 6 years ago

ticoann commented 6 years ago

Add "dataset_parent_list" as parameter

related to https://github.com/dmwm/WMCore/issues/8590

  1. If a parent dataset is set, insert the parentage relation in DBS (ignore it if it is already set).
  2. Don't allow setting file parentage if a parent dataset is defined.

yuyiguo commented 6 years ago

Please provide your use cases and detailed requirements.

ticoann commented 6 years ago

Use case: For StepChain workflows, there is no easy way to get the parentage between files. https://github.com/dmwm/WMCore/wiki/StepChain-Parentage

This means we can't get the parentage from the FWJR and instead need to calculate it from the run/lumi relation. To do that, we first need to set the dataset parentage, which is known when the workflow is created. StepChain needs to be able to add the dataset parentage to DBS directly, without knowing the file parentage first.
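To illustrate the run/lumi idea, here is a minimal sketch, not the WMCore or DBS implementation. It assumes each file is described by the set of (run, lumi) pairs it contains, and that a parent-dataset file counts as a parent of a child-dataset file when they share at least one pair. The function name and the input layout are hypothetical.

```python
def file_parentage_from_lumis(child_files, parent_files):
    """Map each child file to the parent files that share a (run, lumi) pair.

    Both arguments are dicts of the assumed form
    {file_name: iterable of (run, lumi) tuples}.
    """
    # Index the parent dataset: (run, lumi) -> names of files containing it.
    lumi_to_parents = {}
    for name, lumis in parent_files.items():
        for run_lumi in lumis:
            lumi_to_parents.setdefault(tuple(run_lumi), set()).add(name)

    # Collect, for every child file, the parent files with an overlapping lumi.
    parentage = {}
    for name, lumis in child_files.items():
        parents = set()
        for run_lumi in lumis:
            parents |= lumi_to_parents.get(tuple(run_lumi), set())
        parentage[name] = sorted(parents)
    return parentage
```

The lookup only makes sense once the child/parent dataset pair is known, which is why the dataset parentage has to be in place first.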

Requirement: An API is needed that takes pairs of child and parent datasets; the relation can be many-to-many. The API can simply take one child dataset and one parent dataset per call. For an already existing relation, the API should either ignore the insert and report success, or indicate the failure by an exception or in some other form.
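A rough sketch of the requested behaviour, using an in-memory set as a stand-in for the DBS parentage table. The class, its methods, and the self-parent check are hypothetical and only illustrate the "ignore duplicates, raise on bad input" semantics; none of this is DBS code.

```python
class DatasetParentage:
    """Hypothetical in-memory many-to-many child -> parent dataset relation."""

    def __init__(self):
        self._pairs = set()

    def insert(self, child, parent):
        """Record one (child, parent) pair.

        Returns True if the pair was newly added and False if it already
        existed, so repeating an insert is harmless.
        """
        if child == parent:
            # Assumed failure case, chosen only to show the exception path.
            raise ValueError("a dataset cannot be its own parent")
        pair = (child, parent)
        if pair in self._pairs:
            return False
        self._pairs.add(pair)
        return True

    def parents_of(self, child):
        """Return the sorted parent datasets recorded for a child dataset."""
        return sorted(p for c, p in self._pairs if c == child)
```

One child/parent pair per call is enough to build a many-to-many relation: the caller just repeats the call for each pair.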

ticoann commented 6 years ago

@yuyiguo, I updated the title as we discussed

yuyiguo commented 6 years ago

As @ticoann and I discussed yesterday, there is no need for a new API for this use case. The DBS API insertBulkBlock can be modified to handle it. The current insertBulkBlock accepts a list of file parentages and fills the DBS file, block and dataset parentages from the bottom up. The updated API will also accept dataset parentages, but not both kinds of parentage in the same call. This way the dataset parentages are filled while the files are uploaded to DBS. We save one DBS call and keep the data consistent at the same time.
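The "one or the other, but not both" rule could be checked along these lines. This is a sketch only: it assumes the block dump is a dict, that file parentages sit under a key called file_parent_list, and that dataset parentages sit under dataset_parent_list. The function name and the returned labels are made up for illustration.

```python
def parentage_mode(block_dump):
    """Return which kind of parentage a block dump carries.

    Raises ValueError when both file and dataset parentages are supplied.
    The key names are assumptions about the payload layout.
    """
    file_parents = block_dump.get("file_parent_list") or []
    dataset_parents = block_dump.get("dataset_parent_list") or []

    if file_parents and dataset_parents:
        raise ValueError(
            "file_parent_list and dataset_parent_list cannot both be set"
        )
    if dataset_parents:
        return "dataset"
    if file_parents:
        return "file"
    return "none"
```

Treating a missing key and an empty list the same way keeps existing callers, which only send file parentages, working unchanged.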

ticoann commented 6 years ago

Hmm, title is not updated. I will update that now.

ticoann commented 6 years ago

The reasons for using insertBulkBlock instead of a separate API are:

  1. It is not good to add parentage before the actual data is generated.
  2. The agent will be responsible for doing this, and insertBulkBlock is the API it is already using.

ticoann commented 6 years ago

@yuyiguo, it seems there is already a ds_parent_list parameter in the API. Can I just use that?

ticoann commented 6 years ago

After talking to Yuyi, we will change the code in WMCore to use dataset_parent_list instead of ds_parent_list