Closed ticoann closed 6 years ago
Please provide your use cases and detailed requirements.
Use case: For StepChain workflows, there is no easy way to get the parentage between files. https://github.com/dmwm/WMCore/wiki/StepChain-Parentage
Related to #568 and #569. When the parentage of the child files is discovered, we need to be able to insert that parentage into DBS.
Requirement: Since the missing parentage will span a whole dataset, this will require a bulk insert for performance reasons. We can provide the list of bind variables shown above for the bulk insert. However, the corresponding block parentage needs to be inserted as well. It can be figured out automatically in the DBS API, but that might cause a performance hit. Otherwise, we can provide the block parentage separately, but validating it might be tricky.
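To make the "figure out block parentage automatically" option concrete, here is a minimal sketch of deriving block-level parentage from the file-level pairs. It assumes a file-id-to-block lookup is already available in memory; the helper name `derive_block_parentage` and the `file_to_block` mapping are hypothetical and not part of DBS.

```python
def derive_block_parentage(file_pairs, file_to_block):
    """Return the set of (child_block, parent_block) pairs implied by file pairs.

    file_pairs: iterable of (child_file_id, parent_file_id) tuples.
    file_to_block: hypothetical mapping from file id to block name.
    """
    block_pairs = set()
    for child_id, parent_id in file_pairs:
        # Each file-level link implies one block-level link; the set removes repeats.
        block_pairs.add((file_to_block[child_id], file_to_block[parent_id]))
    return block_pairs


# Illustrative data only: two child files in one block, parents in two blocks.
file_to_block = {1: "childA", 2: "childA", 10: "parentX", 11: "parentY"}
file_pairs = [(1, 10), (2, 10), (2, 11)]
block_pairs = derive_block_parentage(file_pairs, file_to_block)
```

The lookup itself is where the performance cost mentioned above would likely come from on the server side, since it needs the block of every child and parent file.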
Hi,
What's the plan / timeline for this?
Brian
@vlimant - you may want to follow this one.
:-) @yuyiguo, you can delete the above comments then.
The proposed API: insertFileParentages([(cid1, pid1), (cid2, pid2), ... (cidn, pidn)]). This API will take the output from the API described in https://github.com/dmwm/DBS/issues/569.
The API will use the file parentage to check the existing dataset parentage and will report an error when they do not match. It will also update the block parentages.
We will deal with these parentages block by block, so we expect that WMAgent will send one block of data to DBS each time it calls these APIs.
I think we should send the child block information with the call as well, if there is a restriction on the method that all the child files come from the same block.
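As an illustration of the block-by-block expectation, a caller could split its file parentage pairs by child block before sending them, one call per block. This is only a sketch: `group_pairs_by_child_block` and the `child_file_to_block` mapping are hypothetical names, not WMAgent or DBS code.

```python
from collections import defaultdict


def group_pairs_by_child_block(file_pairs, child_file_to_block):
    """Group (child_id, parent_id) pairs by the block that owns the child file."""
    grouped = defaultdict(list)
    for child_id, parent_id in file_pairs:
        grouped[child_file_to_block[child_id]].append((child_id, parent_id))
    return dict(grouped)


# Illustrative data only: three child files spread over two child blocks.
child_file_to_block = {1: "blockA", 2: "blockA", 3: "blockB"}
pairs = [(1, 10), (2, 10), (3, 11), (3, 12)]
per_block = group_pairs_by_child_block(pairs, child_file_to_block)
# Each entry of per_block would then go to DBS in its own call.
```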
Then the API will be insertFileParentages([(cid1, pid1), (cid2, pid2), ... (cidn, pidn)], childBlockName).
insertFileParentages([(cid1, pid1), (cid2, pid2), ... (cidn, pidn)], childBlockName)
The unit tests should include both client-side and server-side tests.
@yuyi what would you like to name the first parameter? The second one is childBlockName.
It seems all the other DBS parameters are named not in CamelCase but in the standard Python style. Maybe we should name these parameters the same way. What do you think?
you are right, @ticoann
block_name: block name - string
child_parent_id_list: [(cid1, pid1), (cid2, pid2), ... (cidn, pidn)] - list of (child id, parent id) tuples
i.e., insertFileParentages([(cid1, pid1), (cid1, pid2), (cid2, pid2), ... (cidn, pidn)], childBlockName)
Prerequisite: all the cids are from childBlockName, and all files from that block are included, with no missing files. Returns nothing.
Unit test: compares the inserted data with the data retrieved back.
Raises an exception if only partial parentage is inserted.
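The prerequisite and the partial-parentage exception could be checked along these lines. This is a sketch under assumptions, not the DBS implementation: the function name `validate_child_parent_id_list`, the `block_file_ids` argument (the ids of all files in the child block, however the server obtains them), and the use of `ValueError` are all hypothetical.

```python
def validate_child_parent_id_list(child_parent_id_list, block_file_ids):
    """Raise ValueError unless the pairs cover exactly the files of the child block.

    child_parent_id_list: list of (child_id, parent_id) tuples.
    block_file_ids: ids of every file in the child block (assumed to be known).
    """
    child_ids = {cid for cid, _ in child_parent_id_list}
    block_file_ids = set(block_file_ids)

    # Every child id must belong to the given block.
    foreign = child_ids - block_file_ids
    if foreign:
        raise ValueError("child ids not in block: %s" % sorted(foreign))

    # Every file in the block must appear as a child, else the parentage is partial.
    missing = block_file_ids - child_ids
    if missing:
        raise ValueError("partial parentage, files missing: %s" % sorted(missing))


# A child may have several parents, as in (cid1, pid1), (cid1, pid2).
validate_child_parent_id_list([(1, 10), (1, 11), (2, 11)], [1, 2])
```

A unit test in the spirit described above would insert a complete list, read the parentage back, compare the two, and then confirm that a list missing one of the block's files is rejected.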