dmwm / dbs2go

DBS server written in Go
MIT License

Failure in insertFileParents DBS API #36

Closed vkuznet closed 2 years ago

vkuznet commented 2 years ago

Alan reported the issue with the following:

childBlockName = '/VBF_HHTo2G2Tau_CV_1_C2V_1_C3_0_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL16MiniAODAPVv2-HIG_RunIISummer20UL16MiniAODAPVv2_02457_0_SC_6Steps_PU_May2022_Val_Alanv1-v11/MINIAODSIM#25f5a41e-00e8-4148-9f62-36e2125cef7f'
listChildParent = [[661281877, 661282037], [661281877, 661282077], [661281877, 661282117], [661281877, 661281997], [661281917, 661282117], [661281917, 661282157], [661281917, 661282197], [661281917, 661282237], [661281957, 661282277], [661281957, 661282317], [661281957, 661282397], [661281957, 661282357]]

dbs = DbsApi(url = 'https://cmsweb-testbed.cern.ch/dbs/int/global/DBSWriter', aggregate=True)
dbs.insertFileParents({"block_name": childBlockName, "child_parent_id_list": listChildParent})

and it produces the following output:

DBS Server error: [{'error': {'reason': 'DBSError Code:118 Description:DBS invalid parameter for the DBS API Function:dbs.fileparents.InsertFileParentsBlockTxt Message:not all files present in block Error: record error', 'message': '', 'function': 'dbs.fileparents.InsertFileParents', 'code': 110}, 'http': {'method': 'POST', 'code': 400, 'timestamp': '2022-05-07 15:18:02.847028141 +0000 UTC m=+760278.987232366', 'path': '/dbs/int/global/DBSWriter/fileparents', 'user_agent': 'DBSClient/Unknown/', 'x_forwarded_host': 'cmsweb-testbed.cern.ch:8443', 'x_forwarded_for': '67.249.140.245', 'remote_addr': '188.184.72.217:53189'}, 'exception': 400, 'type': 'HTTPError', 'message': 'DBSError Code:110 Description:DBS DB insert record error Function:dbs.fileparents.InsertFileParents Message: Error: nested DBSError Code:118 Description:DBS invalid parameter for the DBS API Function:dbs.fileparents.InsertFileParentsBlockTxt Message:not all files present in block Error: record error'}]
Traceback (most recent call last):
  File "/Users/vk/tmp/venv/test.py", line 16, in <module>
    dbs.insertFileParents({"block_name": childBlockName, "child_parent_id_list": listChildParent})
  File "/Users/vk/tmp/venv/lib/python3.9/site-packages/dbs/apis/dbsClient.py", line 743, in insertFileParents
    return self.__callServer("fileparents", data=fileParentObj, callmethod='POST' )
  File "/Users/vk/tmp/venv/lib/python3.9/site-packages/dbs/apis/dbsClient.py", line 464, in __callServer
    self.__parseForException(http_error)
  File "/Users/vk/tmp/venv/lib/python3.9/site-packages/dbs/apis/dbsClient.py", line 508, in __parseForException
    raise http_error
  File "/Users/vk/tmp/venv/lib/python3.9/site-packages/dbs/apis/dbsClient.py", line 461, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/tmp/venv/lib/python3.9/site-packages/RestClient/RestApi.py", line 42, in post
    return http_request(self._curl)
  File "/Users/vk/tmp/venv/lib/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request
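
The server response above nests one DBSError inside another: code 110 (insert record error) wraps code 118 (invalid parameter). As a rough sketch, the chain can be pulled out of the message text with a regular expression. The helper name `dbs_error_chain` is hypothetical, and the pattern assumes the message layout stays as it appears in this report; it is not part of the DBS client.

```python
import re

# Message text copied from the 'message' field of the error above.
message = (
    "DBSError Code:110 Description:DBS DB insert record error "
    "Function:dbs.fileparents.InsertFileParents Message: Error: nested "
    "DBSError Code:118 Description:DBS invalid parameter for the DBS API "
    "Function:dbs.fileparents.InsertFileParentsBlockTxt "
    "Message:not all files present in block Error: record error"
)


def dbs_error_chain(text):
    """Hypothetical helper: return (code, function) per DBSError, outermost first.

    Assumes each error is written as 'DBSError Code:<n> ... Function:<name>'.
    """
    pattern = re.compile(r"DBSError Code:(\d+) .*?Function:(\S+)")
    return [(int(code), func) for code, func in pattern.findall(text)]


chain = dbs_error_chain(message)
for code, func in chain:
    print(code, func)
```

The last entry of the chain is the innermost error, which here points at `InsertFileParentsBlockTxt` as the place where the request was rejected.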

vkuznet commented 2 years ago

@amaltaro please provide additional information. The listChildParent has overlapping ranges like:

[661281877, 661282037],
[661281877, 661282077],
[661281877, 661282117],
[661281877, 661281997]
...

My question is how those should be treated. Does it mean that the actual range is the second row (since it includes all other ranges)? Why does the input contain these overlapping ranges?

vkuznet commented 2 years ago

Alan, upon further reading, the entries of listChildParent correspond to [childID, parentID] pairs; that is clear to me now.

According to the DBS Oracle DB, we have the following file IDs for the provided block:
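
To make the [childID, parentID] reading concrete, here is a small sketch that groups the pairs from the report by child file ID. The helper name `group_parents_by_child` is made up for illustration and is not part of the DBS client or server.

```python
from collections import defaultdict

# Pairs copied from the report above; each entry is [childID, parentID].
list_child_parent = [
    [661281877, 661282037], [661281877, 661282077],
    [661281877, 661282117], [661281877, 661281997],
    [661281917, 661282117], [661281917, 661282157],
    [661281917, 661282197], [661281917, 661282237],
    [661281957, 661282277], [661281957, 661282317],
    [661281957, 661282397], [661281957, 661282357],
]


def group_parents_by_child(pairs):
    """Hypothetical helper: map each child file ID to its list of parent IDs."""
    parents = defaultdict(list)
    for child_id, parent_id in pairs:
        parents[child_id].append(parent_id)
    return dict(parents)


parents_by_child = group_parents_by_child(list_child_parent)
for child_id, parent_ids in sorted(parents_by_child.items()):
    print(child_id, "->", parent_ids)
```

Read this way, the input describes three child files with four parents each, not a set of overlapping ranges.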

SQL> SELECT DISTINCT file_id
from cms_dbs3_k8s_global_owner.FILES f
INNER JOIN cms_dbs3_k8s_global_owner.BLOCKS b on f.block_id=b.block_id
WHERE b.block_name='/VBF_HHTo2G2Tau_CV_1_C2V_1_C3_0_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL16MiniAODAPVv2-HIG_RunIISummer20UL16MiniAODAPVv2_02457_0_SC_6Steps_PU_May2022_Val_Alanv1-v11/MINIAODSIM#25f5a41e-00e8-4148-9f62-36e2125cef7f' ;

   FILE_ID
----------
 661281877
 661281917
 661281957

This set is not identical to what was supplied in your API call, and therefore the DBS server fails with the following reason: DBSError Code:118 Description:DBS invalid parameter for the DBS API Function:dbs.fileparents.InsertFileParentsBlockTxt Message:not all files present in block. This happens in this block of code: https://github.com/dmwm/dbs2go/blob/master/dbs/fileparents.go#L345-L350

In the DBS logs I see exactly this:

[2022-05-07 13:38:41.821837589 +0000 UTC m=+754317.962041813] fileparents.go:346: block fids != file ids
[2022-05-07 13:38:41.821991131 +0000 UTC m=+754317.962195342] fileparents.go:347: block ids [661281877 661281917 661281957]
[2022-05-07 13:38:41.822190579 +0000 UTC m=+754317.962394788] fileparents.go:348: file  ids [661281877 661281877 661281877 661281877 661281917 661281917 661281917 661281917 661281957 661281957 661281957 661281957]
[2022-05-07 13:38:41.822394377 +0000 UTC m=+754317.962598588] fileparents.go:252: unable to insert file parents DBSError Code:118 Description:DBS invalid parameter for the DBS API Function:dbs.fileparents.InsertFileParentsBlockTxt Message:not all files present in block Error: record error

Therefore, I need to understand whether the logic that compares the block file IDs with the supplied file IDs is appropriate.

vkuznet commented 2 years ago

@amaltaro, after consulting the DBS Python code I identified the issue, which came down to using sets rather than lists and comparing those sets properly. I updated the DBS3Writer code to the new version and deployed it on testbed. After that I ran your example and no longer see the error.

Could you please proceed with your testing and insert other data? The data listed in this ticket was already used in my tests, so we need to insert another block. Please report your findings in this ticket.
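
For illustration only, the difference between the two comparisons can be shown with the IDs from the server log. This is a Python sketch of the idea, not the actual dbs2go change (which is in Go), and `same_files` is a hypothetical name.

```python
# IDs copied from the server log above.
block_file_ids = [661281877, 661281917, 661281957]
supplied_child_ids = [
    661281877, 661281877, 661281877, 661281877,
    661281917, 661281917, 661281917, 661281917,
    661281957, 661281957, 661281957, 661281957,
]


def same_files(block_ids, supplied_ids):
    """Hypothetical check: True when both collections name the same files.

    Converting to sets discards the repeats that arise because each child
    ID appears once per parent in the supplied pairs.
    """
    return set(block_ids) == set(supplied_ids)


# Comparing the raw lists fails because of the repeated child IDs.
print("lists equal:", block_file_ids == supplied_child_ids)
# Comparing as sets treats the two collections as the same files.
print("sets equal:", same_files(block_file_ids, supplied_child_ids))
```

A set comparison still rejects a request whose child IDs do not all belong to the block, or one that leaves some of the block's files out, so the original intent of the check is preserved.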

amaltaro commented 2 years ago

Alan, upon further reading, the entries of listChildParent correspond to [childID, parentID] pairs; that is clear to me now.

Yes, that's correct.

Thanks for providing this fix, Valentin. I have just successfully updated the parentage information for another set of datasets.