Open amaltaro opened 7 months ago
@todor-ivanov as discussed in the meeting today - and right now with Andrea as well - let us put this back to ToDo and come back to this beginning of October (2 weeks more should not hurt us here).
Following discussion in mattermost wm-ops thread with @amaltaro.
Related to the failure inserting data into DBS: the current T0 production agent is struggling to insert files into blocks. I see the following error message in the DBS3Upload component log:
Error trying to process block /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7 through DBS. Details: DBSError code: 110, message: d93d36f53eaf3097db5c9f50851359041c418a18727e6f363e6c18c37d3f25bb una
ble to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunk
s Message: Error: concurrency error
This is present for the following blocks:
I suggest that you review https://github.com/dmwm/WMCore/issues/11106, which describes the actual issue with concurrent data insertion. In short, for concurrent injection to work, all supporting pieces (dataset configuration, etc.) must already be in place. To avoid the problem, someone must first inject a single block carrying all the necessary information; after that, the concurrent pattern can safely be used to inject the other blocks.
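The "first block serially, then the rest concurrently" pattern described above can be sketched as follows. This is a minimal illustration, not the agent's actual code: `inject_blocks` and `fake_insert` are hypothetical names, and in the real component the `insert_one` callable would be something like `dbsApi.insertBulkBlock` from the DBS client.

```python
from concurrent.futures import ThreadPoolExecutor

def inject_blocks(blocks, insert_one, workers=4):
    """Inject the first block serially so the shared metadata rows
    (dataset configuration, processing era, etc.) are created once,
    then inject the remaining blocks concurrently."""
    if not blocks:
        return []
    results = [insert_one(blocks[0])]   # serial step seeds shared metadata
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results.extend(pool.map(insert_one, blocks[1:]))
    return results

# Toy stand-in for the real injection call (hypothetical; the agent
# would POST a bulkblocks JSON document per block here).
def fake_insert(block):
    return block

order = inject_blocks(["blockA", "blockB", "blockC"], fake_insert)
```

Because `pool.map` preserves input order, the results come back in the same order the blocks were submitted, even though blocks B and C are inserted concurrently.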
@vkuznet thank you for jumping into this discussion.
I had a feeling there was another obscure problem with the DBS Server. Reviewing the ticket you pointed to (11106), and going by your sentence above, my understanding is: provided we have at least 1 block already injected into DBS for a given dataset, the "concurrency error" should no longer happen, since all the foundational information is already in the database. Correct?
I picked one of the blocks provided by Antonio and queried DBS Server for its blocks: https://cmsweb.cern.ch/dbs/prod/global/DBSReader/blocks?dataset=/AlCaP0/Run2024H-v1/RAW
as you can see, this dataset already has a bunch of blocks in the database. So, how come we are having a "concurrency error" here?
If you inspect the code [1], in order to insert a DBS block concurrently we need to have the following in place:
So, if all of this information is present and consistent across all blocks in DBS, then yes, the concurrency error (based on database content) should not arise. In other words, the DBS server first acquires or inserts this info into the DBS tables; if two or more HTTP calls arrive at the same time, that can cause a database error, which leads to a concurrency error from the DBS server. Whether that is the case for the discussed blocks, I don't know. But it is possible for this information to be missing or inconsistent across blocks if any of the attributes above differ among them.
You may look at the example bulkblocks JSON [2] to see how this information is actually structured and provided to DBS. In particular, the information in dataset_conf_list and file_conf_list is used to look up the aforementioned info, along with primds, processing_era, etc. So, if you inject multiple JSONs, they need to have identical info for those attributes; otherwise you may run into the race conditions described in https://github.com/dmwm/WMCore/issues/11106
[1] https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks2.go#L478 [2] https://github.com/dmwm/dbs2go/blob/master/test/bulkblocks.json
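A quick way to verify the consistency requirement above would be to diff the shared metadata attributes across several bulkblocks JSON dumps before injecting them. This is a hypothetical helper sketch, assuming the top-level key names from the example bulkblocks JSON [2] (primds, processing_era, acquisition_era, dataset_conf_list):

```python
def metadata_mismatches(block_dumps, keys=("primds", "processing_era",
                                           "acquisition_era", "dataset_conf_list")):
    """Return the set of top-level keys whose values differ between the
    first bulkblocks JSON document and any of the others; an empty set
    means the dumps are consistent for those keys."""
    first = block_dumps[0]
    mismatched = set()
    for dump in block_dumps[1:]:
        for key in keys:
            if dump.get(key) != first.get(key):
                mismatched.add(key)
    return mismatched

# Two toy dumps that agree on the primary dataset but disagree on the
# processing era, which would be a candidate for racing insertions.
a = {"primds": {"primary_ds_name": "AlCaP0"},
     "processing_era": {"processing_version": 1}}
b = {"primds": {"primary_ds_name": "AlCaP0"},
     "processing_era": {"processing_version": 2}}
diffs = metadata_mismatches([a, b])
```

An empty result would mean the dumps cannot trigger the metadata inconsistency discussed above, at least for the checked keys.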
Valentin, unless there is a bug in the (T0)WMAgent, all the blocks for the same dataset should carry exactly the same metadata: same acquisition era, same primary dataset, and so on.
Having said that, if a block exists in DBS Server, we can conclude that all of its metadata is already available as well. IF that metadata is already available and we are trying to inject more blocks for the same dataset, hence the same metadata, there should be NO concurrency error.
Based on your explanation and on the data shared by Antonio, I fail to see how we would hit a "concurrency error". That means either there is more to it than we have discussed/understood so far, or the error message is misleading...
In any case, I would suggest having @todor-ivanov follow this up next week, comparing things with the DBS Server logs and against the source code.
I further looked into the dbs code and I think I identified the issue, specifically in the insertFilesChunk function, see https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks2.go#L1018. Then, I looked at one of the dbs logs and found:
[2024-09-24 00:45:17.228980109 +0000 UTC m=+2471302.202098481] fail to insert files chunks, trec &{IsFileValid:1 DatasetID:15071289 BlockID:37951592 CreationDate:1727138717 CreateBy:/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=cmst0/CN=658085/CN=Robot: CMS Tier0 FilesMap:{mu:{state:0 sema:0} read:{_:[] _:{} v:<nil>} dirty:map[] misses:0} NErrors:2}
So, indeed the input file record DOES NOT contain the required file type attribute; see the File structure here: https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks.go#L65. The "file_type" must be present in the provided JSON; otherwise it is assigned the default value 0, which is what the file injection then tries to look up in the database, where it should be a non-zero value.
To summarize, I suggest checking the JSON records T0 provides and ensuring they include "file_type" along with the other file attributes (all of them are defined in this struct: https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks.go#L65). Without it, the DBS code correctly fails, but it would probably be useful to adjust the error message to properly report the error.
For the record, here is how the DBS error looks in a log:
[2024-09-24 00:45:17.228980109 +0000 UTC m=+2471302.202098481] fail to insert files chunks, trec &{IsFileValid:1 DatasetID:15071289 BlockID:37951592 CreationDate:1727138717 CreateBy:/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=cmst0/CN=658085/CN=Robot: CMS Tier0 FilesMap:{mu:{state:0 sema:0} read:{_:[] _:{} v:<nil>} dirty:map[] misses:0} NErrors:2}
[2024-09-24 00:45:17.229561212 +0000 UTC m=+2471302.202679583] 5ecdc2bdcd03492fd64efc269de332cdcf1c8a53c3e3cc07168b0c741f0270ba unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
[2024-09-24 00:45:17.232415539 +0000 UTC m=+2471302.205533911] DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:5ecdc2bdcd03492fd64efc269de332cdcf1c8a53c3e3cc07168b0c741f0270ba unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error Error: nested DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error Stacktrace:
goroutine 300475111 [running]:
github.com/dmwm/dbs2go/dbs.Error({0xb054e0?, 0xc0009f2410?}, 0x6e, {0xc0004f60f0, 0xe6}, {0xa3b23e, 0x2b})
/go/src/github.com/vkuznet/dbs2go/dbs/errors.go:185 +0x99
github.com/dmwm/dbs2go/dbs.(*API).InsertBulkBlocksConcurrently(0xc00036c000)
/go/src/github.com/vkuznet/dbs2go/dbs/bulkblocks2.go:743 +0x2546
github.com/dmwm/dbs2go/web.DBSPostHandler({0xb08290, 0xc000012cd8}, 0xc000616700, {0xa1d753, 0xa})
/go/src/github.com/vkuznet/dbs2go/web/handlers.go:544 +0x1374
github.com/dmwm/dbs2go/web.BulkBlocksHandler({0xb08290?, 0xc000012cd8?}, 0xc000a9f460?)
/go/src/github.com/vkuznet/dbs2go/web/handlers.go:960 +0x3b
net/http.HandlerFunc.ServeHTTP(0xc00055f1a0?, {0xb08290?, 0xc000012cd8?}, 0x95d5a0?)
/usr/local/go/src/net/http/server.go:2136 +0x29
github.com/dmwm/dbs2go/web.limitMiddleware.func1({0xb08290?, 0xc000012cd8?}, 0xc00055f1a0?)
/go/src/github.com/vkuznet/dbs2go/web/middlewares.go:110 +0x32
net/http.HandlerFunc.ServeHTTP(0x7f8c001964c0?, {0xb08290?, 0xc000012cd8?}, 0xc0003
So, you have all the pointers needed to find which line of code fails by inspecting the stack trace, and that is exactly what I did.
As far as I can tell, it should always be set like:
"file_type": "EDM",
@LinaresToine can you please change the component configuration and provide one of the block names that is failing to be inserted, in the following line:
config.DBS3Upload.dumpBlockJsonFor = ""
then restart DBS3Upload
and you should soon get a JSON dump of the content that the component is POSTing to the DBS Server. The output file should be under the component directory (e.g. install/DBS3Upload/).
Ok, I changed the config as suggested. Waiting for the loadFiles method to complete the cycle. I'll follow up.
I have placed the output JSON file in /eos/home-c/cmst0/public/dbsError/dbsuploader_block.json.
Another error is showing up in the DBS3Upload component for all 4 pending blocks:
Hit a general exception while inserting block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd. Error: (52, 'Empty reply from server')
Traceback (most recent call last):
File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/WMComponent/DBS3Buffer/DBSUploadPoller.py", line 94, in uploadWorker
dbsApi.insertBulkBlock(blockDump=block)
File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 647, in insertBulkBlock
result = self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 474, in __callServer
self.http_response = method_func(self.url, method, params, data, request_headers)
File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RestApi.py", line 42, in post
return http_request(self._curl)
File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 56, in __call__
curl_object.perform()
pycurl.error: (52, 'Empty reply from server')
An update from T0: here is a JSON dump for a successfully uploaded T0 DBS block:
/eos/home-c/cmst0/public/dbsError/dbsuploader_successful_block.json
Now we have a total of 276 blocks that we are unable to upload, with the same error message for all of them. A list of these blocks can be found here:
/eos/home-c/cmst0/public/dbsError/failingBlocks.txt
Because of these, we have 121384 files in T0 that we have been unable to register in DBS. @todor-ivanov is trying to find a way for us to upload this information.
Here is the follow-up on the status of those blocks according to DBS. I had to create a script to query the DBS database directly, lfn by lfn, for all those blocks, and here is the accumulated result:
blockDBSRecords.json: /eos/home-c/cmst0/public/dbsError/blockDBSRecords.json
From what I can see in those results, we can identify at least 3 different use cases:
I am going to filter out those which we know are already there. On top of that, I am considering checking their Rucio status as well. FYI @germanfgv
p.s. Here is the script I used for accumulating those results: DBSBlocksCheck.py
p.s. And here is an updated version of the DBS records, with updated Rucio information per block as well: blockDBSRecords.json
Continuing to reduce the results to something more readable, here [1] is the final list of the block and file status in DBS for all of them.
As one can see:
/AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4
/AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd
meaning, all files from these blocks are recorded as part of a different block (could be due to an attempt to reprocess the same block twice).
/Muon0/Run2024H-v1/RAW#7369ccdf-3d3a-4d32-bad9-b04b02f279d4
/AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7
meaning, not only do the files already uploaded to DBS belong to a different block, but the files uploaded so far do not cover the whole block.
FYI: @germanfgv @LinaresToine
[1]
blockName: /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4:
blockDBSStatus: ['MISSING']
filesDBSStatus: ['BLOCKMISMATCH']
blockName: /Muon0/Run2024H-v1/RAW#7369ccdf-3d3a-4d32-bad9-b04b02f279d4:
blockDBSStatus: ['MISSING']
filesDBSStatus: ['MISSING', 'BLOCKMISMATCH']
blockName: /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd:
blockDBSStatus: ['MISSING']
filesDBSStatus: ['BLOCKMISMATCH']
blockName: /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7:
blockDBSStatus: ['MISSING']
filesDBSStatus: ['MISSING', 'BLOCKMISMATCH']
blockName: /Commissioning/Run2024I-v1/RAW#4113bc20-c92b-43a3-a767-06bccfe4af56: OK
blockName: /ParkingSingleMuon6/Run2024I-v1/RAW#75ce0f9d-2153-4520-b79f-bb0df5f19227: OK
blockName: /ZeroBias/Run2024I-v1/RAW#50e30425-0a03-46c5-8da0-26719c266dbc: OK
blockName: /ParkingSingleMuon1/Run2024I-v1/RAW#fae36d0d-cc36-4c5e-bfde-009aa38f9b7c: OK
blockName: /ParkingSingleMuon2/Run2024I-v1/RAW#3c125d76-99c4-4f65-8141-0ae9abcd0e1a: OK
blockName: /ParkingSingleMuon7/Run2024I-v1/RAW#0a25242b-c6fb-4eca-8382-826f4e878021: OK
blockName: /ParkingSingleMuon5/Run2024I-v1/RAW#b01693d9-d3d9-4ab2-8760-0b019652f89e: OK
blockName: /ParkingSingleMuon8/Run2024I-v1/RAW#a57df37e-4d9d-465e-93b2-54d40f892429: OK
blockName: /ParkingSingleMuon10/Run2024I-v1/RAW#627f4dd1-4c31-4f4b-bb85-48d19996ba4f: OK
blockName: /ParkingSingleMuon9/Run2024I-v1/RAW#1b4bc705-f6de-4f55-9c8b-7cd490457341: OK
blockName: /Tau/Run2024I-v1/RAW#2da0c3fb-4b3c-47a2-ac49-99cca62c226d: OK
blockName: /BTagMu/Run2024I-v1/RAW#280bd893-a382-496c-8e6f-366309493acc: OK
blockName: /ParkingDoubleMuonLowMass1/Run2024I-v1/RAW#b4492da6-fa05-4484-a034-aa2a87354735: OK
blockName: /AlCaLowPtJet/Run2024I-v1/RAW#0002be22-10d9-4200-9845-bf112ec9291a: OK
blockName: /Muon1/Run2024I-v1/RAW#499938c6-8357-4095-99db-91c90e600f0e: OK
blockName: /ParkingDoubleMuonLowMass4/Run2024I-PromptReco-v1/AOD#09d3e003-1ca7-459f-8089-1f1d95f5ba20: OK
blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/DQMIO#777a6f7a-058d-46a4-bfb9-d905b141fbd2: OK
blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/AOD#6fbca7c8-560f-4805-af21-55d424e9877a: OK
blockName: /Muon0/Run2024I-MuAlCalIsolatedMu-PromptReco-v1/ALCARECO#06571035-b278-4998-a9d3-2b523bb4fd0e: OK
blockName: /Muon1/Run2024I-HcalCalHO-PromptReco-v1/ALCARECO#65cfc43b-aca0-4600-8bb3-db4261856f3b: OK
blockName: /Tau/Run2024I-LogError-PromptReco-v1/RAW-RECO#645686dd-1135-4866-9e53-6438aa17600d: OK
blockName: /Muon1/Run2024I-PromptReco-v1/NANOAOD#d26b31f5-0371-4fe2-9420-e87a87925fdd: OK
blockName: /ParkingSingleMuon8/Run2024I-PromptReco-v1/MINIAOD#86c82707-6992-4126-b86f-182c5f5aa7fc: OK
blockName: /Tau/Run2024I-PromptReco-v1/AOD#6627bc34-c746-49a9-ab02-550710731e1b: OK
blockName: /Muon0/Run2024I-PromptReco-v1/MINIAOD#ebab670f-0588-422e-8206-c406f948bb06: OK
blockName: /Muon0/Run2024I-PromptReco-v1/DQMIO#35e3beac-bd5d-4ba4-82c0-a5372e89e5a6: OK
blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/DQMIO#2e38060a-bf22-4135-897d-f7d93684dede: OK
blockName: /DisplacedJet/Run2024I-EXOLLPJetHCAL-PromptReco-v1/AOD#ab2bd1a4-6f42-4d9a-853f-1a7a5aa5f2f4: OK
blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/MINIAOD#0821e60b-12b6-4993-8a67-0952379c34bb: OK
blockName: /Muon1/Run2024I-EXOCSCCluster-PromptReco-v1/USER#5197c2ba-2f13-48e9-bddd-9c1fd071cd33: OK
blockName: /ParkingVBF6/Run2024I-PromptReco-v1/NANOAOD#a36dad4c-dac8-4a43-bde9-5ac38a0f8b7d: OK
blockName: /EphemeralZeroBias1/Run2024I-PromptReco-v1/MINIAOD#0c7b63de-5a48-4d98-8e9b-9c52c714703a: OK
blockName: /JetMET1/Run2024I-PromptReco-v1/DQMIO#6d3cc1c5-48c8-4f9e-9f8b-1cb5b481d550: OK
blockName: /Tau/Run2024I-PromptReco-v1/NANOAOD#473eddda-612e-4306-91bf-9dfcb3b5d108: OK
blockName: /ParkingVBF6/Run2024I-PromptReco-v1/MINIAOD#38f7630c-e015-4a97-a331-e33b0cfa3604: OK
blockName: /ParkingSingleMuon6/Run2024I-PromptReco-v1/AOD#bd0eeb38-b1d0-4157-ade6-fb3b65f57995: OK
blockName: /ParkingVBF1/Run2024I-PromptReco-v1/AOD#1298b211-43f2-49c6-8788-2bde6e2a9e62: OK
blockName: /ParkingVBF0/Run2024I-PromptReco-v1/MINIAOD#63e59425-3ff1-406c-ae18-48bf9f239354: OK
blockName: /ParkingSingleMuon8/Run2024I-PromptReco-v1/AOD#85d58a9d-29b0-4f98-99bc-9c201ed2c6a2: OK
blockName: /Tau/Run2024I-EXODisappTrk-PromptReco-v1/USER#1a96c708-64a9-4f62-819f-a19633154b16: OK
blockName: /SpecialZeroBias5/Run2024I-PromptReco-v1/AOD#59e655a4-2897-47d0-ba11-287332c4e6b5: OK
blockName: /ParkingVBF1/Run2024I-PromptReco-v1/NANOAOD#aa47f11c-7080-492b-ab91-ad19e6299fff: OK
blockName: /ParkingVBF3/Run2024I-PromptReco-v1/NANOAOD#4330d839-9985-4b36-9d5e-b5aa5c19175f: OK
blockName: /ParkingHH/Run2024I-PromptReco-v1/MINIAOD#e8d162fa-f391-439c-a7f1-8a8d39dda120: OK
blockName: /Tau/Run2024I-PromptReco-v1/DQMIO#1a0ac20a-1d60-4d89-8133-e8559f1e4c13: OK
blockName: /ParkingSingleMuon0/Run2024I-PromptReco-v1/MINIAOD#d8995d51-e005-4757-8439-850c005cbd57: OK
blockName: /ParkingVBF5/Run2024I-PromptReco-v1/MINIAOD#5bff218a-4895-4f13-8148-c9e0bcf820b7: OK
blockName: /Muon0/Run2024I-PromptReco-v1/AOD#306f5950-5eec-43d3-96f2-8dfbe22d322c: OK
blockName: /EGamma0/Run2024I-PromptReco-v1/AOD#e9814b10-2545-4a83-8a3d-2501f5679ecd: OK
blockName: /JetMET1/Run2024I-PromptReco-v1/NANOAOD#039f5f67-3f70-4797-9b2a-c6d698e52efd: OK
blockName: /ParkingVBF1/Run2024I-PromptReco-v1/DQMIO#b1f45558-5e9a-493c-afed-7e133bb4a7e7: OK
blockName: /DisplacedJet/Run2024I-PromptReco-v1/AOD#0edffee0-8286-4658-943c-8efc45f23ea4: OK
blockName: /Muon1/Run2024I-PromptReco-v1/DQMIO#edae199f-fb22-48d3-9a8a-cdb15703bcbe: OK
blockName: /DisplacedJet/Run2024I-EXODelayedJet-PromptReco-v1/AOD#b46c0dcb-c26a-47c2-a4b0-fcac9b9d63be: OK
blockName: /Muon0/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#6b39e513-27ac-4e54-ad1e-a343b9d064fc: OK
blockName: /JetMET1/Run2024I-PromptReco-v1/AOD#fa6562fe-0d1d-4d06-9bf0-a135edbcf172: OK
blockName: /ParkingVBF0/Run2024I-PromptReco-v1/DQMIO#24686495-80bc-44de-a3b2-f39cfa971760: OK
blockName: /NoBPTX/Run2024I-PromptReco-v1/AOD#99fd2849-5b43-4927-9596-6e6a33683d9c: OK
blockName: /HLTPhysics/Run2024I-LogError-PromptReco-v1/RAW-RECO#8abbaf67-41c7-452f-816a-f978dd14cc1b: OK
blockName: /EGamma0/Run2024I-LogError-PromptReco-v1/RAW-RECO#ab07e175-a29f-4203-a3f3-dceb2938ae33: OK
blockName: /JetMET1/Run2024I-EXODisappTrk-PromptReco-v1/USER#98870a35-c0d8-4ece-9731-1ac081143000: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#3b08c77d-8e97-4aca-be54-f95b7ab76465: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#2b3eefa5-923b-4b42-9c5c-cf162453d59b: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#1d77a290-571c-442d-be95-531e4168e94d: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#23d1c315-25d0-47a8-813e-caa7a5f2a0f1: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#80570ad8-4b6a-4e00-bcb5-63ff743504d5: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#5abdb7b2-4a4a-4a10-a4b3-9cfae99bdf83: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#adc72e9a-7410-493a-a327-1611b18a4106: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#e0c75028-64eb-480f-abb6-910505a92973: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#c1bc74d9-2c3b-415d-928f-7ec8395868ad: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#f1e8e065-2a65-4bd7-9279-d88c423c0ea0: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#8147c17b-4550-4a14-9747-ca696aa03408: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#4645a8b5-b1ef-4008-b1f5-dff3fadb1855: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a8e53c28-c9ad-40fa-88bf-b7c2f3e61a64: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#19e36aa2-2ec8-4974-891f-112279ec9393: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#ba34c69d-db70-455e-81cc-13a161727e80: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#b66d6bae-2647-44b9-8bad-320da54d0a29: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#33b16287-bc8d-421b-bd38-2059ad19dd87: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a1f20e02-b988-4e3a-bd6f-68610bde0b97: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#d3c7711f-b7ba-4e4d-9db8-999cd6383551: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a77f0bd9-bd98-418e-b39c-9bf859203fad: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#aff24402-3ead-4ab6-9d71-02b13721b7cf: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#aea1ab62-ffa2-447f-bede-dbd01a05708a: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#ebb35256-bdc6-40a7-ae6c-9de27a2094bf: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#36740b3b-31a6-4be6-a4e4-f76f5e1200ab: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a95882ee-05b6-482a-bbf8-6f7ff8ab4354: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#0358032c-2997-440c-a658-461e011e87a0: OK
blockName: /Cosmics/Run2024I-MuAlGlobalCosmics-PromptReco-v1/ALCARECO#f7f08dfb-6c23-441c-9137-09abad0a7d39: OK
blockName: /ParkingDoubleMuonLowMass2/Run2024I-PromptReco-v1/DQMIO#e9d71494-b460-4680-a3b9-7a1c62fc4d01: OK
blockName: /ParkingHH/Run2024I-PromptReco-v1/AOD#ba026985-4cfd-4a06-ba1b-bfb5af6cbb64: OK
blockName: /MinimumBias/Run2024I-PromptReco-v1/NANOAOD#dfbf01f5-c43c-463f-b562-aee5a91da41e: OK
blockName: /ParkingSingleMuon2/Run2024I-PromptReco-v1/AOD#f2995e54-af6b-456c-8b40-abb844b299a2: OK
blockName: /EGamma0/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#0ad19bb8-d4fa-4565-9019-91ef6e7207ac: OK
blockName: /EGamma1/Run2024I-EXODisappTrk-PromptReco-v1/USER#cf89035f-8d3d-4a95-9dd4-ae73b92cb865: OK
blockName: /MinimumBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#d4fdf46b-db01-4df9-9eac-e23672e14f84: OK
blockName: /Muon0/Run2024I-EXODisappTrk-PromptReco-v1/USER#ae7335f9-c7ae-42bd-8304-0805347446dd: OK
blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/NANOAOD#0f9d8bfa-a7f7-44f9-8bec-e509f8334490: OK
blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/DQMIO#a004890b-d6cf-4ff0-b715-f1ba374e3d97: OK
blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/AOD#64b94383-cbaf-48a5-b194-3d15baa01adc: OK
blockName: /Tau/Run2024I-PromptReco-v1/MINIAOD#23bc1b6d-3f75-4007-8618-52755f3fb1f3: OK
blockName: /ParkingDoubleMuonLowMass3/Run2024I-PromptReco-v1/NANOAOD#3935aabb-be36-4e9d-a49f-afc523994fd5: OK
blockName: /MinimumBias/Run2024I-SiStripCalZeroBias-PromptReco-v1/ALCARECO#d621c61d-52ea-4d11-b092-4068bfd61ddf: OK
blockName: /EphemeralZeroBias7/Run2024I-PromptReco-v1/MINIAOD#f770ca99-fca5-4054-a377-9365c016069b: OK
blockName: /SpecialZeroBias1/Run2024I-PromptReco-v1/MINIAOD#b8cc0194-13dd-4091-96fa-28f5be5c2134: OK
blockName: /ParkingDoubleMuonLowMass4/Run2024I-TkAlJpsiMuMu-PromptReco-v1/ALCARECO#950e3d2c-5375-47c5-8f79-03df611b9422: OK
blockName: /Commissioning/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#0a2b5651-fe2e-436e-af4d-488cb00acf68: OK
blockName: /ScoutingPFMonitor/Run2024I-PromptReco-v1/NANOAOD#c2e3fd58-1bf5-4769-9230-6c6ac11bf75f: OK
blockName: /SpecialZeroBias5/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#3ecd813c-ad88-4ced-acba-d78a9ebc9963: OK
blockName: /SpecialZeroBias5/Run2024I-SiStripCalZeroBias-PromptReco-v1/ALCARECO#31c2c844-9601-4881-ba21-e999b89d7900: OK
blockName: /JetMET1/Run2024I-HcalCalIsoTrkProducerFilter-PromptReco-v1/ALCARECO#0f1c47f1-3755-4a00-8d36-2cd47e42605c: OK
blockName: /ParkingHH/Run2024I-PromptReco-v1/DQMIO#b3983892-1ab2-4d9b-a5da-c7e238846e1f: OK
blockName: /SpecialZeroBias5/Run2024I-LogErrorMonitor-PromptReco-v1/USER#6ea22759-354e-49ba-a72b-2d29034979e2: OK
blockName: /ParkingVBF6/Run2024I-PromptReco-v1/DQMIO#332ab512-bdef-44e0-a091-9615bdd417c6: OK
blockName: /Muon0/Run2024I-TkAlZMuMu-PromptReco-v1/ALCARECO#6d4fa60b-d7c3-4280-ab08-c48d7cbf258d: OK
blockName: /EphemeralZeroBias3/Run2024I-PromptReco-v1/NANOAOD#fde6778b-8361-427a-876f-e16b2e65978f: OK
blockName: /TestEnablesEcalHcal/Run2024I-Express-v1/RAW#6a31d991-d964-4fca-9113-59d1c40d5759: OK
blockName: /StreamExpressCosmics/Run2024I-SiPixelCalZeroBias-Express-v1/ALCARECO#13f46438-f23d-4121-85fb-896d224db127: OK
blockName: /StreamExpressCosmics/Run2024I-SiStripCalCosmics-Express-v1/ALCARECO#9d90aa84-6f90-4ee3-8e33-a2718e9e59b2: OK
blockName: /StreamALCAPPSExpress/Run2024I-PromptCalibProdPPSAlignment-Express-v1/ALCAPROMPT#939fa7d4-6ffa-4788-9b12-83436c0413b5: OK
blockName: /StreamExpress/Run2024I-TkAlZMuMu-Express-v1/ALCARECO#6097a868-c5bd-43ae-a804-1e881fcf5bc4: OK
blockName: /StreamExpress/Run2024I-SiPixelCalSingleMuonTight-Express-v1/ALCARECO#1b495798-f27a-49e4-8a21-87d8d2a236f6: OK
blockName: /StreamExpress/Run2024I-PromptCalibProdSiPixelAliHGComb-Express-v1/ALCAPROMPT#aafb7697-c790-4d5e-9dac-4cf5fae1c4ce: OK
blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/AOD#f35b9721-1e35-434b-9e06-28b6c88f64fe: OK
blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/AOD#90186a44-9236-4665-b48d-a8dd37ef0ff1: OK
blockName: /Cosmics/Run2024I-CosmicTP-PromptReco-v1/RAW-RECO#7c8c249d-9515-4d08-8217-418a269b1a2e: OK
blockName: /ParkingSingleMuon1/Run2024I-PromptReco-v1/MINIAOD#51342b65-75e2-42eb-b868-4df8ee7809b8: OK
blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/AOD#e8fd5b7b-24c4-49a1-8c26-7d7c2d37661a: OK
blockName: /ParkingVBF5/Run2024I-PromptReco-v1/AOD#db278bd7-459c-4427-9c2f-b214640caaeb: OK
blockName: /SpecialZeroBias1/Run2024I-PromptReco-v1/AOD#786cf920-b104-4e8f-bebb-30a30d090357: OK
blockName: /EGamma0/Run2024I-EcalUncalWElectron-PromptReco-v1/ALCARECO#fd409f01-fba4-4d87-a1c5-04ccde5ee8ad: OK
blockName: /Muon0/Run2024I-SiPixelCalSingleMuonLoose-PromptReco-v1/ALCARECO#5b6152f5-616f-4f55-ab65-eb8b1de0798b: OK
blockName: /EGamma0/Run2024I-EGMJME-PromptReco-v1/RAW-RECO#55a6487b-d4f9-4d01-82c7-0f2ee35872d1: OK
blockName: /EGamma1/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#67ac37a8-4d01-4a8b-a30f-4a714cfb2a0a: OK
blockName: /HLTPhysics/Run2024I-LogErrorMonitor-PromptReco-v1/USER#1b6693e3-d43b-47e9-ae23-fd327e5af74e: OK
blockName: /MuonShower/Run2024I-EXOCSCCluster-PromptReco-v1/USER#6d2d1137-e1d8-4ae0-bd81-065f8a050490: OK
blockName: /ParkingVBF1/Run2024I-PromptReco-v1/MINIAOD#12935b72-988e-44ca-9c7c-9b9d8063d8b3: OK
blockName: /Cosmics/Run2024I-LogError-PromptReco-v1/RAW-RECO#9b7ece4a-0f62-4b45-942a-e8366b905412: OK
blockName: /EGamma1/Run2024I-WElectron-PromptReco-v1/USER#e8b3efe3-339d-4b5c-82df-bc78defb09ea: OK
blockName: /NoBPTX/Run2024I-TkAlCosmicsInCollisions-PromptReco-v1/ALCARECO#c64e5941-7da4-47a8-ab32-284a5e059dca: OK
blockName: /Commissioning/Run2024I-LogError-PromptReco-v1/RAW-RECO#af0433ba-b843-4b44-bfa3-70b5d7475863: OK
blockName: /MinimumBias/Run2024I-PromptReco-v1/AOD#e45c0cff-39a4-440f-819a-2e522045618b: OK
blockName: /ParkingSingleMuon7/Run2024I-PromptReco-v1/NANOAOD#9162a0ab-23e1-4fd1-870e-f55da0661a44: OK
blockName: /Muon1/Run2024I-TkAlZMuMu-PromptReco-v1/ALCARECO#582d0da0-d896-46f3-b4de-e7001a1b4dea: OK
blockName: /EphemeralZeroBias3/Run2024I-PromptReco-v1/MINIAOD#aeaed114-ebc8-4ef4-a625-5af2c3559120: OK
blockName: /Commissioning/Run2024I-EcalActivity-PromptReco-v1/RAW-RECO#e65b7bb5-0732-432a-a181-4d9153161caa: OK
blockName: /ParkingVBF3/Run2024I-PromptReco-v1/DQMIO#fecdfbd5-6a50-4100-92f7-90b05c452570: OK
blockName: /ParkingVBF2/Run2024I-PromptReco-v1/DQMIO#43430e62-1bd6-4161-aa8d-3cf9b3da3e55: OK
blockName: /ParkingDoubleMuonLowMass4/Run2024I-PromptReco-v1/NANOAOD#ceeadfe9-03f7-471f-93fb-a7d4ae3ab806: OK
blockName: /EGamma1/Run2024I-LogErrorMonitor-PromptReco-v1/USER#2bd7cb69-577f-47a0-adaa-115b9df2e1b2: OK
blockName: /Commissioning/Run2024I-PromptReco-v1/NANOAOD#2936bb45-005c-4f28-9ee0-c24b8e8b647e: OK
blockName: /StreamExpress/Run2024I-TkAlMinBias-Express-v1/ALCARECO#c07f8345-698b-4a8d-87ac-39ef617874e0: OK
blockName: /StreamCalibration/Run2024I-EcalTestPulsesRaw-Express-v1/ALCARECO#b05bf432-e2d5-4e77-8bc3-d51be7fadb5c: OK
blockName: /StreamExpress/Run2024I-PromptCalibProdSiStripGains-Express-v1/ALCAPROMPT#82ba3ad1-7f7a-4a6b-ba44-e00632e53de5: OK
blockName: /ExpressPhysics/Run2024I-Express-v1/FEVT#12d498f7-0eb0-4c0b-a124-f971dfab8ec8: OK
blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/AOD#82f71a1c-0f5c-4a24-88e9-24d02619104d: OK
blockName: /Muon0/Run2024I-ZMu-PromptReco-v1/RAW-RECO#a0c2d025-fe28-40fc-b695-64e3465954d8: OK
blockName: /ParkingSingleMuon1/Run2024I-PromptReco-v1/AOD#2b73f5f0-a63f-402e-94bc-6231bc31d130: OK
blockName: /ParkingSingleMuon1/Run2024I-PromptReco-v1/AOD#45da3ad2-609d-4ed7-a77c-4a212635ea47: OK
blockName: /EGamma0/Run2024I-PromptReco-v1/DQMIO#5ba6d8d0-5af2-4e46-9028-f8a83a38bd22: OK
blockName: /ParkingDoubleMuonLowMass2/Run2024I-PromptReco-v1/NANOAOD#179594fb-d82a-4194-b8c7-ef8636ccade9: OK
blockName: /ZeroBias/Run2024I-HcalCalIsolatedBunchSelector-PromptReco-v1/ALCARECO#623d20fe-86d2-4650-9c32-76fa0c792d6b: OK
blockName: /HcalNZS/Run2024I-LogError-PromptReco-v1/RAW-RECO#ae1000be-4a3e-4b4d-b9dc-362c2028b5a9: OK
blockName: /EGamma0/Run2024I-EcalESAlign-PromptReco-v1/ALCARECO#982d6c67-32ba-4a10-9239-4f460bf1c002: OK
blockName: /ScoutingPFMonitor/Run2024I-PromptReco-v1/MINIAOD#37708d28-236f-4554-91d3-e3617b2c2a22: OK
blockName: /ParkingSingleMuon2/Run2024I-PromptReco-v1/MINIAOD#ee62eeb6-9a38-4a12-94b4-6e2e03c5bf6c: OK
blockName: /MuonShower/Run2024I-PromptReco-v1/MINIAOD#ab4cb22d-8acc-4661-b955-84786cd695db: OK
blockName: /ParkingSingleMuon0/Run2024I-PromptReco-v1/NANOAOD#b2b7410c-cae4-48d2-aa08-1a4473ca9fcb: OK
blockName: /EGamma1/Run2024I-EcalESAlign-PromptReco-v1/ALCARECO#7fb030a6-ce2b-40ba-9cfa-633e914c6ed4: OK
blockName: /ZeroBias/Run2024I-PromptReco-v1/DQMIO#59bf6b32-52ee-4b87-a5f9-0eaf70f4eb00: OK
blockName: /ParkingVBF4/Run2024I-PromptReco-v1/NANOAOD#023f6a88-3b2b-4826-ab9b-76a146d5c6a0: OK
blockName: /SpecialZeroBias1/Run2024I-LogError-PromptReco-v1/RAW-RECO#9d2492d8-6426-4ece-bdf3-2c91113f4286: OK
blockName: /MinimumBias/Run2024I-PromptReco-v1/MINIAOD#9029265d-b4db-458c-afc1-405d680f07da: OK
blockName: /SpecialZeroBias0/Run2024I-PromptReco-v1/AOD#9c8e602e-c2ba-49c4-9b96-4b2ff143d3a4: OK
blockName: /ParkingDoubleMuonLowMass1/Run2024I-TkAlUpsilonMuMu-PromptReco-v1/ALCARECO#6cdb8ade-6f9b-4b24-ab89-027326e095be: OK
blockName: /SpecialZeroBias5/Run2024I-LogError-PromptReco-v1/RAW-RECO#a1de2483-ce3e-425d-85d6-5fea33a6ea67: OK
blockName: /StreamExpressCosmics/Run2024I-Express-v1/DQMIO#699d704b-19aa-480c-8a8d-0fbebb0b0cd9: OK
blockName: /StreamExpressCosmics/Run2024I-PromptCalibProdSiStripLA-Express-v1/ALCAPROMPT#6420dcde-2fa4-45bd-9c32-d10682923317: OK
blockName: /StreamExpressCosmics/Run2024I-PromptCalibProdSiStrip-Express-v1/ALCAPROMPT#2e69c51a-1dca-4088-8aac-3743f810ce56: OK
blockName: /StreamALCAPPSExpress/Run2024I-PPSCalMaxTracks-Express-v1/ALCARECO#652af1be-cb9a-4a1c-bbf4-c88c1686d301: OK
blockName: /StreamExpress/Run2024I-PromptCalibProd-Express-v1/ALCAPROMPT#0e377dc2-065e-4e15-8566-bd910470baad: OK
blockName: /StreamExpress/Run2024I-SiPixelCalSingleMuon-Express-v1/ALCARECO#3a744512-b3a2-447d-85f5-d0ec6254af59: OK
blockName: /ZeroBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#35f2cee8-79c7-4cab-9d86-dc50c003d893: OK
blockName: /ZeroBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#56647210-b149-4fdc-800f-a3e9523b3ea3: OK
blockName: /HcalNZS/Run2024I-PromptReco-v1/DQMIO#045c4ec2-d9a4-44ce-9fd5-717415778bf5: OK
blockName: /SpecialZeroBias5/Run2024I-PromptReco-v1/NANOAOD#d533eae7-c987-4e63-9e0e-0edb3e0bd246: OK
blockName: /ParkingSingleMuon0/Run2024I-PromptReco-v1/AOD#a154328f-5edf-46bb-a395-d3d45a8b1ca6: OK
blockName: /EGamma1/Run2024I-ZElectron-PromptReco-v1/RAW-RECO#5f92c14b-c6c0-4b8e-8913-71a11b54f598: OK
blockName: /EGamma1/Run2024I-EXOMONOPOLE-PromptReco-v1/USER#6dcd1342-723d-4844-87b9-8248bf5db833: OK
blockName: /StreamExpress/Run2024I-PromptCalibProdSiPixel-Express-v1/ALCAPROMPT#c9543d48-4c8f-4936-bd43-3a63dd1c174f: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#ef15476a-bbad-4545-9f91-7e77de5d6034: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#0758fd50-931f-488d-8a1e-663e1c6174e4: OK
blockName: /SpecialZeroBias1/Run2024I-PromptReco-v1/DQMIO#1596222f-a243-4dc3-b9fe-a9ec03ad9adb: OK
blockName: /SpecialZeroBias2/Run2024I-SiStripCalZeroBias-PromptReco-v1/ALCARECO#a8b97986-e1e1-4d7a-9b0f-4ae31589c714: OK
blockName: /EGamma1/Run2024I-PromptReco-v1/MINIAOD#ee18201c-58ce-42a7-a60b-d1364ca32653: OK
blockName: /ParkingVBF5/Run2024I-PromptReco-v1/NANOAOD#36beaf6c-97b1-475b-aba4-e9e7e8e694c3: OK
blockName: /StreamExpressCosmics/Run2024I-PromptCalibProdSiPixelLAMCS-Express-v1/ALCAPROMPT#99955f16-752e-4dd0-ad52-d6eaf8d0f509: OK
blockName: /Muon1/Run2024I-SiPixelCalSingleMuonLoose-PromptReco-v1/ALCARECO#14d95293-a39a-4d2a-a1f1-956affdde47c: OK
blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/MINIAOD#4223ff94-a955-424e-b245-13c337e5b17d: OK
blockName: /Muon1/Run2024I-EXODisappMuon-PromptReco-v1/USER#55154291-7763-457d-a791-96c65ef849ea: OK
blockName: /Muon1/Run2024I-MUOJME-PromptReco-v1/RAW-RECO#3b55bfa9-ba65-4454-bc98-e60db1916b28: OK
blockName: /ParkingSingleMuon9/Run2024I-PromptReco-v1/NANOAOD#439d8642-eccc-4e50-85ec-ce0a084b7fee: OK
blockName: /Muon1/Run2024I-MuAlCalIsolatedMu-PromptReco-v1/ALCARECO#b9479dcc-06fb-4dce-8b11-d3cb0e957900: OK
blockName: /Muon1/Run2024I-TkAlMuonIsolated-PromptReco-v1/ALCARECO#56f3de3c-f985-466c-b405-36fb5ff57720: OK
blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/AOD#04da00bb-4cec-4b87-a1d9-0747a3d4f02e: OK
blockName: /Muon1/Run2024I-EXODisappTrk-PromptReco-v1/USER#f3e359cc-cb16-4020-8e84-4f083d1e9441: OK
blockName: /JetMET0/Run2024I-JetHTJetPlusHOFilter-PromptReco-v1/RAW-RECO#13eb60ae-a6e7-4a47-b856-0f4f4e6e00d8: OK
blockName: /Muon0/Run2024I-MUOJME-PromptReco-v1/RAW-RECO#a7967f2a-1035-4aea-afa0-26b9898938ce: OK
blockName: /Muon0/Run2024I-ZMu-PromptReco-v1/RAW-RECO#966a0503-91c2-4e38-a002-bfac712cb168: OK
blockName: /ZeroBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#685c6155-a526-4509-bce9-812330416777: OK
blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/MINIAOD#3b1b2f8c-a7da-4e53-902f-b4fa265774d2: OK
blockName: /Muon1/Run2024I-PromptReco-v1/AOD#4f5fab34-198f-4eb0-81a4-09fc0c1a501c: OK
blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/DQMIO#0fdb3015-38e1-47ef-bb71-237e7fbb1f08: OK
blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/MINIAOD#ab102ad0-6081-42b6-9afe-9df7746af1d9: OK
blockName: /Muon1/Run2024I-PromptReco-v1/MINIAOD#4adf338f-e98a-4e15-8e7f-a075cafbf918: OK
blockName: /Muon1/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#a794ba12-a0d8-4c78-9d07-9b977908cb1f: OK
blockName: /ParkingSingleMuon8/Run2024I-PromptReco-v1/NANOAOD#f594d1ee-148d-4b50-870b-a2f66c51efec: OK
blockName: /ParkingSingleMuon9/Run2024I-PromptReco-v1/AOD#c60db50a-6e2e-4054-82d0-5d2f2622dd85: OK
blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/MINIAOD#9fab21d5-31e2-426a-aa0d-5ae91119d468: OK
blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/MINIAOD#ca5f6304-70f8-4504-b885-75bb293adf69: OK
blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/NANOAOD#7e049c44-3e1b-4e9d-92aa-8aa5a70db114: OK
blockName: /Muon1/Run2024I-LogError-PromptReco-v1/RAW-RECO#65903046-1775-44ea-94bc-dfba5746ed0e: OK
blockName: /Muon1/Run2024I-ZMu-PromptReco-v1/RAW-RECO#1fb3d4ad-ca60-4e64-a077-1437accb0f57: OK
blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/NANOAOD#b83a6f99-b768-4588-8577-18436f17bd0c: OK
blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/AOD#41bdafe6-cdc1-40e1-9edc-85ef3d18e0ae: OK
blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/MINIAOD#b8ca91f4-608a-4612-8eeb-df6183a7b99f: OK
blockName: /ParkingSingleMuon6/Run2024I-PromptReco-v1/MINIAOD#9b909863-233e-480c-9e3f-3850c0b67167: OK
blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/AOD#b9849323-cc1a-445a-817f-c4e3212f74a2: OK
blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/AOD#c0047424-6a0a-41c6-94ac-28aa41129d71: OK
blockName: /ParkingVBF6/Run2024I-PromptReco-v1/AOD#d37b0146-4466-4ba8-a1ba-85c1ef4a902e: OK
blockName: /Muon1/Run2024I-LogErrorMonitor-PromptReco-v1/USER#63a0eea8-35a0-4b70-adb5-39a6141c7bde: OK
blockName: /ParkingVBF2/Run2024I-PromptReco-v1/NANOAOD#85077ea4-7bbd-45f7-a2df-2608128abe31: OK
blockName: /ParkingVBF0/Run2024I-PromptReco-v1/AOD#8738e434-abdc-4d03-a60d-358ab2412188: OK
blockName: /JetMET1/Run2024I-EXOSoftDisplacedVertices-PromptReco-v1/AOD#c2dd6617-dd8e-4bd8-9bd7-cdd81a858c36: OK
blockName: /ParkingSingleMuon9/Run2024I-PromptReco-v1/MINIAOD#78af0c2f-e670-46e5-832a-d3828066fca7: OK
blockName: /ParkingSingleMuon7/Run2024I-PromptReco-v1/AOD#f81d407a-8165-41c9-8a11-a5182d63d273: OK
blockName: /Muon0/Run2024I-LogErrorMonitor-PromptReco-v1/USER#d7c8e5cc-0e23-424f-b000-55aa41070d63: OK
blockName: /EphemeralZeroBias0/Run2024I-PromptReco-v1/MINIAOD#ad55216f-8224-404d-b8e9-daba73c85bb4: OK
blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/NANOAOD#f590565c-1256-4f74-8e00-db331266d599: OK
blockName: /Muon0/Run2024I-PromptReco-v1/NANOAOD#2ab3aa3d-a4ca-4773-8a58-85813617ea33: OK
blockName: /Muon1/Run2024I-HcalCalHBHEMuonProducerFilter-PromptReco-v1/ALCARECO#455e3ec5-62d7-4f68-abbe-a34111a87076: OK
blockName: /JetMET1/Run2024I-LogError-PromptReco-v1/RAW-RECO#1a77079a-29d9-4920-834c-eb523aeea080: OK
blockName: /Muon0/Run2024I-EXODisappMuon-PromptReco-v1/USER#474dc48d-caf5-44d3-9a80-dcc2d5eec561: OK
blockName: /JetMET1/Run2024I-JetHTJetPlusHOFilter-PromptReco-v1/RAW-RECO#f965d641-1a87-46b3-87a8-9d2a566fc604: OK
blockName: /ParkingSingleMuon6/Run2024I-PromptReco-v1/NANOAOD#33455ccc-7ce1-4fb5-aa69-2048f8362f27: OK
blockName: /ParkingSingleMuon10/Run2024I-PromptReco-v1/NANOAOD#fe088890-8bc3-416c-b263-408703b5efa4: OK
blockName: /ParkingSingleMuon10/Run2024I-PromptReco-v1/AOD#df90c976-3437-46ee-af3e-fbc2221e48f4: OK
blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/DQMIO#fe3a6889-bdb5-4959-a164-2db208d3e69b: OK
blockName: /ParkingSingleMuon10/Run2024I-PromptReco-v1/MINIAOD#71008299-7fb4-4245-90d2-e980b9e195a1: OK
blockName: /ParkingVBF2/Run2024I-PromptReco-v1/MINIAOD#7e7cb65b-c8db-4e36-b9a1-3475a031dedd: OK
blockName: /AlCaP0/Run2024I-v1/RAW#0d3bd409-bcd8-4c59-bcd8-d7ecc3a1222b: OK
blockName: /ParkingVBF3/Run2024I-PromptReco-v1/MINIAOD#2a6f15da-bd5c-4520-b70b-ab50ff65e04e: OK
blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/NANOAOD#f7f3e422-3bd6-4410-a955-9ed339b7219f: OK
blockName: /ParkingVBF2/Run2024I-PromptReco-v1/AOD#ebaafffd-4020-46cf-9137-37c2832d3eac: OK
blockName: /JetMET1/Run2024I-EXOMONOPOLE-PromptReco-v1/USER#77eabaa1-ead3-4534-a972-e788c3e7f050: OK
blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/NANOAOD#8e5664a8-e3d1-4f92-b27e-3707ca3dc4df: OK
blockName: /Muon0/Run2024I-EXOCSCCluster-PromptReco-v1/USER#700a9b24-dbc4-4bf9-8787-01da5ef26a06: OK
blockName: /Muon0/Run2024I-LogError-PromptReco-v1/RAW-RECO#b5af0d19-9710-4757-af04-ba6f63ab4070: OK
blockName: /ScoutingPFRun3/Run2024I-v1/HLTSCOUT#da913315-bb66-42ed-8c46-4f7f3714ef0c: OK
blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/AOD#3d339afb-6be9-440d-802e-3572ab355d56: OK
blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/NANOAOD#1c60e700-caf6-469a-9bb7-40122038ed33: OK
blockName: /JetMET1/Run2024I-PromptReco-v1/MINIAOD#af8b556a-6dfa-43eb-890a-7f0cdea01f87: OK
blockName: /JetMET1/Run2024I-EXOHighMET-PromptReco-v1/RAW-RECO#f09b2a46-61f1-4ae0-ac58-17d8c7db4fe3: OK
blockName: /ParkingVBF3/Run2024I-PromptReco-v1/AOD#8a47209b-3d7a-4c13-a3d7-8a34397a9f94: OK
blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#2d01f47e-5f60-4115-b99a-3ccd5f843a71: OK
blockName: /EphemeralZeroBias5/Run2024I-PromptReco-v1/MINIAOD#65830620-4305-4f9d-b084-555c17dc5610: OK
blockName: /ParkingHH/Run2024I-PromptReco-v1/AOD#47f79776-ae5b-41ac-9965-d4981b9790d6: OK
blockName: /ZeroBias/Run2024I-PromptReco-v1/AOD#d46e1e69-d0e3-4456-9bf1-060eb3731aec: OK
blockName: /ParkingVBF0/Run2024I-PromptReco-v1/NANOAOD#40dc6115-2b40-43be-9e4e-cdd658e68dc7: OK
blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/AOD#0410c654-f369-40f4-858b-af27bbe4d94d: OK
blockName: /JetMET0/Run2024I-PromptReco-v1/AOD#dc00eaf3-8f2c-4465-ba70-78335e4cb245: OK
blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/NANOAOD#2c9e2ac2-b2aa-4868-9782-556e6193cbbb: OK
blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/MINIAOD#3957f41c-be2a-4f09-b04c-6995d7c23eee: OK
@germanfgv @LinaresToine Could we check what is special about those 4 blocks reported as experiencing `BLOCKMISMATCH` records at DBS in my previous comment: https://github.com/dmwm/WMCore/issues/11965#issuecomment-2454961698 ? I am interested to find out at least:
@todor-ivanov as we discussed in the WMCore meeting, DBS3Upload should have a mechanism to identify blocks that have already been injected into DBS Server but failed to acknowledge the operation for some reason.
If the component tries to inject a block that is already in the server, it is supposed to receive exit code 128, marking the block as `check`,
here:
https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L104
which will trigger the execution of this block of code: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L848
in the next cycle of the component.
I don't think anything changed on the DBS Server codebase lately, so I expect this feature to be still functional. But you might want to revise the error message/code that we are getting for the problematic blocks.
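For illustration, here is a minimal sketch of the error-code handling described above. The function names `parse_dbs_error` and `classify_block` are hypothetical, not the real WMCore API; only the code-128 semantics come from the discussion.

```python
import json
import re

# Hypothetical sketch of the DBS error-code dispatch described above;
# parse_dbs_error/classify_block are illustrative names, not WMCore code.
DBS_BLOCK_ALREADY_EXISTS = 128  # dbs2go: "block already exists in DBS"

def parse_dbs_error(body):
    """Extract the DBSError code from a DBS server reply."""
    try:
        records = json.loads(body)
        return records[0]["error"]["code"]
    except (ValueError, KeyError, IndexError, TypeError):
        # Fall back to scraping the textual "DBSError Code:NNN" pattern
        match = re.search(r"DBSError Code:(\d+)", str(body))
        return int(match.group(1)) if match else None

def classify_block(body):
    """Map a DBS error reply onto the next DBS3Upload action."""
    if parse_dbs_error(body) == DBS_BLOCK_ALREADY_EXISTS:
        return "check"  # re-verify the block in DBS on the next cycle
    return "retry"

reply = '[{"error": {"code": 128, "reason": "block already exists"}}]'
print(classify_block(reply))  # → check
```

The key point is that the dispatch only works if the DBS error code actually reaches the client, which is exactly what breaks later in this thread.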
@todor-ivanov we are having similar problems with one agent that is ready to be shutdown (after draining), but it still has one block that it fails to inject into DBS Server.
Could you please look into submit12 and try to understand what the problem is with:
```
2024-11-04 21:38:50,968:140011552839424:INFO:DBSUploadPoller:About to call insert block for: /XToYYprimeTo4Q_MX-2000_MY-30_MYprime-600_narrow_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM#fb282932-f32b-48de-82e5-a56cceb34cad
2024-11-04 21:38:51,654:140011552839424:ERROR:DBSUploadPoller:Error trying to process block /XToYYprimeTo4Q_MX-2000_MY-30_MYprime-600_narrow_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM#fb282932-f32b-48de-82e5-a56cceb34cad through DBS. Details: DBSError code: 131, message: fb20d909d3a86926e3d8d0498c1ebfc3f4ad617c6b5e5dcaeecde3662af8797b unable to find dataset_id for /XToYYprimeTo4Q_MX-2000_MY-30_MYprime-600_narrow_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM, error DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set, reason: DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
```
It seems to be failing injection since Oct 15th.
- Almost all of those blocks are properly present at DBS - so for those I assume that the agent did not properly handle the initial return code from DBS and it simply continues to retry.
Thanks @todor-ivanov! I tested this using my own script (`DBSBlockCheck.py`) and got the same result. All but 4 blocks are already available in DBS.
All the blocks listed in /eos/home-c/cmst0/public/dbsError/failingBlocks.txt
belong to the same agent, including the 4 problematic ones.
I can check @amaltaro's idea tomorrow.
In reality, the error code the agent unwraps from the HTTP header is, for some reason, `52` instead of `128`, see [1]. So this mechanism mentioned here: https://github.com/dmwm/WMCore/issues/11965#issuecomment-2455173659 will never trigger.
[1]
```
2024-11-05 10:07:49,084:139753276044864:ERROR:DBSUploadPoller:Hit a general exception while inserting block /Tau/Run2024I-PromptReco-v1/DQMIO#1a0ac20a-1d60-4d89-8133-e8559f1e4c13. Error: (52, 'Empty reply from server')
Traceback (most recent call last):
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/WMComponent/DBS3Buffer/DBSUploadPoller.py", line 94, in uploadWorker
    dbsApi.insertBulkBlock(blockDump=block)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 647, in insertBulkBlock
    result = self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 474, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RestApi.py", line 42, in post
    return http_request(self._curl)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 56, in __call__
    curl_object.perform()
pycurl.error: (52, 'Empty reply from server')
```
Actually it never tries to read the HTTP header and resolve the true DBS error, which is supposed to be done through the `dbsError` class here: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L102
And the reason it happens like that is, obviously, that the error returned by the `pycurl` client is not of type `HTTPError`. So this whole piece of code is never tried: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L96-L115
Instead, the exception is handled as a general exception and this one takes control: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L116-L119
It has to have something to do with this line from the traceback:

```
File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 56, in __call__
  curl_object.perform()
```
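The exception-handling flow just described can be sketched with stand-in exception classes (both class names here are simplified stand-ins for RestClient's `HTTPError` and `pycurl.error`; the returned strings are only labels for which branch fires):

```python
# Stand-in exceptions: in the agent these are RestClient's HTTPError and
# pycurl.error; the point is only to show which except branch is taken.
class HTTPError(Exception):
    """Server replied with an HTTP status and a body (DBS error in header)."""

class PycurlError(Exception):
    """The transfer itself failed, e.g. (52, 'Empty reply from server')."""

def upload_worker(exc):
    """Mimics the try/except layout of DBSUploadPoller.uploadWorker."""
    try:
        raise exc  # stands in for dbsApi.insertBulkBlock(blockDump=block)
    except HTTPError:
        # DBSUploadPoller.py L96-115: parse the DBS error code (e.g. 128)
        return "dbs-error-parsed"
    except Exception:
        # DBSUploadPoller.py L116-119: generic handler, DBS code is lost
        return "general-exception"

print(upload_worker(PycurlError("(52, 'Empty reply from server')")))  # → general-exception
print(upload_worker(HTTPError("HTTP Error 400: DBSError Code:128")))  # → dbs-error-parsed
```

An empty reply surfaces as a transfer error rather than an HTTP error, so the DBS-error parsing branch is structurally unreachable for these failures.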
And just to add to the observation: the 4 blocks which I mentioned as experiencing `BLOCKMISMATCH` at DBS behave differently. They fail with a proper DBS exception [1], all 4 of them, and it is indeed the concurrency error, `DBSError Code:110`. So for them the actual HTTP header is indeed parsed and the true DBS error encoded into it is received, so the exception is handled according to whatever logic is meant to be implemented by: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L96-L115
But as we can see, `DBSError Code:110` is not handled by this logic. So I suspect the conversation on how to proceed with these cases needs to continue once we understand what exactly happened with those 4 blocks in the first place. It doesn't seem that a proper agreement has been reached on the actions required on both the client and the server side for situations like that.
[1]
```
2024-09-20 08:43:10,755:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd through DBS. Details: DBSError code: 110, message: 5ecdc2bdcd03492fd64efc269de332cdcf1c8a53c3e3cc07168b0c741f0270ba unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
2024-09-20 08:43:10,756:139632874354240:INFO:DBSUploadPoller:About to call insert block for: /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7
2024-09-20 08:43:10,757:139632874354240:INFO:DBSUploadPoller:Queueing block for insertion: /L1ScoutingSelection/Run2024H-v1/L1SCOUT#694f9058-382e-47d9-89cd-646541261cd7
2024-09-20 08:43:10,760:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 through DBS. Details: DBSError code: 110, message: 997071d9311e283887ce5e57b0b1800467986e1c57f620aff5a39d98b881fb6c unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
...
2024-09-20 08:43:10,799:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7 through DBS. Details: DBSError code: 110, message: d93d36f53eaf3097db5c9f50851359041c418a18727e6f363e6c18c37d3f25bb unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
...
2024-09-20 08:43:11,854:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /Muon0/Run2024H-v1/RAW#7369ccdf-3d3a-4d32-bad9-b04b02f279d4 through DBS. Details: DBSError code: 110, message: e38e86de6869760af39faf5da584eceee0b0b9d1de48e57276593df8dd4c720e unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
```
@todor-ivanov actually, at some point the 4 blocks started failing with `pycurl.error: (52, 'Empty reply from server')`, before any other block had failed.
This is the last appearance of `DBSError 110`:
```
2024-09-27 14:31:30,711:140408533284416:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd through DBS. Details: DBSError code: 110, message: ec6dab1b1b8d8ba3bf018be816846d73e007b5049b93947a1e7472786c73ece6 unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
```
This is the first appearance of pyCurl error `52`:
```
2024-09-27 15:03:00,192:140408533284416:ERROR:DBSUploadPoller:Hit a general exception while inserting block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd. Error: (52, 'Empty reply from server')
```
This might be a clue to what caused the other 272 blocks to fail. It seems something changed in the DBS server at some point between `2024-09-27 14:31` and `2024-09-27 15:03`. After that moment, the client is unable to parse the server's error codes. This exactly coincides with the deployment of the APS-based CMSWEB cluster.
@amaltaro @todor-ivanov @vkuznet
Here you have the `DBS3Upload` ComponentLog, in case you want to check these dates: `/eos/user/c/cmst0/public/dbsError/ComponentLog`
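As a sketch, the transition point can be located by scanning that ComponentLog for the two error signatures; the log format below is taken from the excerpts quoted in this thread, and `error_timeline` is an illustrative helper, not agent code:

```python
import re

# Match the leading "YYYY-MM-DD HH:MM:SS" timestamp of a ComponentLog line
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def error_timeline(lines):
    """Return (last 'DBSError code: 110' ts, first pycurl 52 ts)."""
    last_110 = first_52 = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue
        ts = m.group(1)
        if "DBSError code: 110" in line:
            last_110 = ts
        elif "Empty reply from server" in line and first_52 is None:
            first_52 = ts
    return last_110, first_52

log = [
    "2024-09-27 14:31:30,711:140408533284416:ERROR:DBSUploadPoller:... DBSError code: 110 ...",
    "2024-09-27 15:03:00,192:140408533284416:ERROR:DBSUploadPoller:... Error: (52, 'Empty reply from server')",
]
print(error_timeline(log))  # → ('2024-09-27 14:31:30', '2024-09-27 15:03:00')
```

Running this over the full log file (e.g. `error_timeline(open(path))`) would confirm the half-hour window mentioned above.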
@germanfgv I would like to mention that, according to the k8s production DBS cluster, we have been running the DBS pods for 209 days. Therefore nothing has changed on the DBS side, and neither am I aware of any development, commits/PRs. The concurrency error may seem misleading, since it is printed out with the concurrent call to file injection. But the file injection fails due to missing aux meta-data in the JSON payload. Please see this DBS code:
- `ConcurrencyErr` occurs here
- `insertFilesChunk` function returns error
- in the `insertFilesChunk` code base you'll see that the error occurs in only three occasions, e.g. a missing `FILE_DATA_TYPES` id

I reported MANY times that the most likely issue is a missing file data type in the JSON payload, and I strongly suggest starting with your JSON payload to check if it is there. In particular, the files section of the payload should contain `file_type`, see example here.
If the JSON payload is correct in terms of ALL required aux meta-data, I suggest that you move down the list and check the validity of the file(s), and finally look up the ORACLE insert error.
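Following that suggestion, a quick sanity check of a bulkblocks JSON dump could look like this. This is only a sketch: the field names follow the DBS bulkblocks payload quoted in this thread, and the helper name is illustrative:

```python
# Verify every entry in the "files" section of a bulkblocks dump carries
# the required aux meta-data (here: file_type and logical_file_name).
def missing_file_fields(block_dump, required=("file_type", "logical_file_name")):
    """Return a list of (index, missing_field) for incomplete file records."""
    problems = []
    for i, frec in enumerate(block_dump.get("files", [])):
        for field in required:
            if not frec.get(field):
                problems.append((i, field))
    return problems

dump = {"files": [{"logical_file_name": "/store/a.root", "file_type": "EDM"},
                  {"logical_file_name": "/store/b.root"}]}  # second file lacks file_type
print(missing_file_fields(dump))  # → [(1, 'file_type')]
```

Such a check run against the agent's block dumps would quickly confirm or rule out the missing-`file_type` hypothesis.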
@vkuznet We have 2 separate problems here:
I bring up the APS upgrade in reference to the issues parsing the response from the server, not as an explanation for the concurrency issues. After `2024-09-27 15:03`, the DBS client is unable to distinguish `DBSError 128: Block already exists` from `DBSError 110: Concurrency error` (or any other HTTP error). They all show up as a pyCurl error `52`. As the timing coincides with the deployment of the APS server, it seems very likely to me that the issue is related to that upgrade, especially since, as you mentioned, there have not been any other changes in the code.
I would like to switch this agent temporarily to the `cmsweb-prod.cern.ch` version of `DBSWriter`, simply to check whether the 272 blocks without concurrency issues can move along. This will not put big pressure on the server, as this agent is no longer producing new data and simply needs to upload those 272 blocks. @vkuznet do you have anything against that plan?
Regarding the JSON payload, the dumps we obtained from the agent show `"file_type": "EDM",` as expected. This is why we've moved on to checking the validity of the files, and Todor already found issues there. There are indeed files appearing in more than one block. Fixing this will be more complicated, and we still need to understand what faulty agent logic caused it.
My suggestions would be the following:
- inject the block manually, e.g. `curl -H "User-Agent: my-failed-dataset-block#123" ...`
- see if you get the `52` error, otherwise
error, otherwise@germanfgv , and regarding switching to cmsweb-prod
, if you interested to understand the error I suggest to use manual curl
approach as I described before. And, afterwards you may switch to cmsweb-prod
to see if you'll be able to inject them using Apache FE.
About the 4 original blocks with the concurrency errors, specifically the `AlCaP0` blocks:
I see all the files in DBS, but they are distributed among two "impostor" blocks.
1. `/AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417`
2. `/AlCaP0/Run2024H-v1/RAW#0392f25d-8397-40b3-8f6f-46266d92583b`
I call them impostors because both contain files that belong to other blocks, and number 2 is not even in Rucio and all its files belong to another block according to the database. Here is a summary of all 5 blocks, the 2 impostors and the 3 originals:
- `/AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417` (impostor 1)
  - `/AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4`
  - `/AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd`
- `/AlCaP0/Run2024H-v1/RAW#0392f25d-8397-40b3-8f6f-46266d92583b` (impostor 2)
  - `/AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7`
  - `/AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd`
- `/AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4` (original)
  - `/AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417`
- `/AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd` (original)
  - `/AlCaP0/Run2024H-v1/RAW#0392f25d-8397-40b3-8f6f-46266d92583b`
  - `/AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417`
- `/AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7` (original)
  - `/AlCaP0/Run2024H-v1/RAW#0392f25d-8397-40b3-8f6f-46266d92583b`
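A sketch of the cross-check behind this kind of summary: given each block's file list (for example fetched from the DBSReader `files` API), report the LFNs that appear in more than one block. The block names and LFNs below are placeholders:

```python
from collections import defaultdict

def duplicated_files(block_files):
    """block_files: {block_name: [lfn, ...]} -> {lfn: [blocks containing it]}"""
    owners = defaultdict(list)
    for block, lfns in block_files.items():
        for lfn in lfns:
            owners[lfn].append(block)
    # Keep only LFNs that show up in more than one block
    return {lfn: blocks for lfn, blocks in owners.items() if len(blocks) > 1}

blocks = {
    "/AlCaP0/Run2024H-v1/RAW#impostor": ["/store/a.root", "/store/b.root"],
    "/AlCaP0/Run2024H-v1/RAW#original": ["/store/a.root"],
}
print(duplicated_files(blocks))  # /store/a.root is shared by both blocks
```

The same map, keyed the other way around, gives the per-block overlap lists shown above.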
About:

> to do that, you should stop using DBS3Upload code as it hides many things and prevents debugging the issue

Just to put @vkuznet's words in perspective: I tried to completely simulate the whole agent environment in `preprod` connected to DBS `integration`, falsely assuming everything would go smoothly and that, upon an initial successful upload of the block, I'd be able to reproduce the duplication error on a second attempt. But:
The `DBSApi` is not able to complete its execution.... the error originates from the `dbsclient` and it throws the error from this very line:

```python
def insertBulkBlock(self, blockDump):
    ...
    result = self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
```

which actually contains the true DBS error in the header, and one can spot the error message in the printout:

```
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
```
So those are two nested DBS errors:

- `DBSError Code: 101` - the error from the wrapper API `InsertBulkBlocksConcurrently`, reflecting that the call to the database actually failed.
- `DBSError Code: 103` - the bottom error giving the true reason why the call to the database failed, which in this case is that the parentage file of this lfn: `/store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root` is indeed missing at this instance of DBS (which is completely expected), and the sql query actually returned an empty result.

And all of this is properly raised by the `dbsclient`. What happens at the WMAgent's `DBSApi`, though, is quite undesired. The error code is silently dropped and transformed only into the upper-level `HTTP 400` error, and the actual error carried inside the header is simply ignored by this line:
And there is a plethora of DBS server errors we do not handle: https://github.com/dmwm/dbs2go/blob/8effd5a6bcb1c5b169348e3ac886891ad3aa1a2a/dbs/errors.go#L37-L81 [2]
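Since the textual message still carries every embedded `DBSError Code:NNN` in wrapper-to-root order, a client could at least recover the whole chain with a simple scan. This is a sketch, not existing WMCore code:

```python
import re

def dbs_error_chain(message):
    """Return all DBSError codes embedded in a DBS server error message,
    outermost (wrapper API) first, innermost (root cause) last."""
    return [int(c) for c in re.findall(r"DBSError Code:(\d+)", message)]

msg = ("HTTP Error 400: DBSError Code:101 Description:DBS DB error "
       "Function:dbs.bulkblocks.InsertBulkBlocksConcurrently "
       "Message:unable to find parent lfn ... "
       "Error: nested DBSError Code:103 Description:DBS DB query error ...")
print(dbs_error_chain(msg))  # → [101, 103]
```

With the chain recovered, the agent could react to the innermost code (the root cause) rather than only the wrapper's generic database error.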
FYI: @germanfgv @LinaresToine
[1]
```
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(477)__callServer()
-> self.__parseForException(http_error)
(Pdb)
DBS Server error: [{'error': {'reason': 'DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set', 'message': 'unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root', 'function': 'dbs.bulkblocks.InsertBulkBlocksConcurrently', 'code': 101, 'stacktrace': '\ngoroutine 7968287 [running]:\ngithub.com/dmwm/dbs2go/dbs.Error({0xb2ca20?, 0xc0006842d0?}, 0x65, {0xc000702000, 0x84}, {0xa5c044, 0x2b})\n\t/go/src/github.com/vkuznet/dbs2go/dbs/errors.go:185 +0x99\ngithub.com/dmwm/dbs2go/dbs.(*API).InsertBulkBlocksConcurrently(0xc000236070)\n\t/go/src/github.com/vkuznet/dbs2go/dbs/bulkblocks2.go:508 +0x605\ngithub.com/dmwm/dbs2go/web.DBSPostHandler({0xb2f790, 0xc000aa01e0}, 0xc000686c60, {0xa3e07d, 0xa})\n\t/go/src/github.com/vkuznet/dbs2go/web/handlers.go:562 +0x109e\ngithub.com/dmwm/dbs2go/web.BulkBlocksHandler({0xb2f790?, 0xc000aa01e0?}, 0xc000033f60?)\n\t/go/src/github.com/vkuznet/dbs2go/web/handlers.go:978 +0x3b\nnet/http.HandlerFunc.ServeHTTP(0x0?, {0xb2f790?, 0xc000aa01e0?}, 0x11?)\n\t/usr/local/go/src/net/http/server.go:2171 +0x29\ngithub.com/dmwm/dbs2go/web.limitMiddleware.func1({0xb2f790?, 0xc000aa01e0?}, 0xc0006c6650?)\n\t/go/src/github.com/vkuznet/dbs2go/web/middlewares.go:110 +0x32\nnet/http.HandlerFunc.ServeHTTP(0xc0003c0f30?, {0xb2f790?, 0xc000aa01e0?}, 0xc0003af450?)\n\t/usr/loca'}, 'http': {'method': 'POST', 'code': 400, 'timestamp': '2024-11-06 16:16:23.350982889 +0000 UTC m=+5760929.544914892', 'path': '/dbs/int/global/DBSWriter/bulkblocks', 'user_agent': 'DBSClient/Unknown/', 'x_forwarded_host': 'cmsweb-testbed.cern.ch', 'x_forwarded_for': '188.184.96.94:20438, 188.184.96.94', 'remote_addr': '10.100.148.128:41393'}, 'exception': 400, 'type': 'HTTPError', 'message': 'DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set'}]
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(477)__callServer()
-> self.__parseForException(http_error)
(Pdb)
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(486)__callServer()
-> self.__parseForException(data)
(Pdb)
--Return--
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(486)__callServer()->None
-> self.__parseForException(data)
(Pdb)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(647)insertBulkBlock()
-> result = self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
(Pdb) p result
*** NameError: name 'result' is not defined
(Pdb) n
--Return--
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(647)insertBulkBlock()->None
-> result = self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
(Pdb) p result
*** NameError: name 'result' is not defined
(Pdb) n
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
> /data/WMAgent.venv3/srv/WMCore/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py(94)uploadWorker()
-> dbsApi.insertBulkBlock(blockDump=block)
(Pdb)
> /data/WMAgent.venv3/srv/WMCore/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py(96)uploadWorker()
-> except HTTPError as ex:
(Pdb)
```
[2]
// DBS Error codes provides static representation of DBS errors, they cover 1xx range
const (
GenericErrorCode = iota + 100 // generic DBS error
DatabaseErrorCode // 101 database error
TransactionErrorCode // 102 transaction error
QueryErrorCode // 103 query error
RowsScanErrorCode // 104 row scan error
SessionErrorCode // 105 db session error
CommitErrorCode // 106 db commit error
ParseErrorCode // 107 parser error
LoadErrorCode // 108 loading error, e.g. load template
GetIDErrorCode // 109 get id db error
InsertErrorCode // 110 db insert error
UpdateErrorCode // 111 update error
LastInsertErrorCode // 112 db last insert error
ValidateErrorCode // 113 validation error
PatternErrorCode // 114 pattern error
DecodeErrorCode // 115 decode error
EncodeErrorCode // 116 encode error
ContentTypeErrorCode // 117 content type error
ParametersErrorCode // 118 parameters error
NotImplementedApiCode // 119 not implemented API error
ReaderErrorCode // 120 io reader error
WriterErrorCode // 121 io writer error
UnmarshalErrorCode // 122 json unmarshal error
MarshalErrorCode // 123 marshal error
HttpRequestErrorCode // 124 HTTP request error
MigrationErrorCode // 125 Migration error
RemoveErrorCode // 126 remove error
InvalidRequestErrorCode // 127 invalid request error
BlockAlreadyExists // 128 block xxx already exists in DBS
FileDataTypesDoesNotExist // 129 FileDataTypes does not exist in DBS
FileParentDoesNotExist // 130 FileParent does not exist in DBS
DatasetParentDoesNotExist // 131 DatasetParent does not exist in DBS
ProcessedDatasetDoesNotExist // 132 ProcessedDataset does not exist in DBS
PrimaryDatasetTypeDoesNotExist // 133 PrimaryDatasetType does not exist in DBS
PrimaryDatasetDoesNotExist // 134 PrimaryDataset does not exist in DBS
ProcessingEraDoesNotExist // 135 ProcessingEra does not exist in DBS
AcquisitionEraDoesNotExist // 136 AcquisitionEra does not exist in DBS
DataTierDoesNotExist // 137 DataTier does not exist in DBS
PhysicsGroupDoesNotExist // 138 PhysicsGroup does not exist in DBS
DatasetAccessTypeDoesNotExist // 139 DatasetAccessType does not exist in DBS
DatasetDoesNotExist // 140 Dataset does not exist in DBS
LastAvailableErrorCode // last available DBS error code
)
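For what it's worth, the numeric DBSError codes embedded in messages like the ones quoted above can be recovered on the client side with a simple parse. A minimal sketch (the regex and helper are my own, not part of the dbsclient API):

```python
# Hypothetical helper (not part of dbsclient): pull the numeric DBSError
# codes out of a server error message like the one quoted above.
import re

DBS_CODE_RE = re.compile(r"DBSError Code:(\d+)")

def extract_dbs_codes(message):
    """Return all numeric DBSError codes found in a message string."""
    return [int(code) for code in DBS_CODE_RE.findall(message)]

msg = ("unable to insert files, error DBSError Code:110 "
       "Description:DBS DB insert record error "
       "Function:dbs.bulkblocks.insertFilesViaChunks "
       "Message: Error: concurrency error")
codes = extract_dbs_codes(msg)
```

This only recovers what the server puts into the message body; it does not replace receiving the code via the HTTP header, which is the issue discussed below.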
I changed the DBSWriter instance that the component is accessing from cmsweb.cern.ch to cmsweb-prod.cern.ch. As expected, we no longer get the pyCurl error 52 message. The 272 blocks that are already correct in the database were processed without issues, and this is allowing the agent to continue creating and uploading blocks.
Now we are left with the original 4 problematic blocks.
Here is a summary of the status and our findings on this issue from the work with the T0 team over the whole last week.
The problem is three-fold:
The agent loses the HTTP header containing the actual DBSError code when we switch the frontend to APS. Following a conversation with @vkuznet, we might have a direction: there obviously is a slight difference between how the connection is handled by Apache and by APS. Things might boil down to the keepAlive and keepAliveTimeout flags.
We are not distinguishing between all the possible situations that could have led to a specific error. We treat only one separately, DBS ErrorCode 128, and on top of that we do not even handle/recognize all the possible errors that the DBS Server returns to us.
The two issues above mostly concern the huge pile of blocks that we were accumulating without recognizing that their records were already in DBS, in which case the agent should stop retrying. Once we switched back to the Apache frontend, all of those proceeded, and the subsequent steps for the other workflows depending on the data also started.
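A sketch of what distinguishing more of these codes on the agent side could look like (the dispatch policy here is illustrative, not existing WMCore behavior):

```python
# Illustrative mapping from DBSError codes (mirroring the Go constants
# quoted in [2]) to an agent-side action. This dispatch policy is a
# sketch, not current WMCore behavior.
INSERT_ERROR = 110          # db insert error (incl. concurrency errors)
BLOCK_ALREADY_EXISTS = 128  # block already exists in DBS

def classify_dbs_error(code):
    """Return the action the agent could take for a given DBSError code."""
    if code == BLOCK_ALREADY_EXISTS:
        return "mark-injected"  # record is in DBS; stop retrying
    if code == INSERT_ERROR:
        return "retry"          # possibly transient; retry later
    return "report"             # no special flow; surface the server error
```

Only codes that change the agent's execution flow need a dedicated branch; everything else can fall through to generic error reporting.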
The original block apparently hit the UNIQUE table constraint at the file level, so it was held back at the agent. The reasons for that are still unknown; one possible place to look is a concurrency issue in how we feed the 4 different input queues of DBSUploadPoller. The JSON dump of the original block, though, is absolutely correct. The problem is that the JSON for the originally uploaded block with the extra files is long gone, and we cannot dump it to see what was actually uploaded. As a strategy, we decided to split the problem into 5 steps: 2 OPS and 3 DEV.
DONE (reported by @germanfgv in the previous comment). The affected DBS tables were:
FILES
FILE_PARENTS
FILE_LUMIS
ASSOCIATED_FILES
(the latter we never even imagined exists)
Now on the APS frontend, the Muon0 block /Muon0/Run2024H-v1/RAW#7369ccdf-3d3a-4d32-bad9-b04b02f279d4 joins the other three locked blocks. A good summary was done by @LinaresToine here: https://github.com/dmwm/WMCore/issues/11965#issuecomment-2458613221
Testing against cmsweb-testbed.cern.ch (which is currently an APS frontend), I can see that the dbsclient (a dependency of WMCore) does see the HTTP header, and the DBS errors are well recognizable in the object. See my comment: https://github.com/dmwm/WMCore/issues/11965#issuecomment-2460311782
Thank you for summarizing everything that has been going on here.
For the OPS2 issue above, I find deleting entries from the DBS Server database extremely dangerous. Even though it might require extra work, it would be much safer to actually recreate the lumis (or blocks) that are failing to be inserted into DBS. Did you and the T0 team discuss this possibility? @germanfgv
About DEV1: unless I am missing some context, I do not think we should replicate every single status code from the DBS Server on the client side. IMO, the client should only deal with status codes on which it can actually act differently. If there is no different execution flow, then reporting the error from the server is all we can do (which is already done in the generic exception, AFAICT).
@amaltaro we no longer have streamer files for these run/lumis, so it's not possible to recreate these blocks.
We could consider making the changes in Rucio, but it would require removing files from one block and adding them to the other. Also, it would require doing the same in the agent's DBSBUFFER database.
Given the criticality and amount of information in DBS, it would be the last system in which I would delete things manually. For dbsbuffer, do I understand correctly that we would only need to mark this block and its files as uploaded to DBS? For Rucio, what would have to be done? Remove files/replicas from a DATASET? Would it require creating a new DATASET plus files/replicas?
In Rucio, we would need to remove 4 files from one block and add them to another. In the agent's database, we would need to change the block of the 4 problematic files and mark them as InDBS. I'm not sure how Rucio would react to this, but I think it will be OK, as all files belong to the same container.
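The Rucio side of this plan might be sketched as follows. `detach_dids`/`attach_dids` are real Rucio client methods, but the scope, LFNs, and block names below are placeholders, and the whole flow is untested here:

```python
# Sketch of the proposed fix: detach the 4 problematic files from one
# block and attach them to the other. All names below are placeholders.
SCOPE = "cms"
SRC_BLOCK = "/Primary/Era-v1/RAW#source-block-uuid"   # hypothetical
DST_BLOCK = "/Primary/Era-v1/RAW#target-block-uuid"   # hypothetical

def build_dids(lfns):
    """Build the DID dictionaries that Rucio detach/attach expect."""
    return [{"scope": SCOPE, "name": lfn} for lfn in lfns]

def move_files(client, lfns):
    """Move files between blocks using a rucio.client.Client instance."""
    dids = build_dids(lfns)
    client.detach_dids(SCOPE, SRC_BLOCK, dids)   # remove from one block
    client.attach_dids(SCOPE, DST_BLOCK, dids)   # add to the other
```

Since both blocks belong to the same container, the container-level content is unchanged; only the block-to-file mapping moves.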
After some discussions during the Tier0 meeting, I decided to have a quick look at the logs to see if we can have a better understanding of this issue.
Some information is missing from this thread, so let me write my observations here: 1) before the problematic blocks were created in DBS3Upload, the component had a few Oracle issues like:
Exception Class: DBSUploadException
Message: Unhandled exception while loading uploadable files for DatasetPath.
(cx_Oracle.DatabaseError) ORA-25401: can not continue fetches
2) after these Oracle issues, I noticed many files being reported as duplicated in the logs:
2024-09-19 17:23:01,916:139632874354240:INFO:DBSUploadPoller:Executing loadFiles method...
2024-09-19 17:23:11,876:139632874354240:ERROR:DBSBufferBlock:Duplicate file inserted into DBSBufferBlock: 1077894
Ignoring this file for now!
3) based on Antonio's feedback above, the "impostor block" had the following timeline in the component:
### impostor block 1
2024-09-19 17:51:38,694:139632874354240:INFO:DBSUploadPoller:Queueing block for insertion: /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
2024-09-19 17:52:47,723:139632874354240:INFO:DBSUploadPoller:About to call insert block for: /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
4) while the original block had this timeline (and kept failing since then)
2024-09-19 17:51:38,698:139632874354240:INFO:DBSUploadPoller:Queueing block for insertion: /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4
2024-09-19 17:52:49,777:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 through DBS. Details: DBSError code: 110, message: 997071d9311e283887ce5e57b0b1800467986e1c57f620aff5a39d98b881fb6c unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
5) looking into RucioInjector, these 2 blocks above had the following timeline:
### impostor block 1
2024-09-19 15:30:19,570:139632735942208:INFO:RucioInjectorPoller:Block /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417 inserted into Rucio
2024-09-19 15:30:29,385:139632735942208:INFO:RucioInjectorPoller:Successfully inserted 4 files on block /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
2024-09-19 17:57:20,982:139632735942208:INFO:RucioInjectorPoller:Closing block: /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
### original block 1
2024-09-19 17:56:12,439:139632735942208:INFO:RucioInjectorPoller:Block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 inserted into Rucio
2024-09-19 17:56:41,372:139632735942208:INFO:RucioInjectorPoller:Successfully inserted 5 files on block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4
Having said that, I have the following questions/comments:
1) it looks like we have not closed the original blocks in Rucio. AFAIK it is not a big deal and has no impact on anything else. It is, nonetheless, different from any other block created by WMAgent.
2) is it possible that the list of files returned from dbsbuffer was not unique? The file id is supposed to be unique (and sequential, AFAICT). How about LFNs: do we have the same LFN under different file ids? Otherwise, how would we iterate through the same file id twice?
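The duplicate-LFN question can be checked directly against the database. A sketch using an in-memory SQLite table as a stand-in for the real dbsbuffer schema (table and column names here are simplified assumptions):

```python
# Illustrative check for the duplicate-LFN question: same LFN under
# different file ids. An in-memory SQLite table stands in for the real
# Oracle dbsbuffer table; names here are simplified assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbsbuffer_file (id INTEGER PRIMARY KEY, lfn TEXT)")
conn.executemany(
    "INSERT INTO dbsbuffer_file (id, lfn) VALUES (?, ?)",
    [(1, "/store/a.root"), (2, "/store/b.root"), (3, "/store/a.root")])

# LFNs that appear under more than one file id
dupes = conn.execute(
    "SELECT lfn, COUNT(*) FROM dbsbuffer_file "
    "GROUP BY lfn HAVING COUNT(*) > 1").fetchall()
```

The same GROUP BY/HAVING query against the real Oracle table would answer whether a single LFN is registered under multiple file ids.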
Without investigating the codebase too much, it is possible that those duplicate file ids ("Ignoring this file for now!") actually triggered the misbehavior of the component. This duplicate file id is identified here: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSBufferBlock.py#L105 and one of the places it is used (there is another in the same module) is in this block: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L487
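The duplicate-id guard linked above can be sketched as follows (a simplified model of the DBSBufferBlock behavior, not the actual WMCore class):

```python
# Simplified model of the duplicate-file guard: a block remembers the
# file ids it has already accepted and silently skips repeats, which is
# the "Duplicate file inserted ... Ignoring this file for now!" path.
import logging

class Block:
    def __init__(self, name):
        self.name = name
        self.file_ids = set()
        self.files = []

    def add_file(self, dbs_file):
        """Add a file to the block unless its id was already seen."""
        fid = dbs_file["id"]
        if fid in self.file_ids:
            logging.error("Duplicate file inserted into block: %s", fid)
            return False   # file silently dropped, as in the logs above
        self.file_ids.add(fid)
        self.files.append(dbs_file)
        return True
```

Note that silently dropping the duplicate keeps the in-memory block consistent, but it leaves open the question of why the same id was fed in twice in the first place.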
@amaltaro, a few observations:
DBSUploadPoller.py uses a thread object which invokes begin/rollback. Is it possible that the thread was killed because Oracle timed out? Or that the connection to Oracle was lost and an error was thrown? How does DBSUploadPoller.py guarantee that transactions will be rolled back if the thread is killed? From what I read in the code, nothing is protected for such a use-case, and the transaction will not be rolled back if the thread is killed. That may explain the weird behavior.
In other words, because of the polling cycle, if the thread is killed for whatever reason, there is no guarantee in Python that the transaction will be rolled back. The polling cycle will then start the poller again, and it may execute the same injection of objects into the database, which may not be protected (if there is no UNIQUE constraint on the injected object); that may explain the observed behavior.
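The missing protection described here would look roughly like the try/finally pattern below (the Transaction class is a stand-in for illustration, not the real WMCore database interface):

```python
# Sketch of the suggested protection: wrap the injection so the
# transaction is rolled back even if the worker dies mid-cycle.
class Transaction:
    """Toy transaction object standing in for the real DB layer."""
    def __init__(self):
        self.state = "open"
    def begin(self):
        self.state = "begun"
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "rolled-back"

def inject_block(trans, do_insert):
    """Run an insert inside a transaction, rolling back on any failure."""
    trans.begin()
    try:
        do_insert()
        trans.commit()
    except BaseException:
        # BaseException also covers SystemExit/KeyboardInterrupt, so a
        # killed worker still rolls back before the poller restarts.
        trans.rollback()
        raise
```

This does not help if the process is killed with SIGKILL, but it closes the window for ordinary exceptions and thread shutdown paths.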
Impact of the bug: WMAgent
Describe the bug: There seems to be an unusual number of blocks that are continuously failing to be inserted into DBS Server, with a variety of errors, as can be seen in [1] and [2].
For [1], those blocks actually belong to a workflow that went all the way to completed in the system and then got rejected, as can be seen from this ReqMgr2 API. For [2], that block belongs to a workflow that is currently in running-closed status; the block has been failing injection for about 10h. This is based on vocms0255; I haven't yet checked the other agents.
How to reproduce it: Not sure
Expected behavior: For the rejected (or aborted) workflow, we should make DBS3Upload aware that the output data is no longer relevant and skip its injection into DBS Server. This might require persisting information in the DBSBuffer tables (like marking the block and relevant files as injected); otherwise the same blocks will come up every time we run a cycle of the DBS3Upload component.
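That expected behavior could be sketched like this (status names and record layout are illustrative assumptions, not the real DBSBuffer schema):

```python
# Hedged sketch: when a workflow is rejected or aborted, mark its
# pending blocks as injected so the next DBS3Upload cycle skips them.
def skip_rejected_blocks(blocks, rejected_workflows):
    """Mark blocks of rejected workflows as injected; return their names."""
    skipped = []
    for block in blocks:
        if block["workflow"] in rejected_workflows and block["status"] == "Pending":
            block["status"] = "InDBS"   # pretend-injected, never retried
            skipped.append(block["name"])
    return skipped
```

In the real component this would be an UPDATE against the DBSBuffer tables rather than an in-memory status change.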
For the malformed SQL statement (note the typo "mailformed"(!)), we probably need to correlate this error with further information from DBS Server. Is it the same error as we have with concurrent HTTP requests? Or what is actually wrong with it? Maybe @todor-ivanov can shed some light on this. The expected behavior of this fix is to be determined.
Additional context and error message:
[1]
[2]