dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Agents continuously failing to insert blocks into DBS #11965

Open amaltaro opened 7 months ago

amaltaro commented 7 months ago

Impact of the bug: WMAgent

Describe the bug: There seems to be an unusual number of blocks continuously failing to be inserted into DBS Server, with a variety of errors, as can be seen in [1] and [2].

For [1], that block (or those blocks) belongs to a workflow that went all the way to completed in the system and then got rejected, as can be seen from this ReqMgr2 API.

For [2], that block belongs to a workflow that is currently in running-closed status. The block has been failing injection for about 10h.

This is based on vocms0255; I haven't checked the other agents yet.

How to reproduce it: Not sure.

Expected behavior: For rejected (or aborted) workflows, we should make DBS3Upload aware that the output data is no longer relevant and skip its injection into DBS Server. This might require persisting information in the DBSBuffer tables (like marking the block and relevant files as injected); otherwise the same blocks will come up in every cycle of the DBS3Upload component.
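A minimal sketch of that expected behavior, assuming a hypothetical `PendingBlock` record and a plain dict standing in for the persistent DBSBuffer state (the real DBS3Upload code and DBSBuffer schema are not shown in this thread). Blocks of rejected or aborted workflows are set aside and marked as handled, so they would not be picked up again on the next cycle.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical: workflow states whose output should no longer go to DBS Server.
SKIP_STATES = frozenset({"rejected", "aborted"})


@dataclass
class PendingBlock:
    """Hypothetical view of a block waiting for DBS injection."""
    name: str
    workflow: str
    files: List[str] = field(default_factory=list)


def partition_blocks(blocks: Iterable[PendingBlock],
                     workflow_status: Dict[str, str]) -> Tuple[List[PendingBlock], List[PendingBlock]]:
    """Split blocks into (to_upload, to_skip) based on their workflow status.

    Blocks whose workflow status is unknown are kept for upload, so that a
    failed status lookup never silently drops data.
    """
    to_upload: List[PendingBlock] = []
    to_skip: List[PendingBlock] = []
    for block in blocks:
        status = workflow_status.get(block.workflow, "")
        if status.lower() in SKIP_STATES:
            to_skip.append(block)
        else:
            to_upload.append(block)
    return to_upload, to_skip


def mark_skipped(to_skip: Iterable[PendingBlock], injected: Dict[str, str]) -> None:
    """Record skipped blocks and their files as handled.

    `injected` stands in for the DBSBuffer tables; without persisting this,
    the same blocks would come back on every DBS3Upload cycle.
    """
    for block in to_skip:
        injected[block.name] = "skipped"
        for lfn in block.files:
            injected[lfn] = "skipped"
```

Keeping unknown statuses in the upload path is a deliberate choice here: a missing ReqMgr2 answer then only delays the decision instead of discarding output.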

For the malformed SQL statement error (note the typo "mailformed"(!)), we probably need to correlate this error with further information from DBS Server. Is it the same error we get with concurrent HTTP requests, or is something else wrong here? Maybe @todor-ivanov can shed some light on this. The expected behavior of this fix is still to be determined.

Additional context and error message:

[1]

2024-04-11 15:32:06,562:140685583296256:ERROR:DBSUploadPoller:Error trying to process block /TKCosmics_38T/Run3Winter24Reco-TkAlCosmics0T-AlcaRecoTkAlCosmics0T_cosmics_133X_mcRun3_2024cosmics_realistic_deco_v1-v5/ALCARECO#a5225151-fe56-45b1-b4dc-244b4644c02d through DBS. Details: DBSError code: 0, message: , reason: 

[2]

2024-04-11 14:09:09,438:140685583296256:ERROR:DBSUploadPoller:Error trying to process block /SingleNeutrino_E-10-gun/Run3Winter24Reco-133X_mcRun3_2024_realistic_v10-v2/GEN-SIM-RECO#995c334f-6648-4c55-98a1-44afbed8a57f through DBS. Details: DBSError code: 131, message: 5d0aae4c60a9089bfd22c0602c1bcecffd88106ed1a4578923297eda9e7da9d2 unable to find dataset_id for /SingleNeutrino_E-10-gun/Run3Winter24Digi-133X_mcRun3_2024_realistic_v10-v2/GEN-SIM-RAW, error DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set, reason: DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set

amaltaro commented 2 months ago

@todor-ivanov as discussed in the meeting today - and right now with Andrea as well - let us put this back to ToDo and come back to it at the beginning of October (2 more weeks should not hurt us here).

LinaresToine commented 2 months ago

Following up on the discussion in the Mattermost wm-ops thread with @amaltaro.

Related to the failures in inserting data into DBS, the current T0 production agent is struggling to insert files into blocks. I see the following error message in the DBS3Upload component log:

Error trying to process block /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7 through DBS. Details: DBSError code: 110, message: d93d36f53eaf3097db5c9f50851359041c418a18727e6f363e6c18c37d3f25bb unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error

This is present for the following blocks:

vkuznet commented 2 months ago

I suggest that you review https://github.com/dmwm/WMCore/issues/11106, which describes the actual issue with concurrent data insertion. In short, to make concurrent injection work, all the shared pieces (like the dataset configuration, etc.) must already be in place. To solve this problem, someone must first inject one block with all the necessary information; after that, the concurrent pattern can safely be used to inject the other blocks.
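The "seed one block first, then go concurrent" pattern can be sketched as below. This is a minimal illustration, not DBS3Upload's actual logic: `insert_block` is a hypothetical callable that could, for example, wrap `dbsApi.insertBulkBlock(blockDump=...)` (the client call visible in the traceback later in this thread), and `blocks` is assumed to hold `(block_name, payload)` pairs for a single dataset.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple


def inject_dataset_blocks(blocks: Sequence[Tuple[str, dict]],
                          insert_block: Callable[[dict], None],
                          max_workers: int = 4) -> Dict[str, str]:
    """Inject the first block on its own, then the remaining ones concurrently.

    The first insert creates the shared records (dataset, eras, configs, ...),
    so the concurrent inserts that follow only need to look them up.
    Returns {block_name: "ok" or the error message}.
    """
    results: Dict[str, str] = {}
    if not blocks:
        return results
    (first_name, first_payload), rest = blocks[0], blocks[1:]
    try:
        insert_block(first_payload)
    except Exception as exc:
        # Without the seed block in place, concurrent inserts could race.
        results[first_name] = str(exc)
        return results
    results[first_name] = "ok"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(insert_block, payload) for name, payload in rest}
        for name, future in futures.items():
            exc = future.exception()  # waits until this insert has finished
            results[name] = "ok" if exc is None else str(exc)
    return results
```

If the seed insert fails, the sketch stops instead of launching the concurrent inserts, since those would then compete to create the same shared records.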

amaltaro commented 2 months ago

@vkuznet thank you for jumping into this discussion.

I had a feeling that there was another obscure problem with DBS Server, and reviewing the ticket you pointed to (11106) - and according to your sentence above - I understand that, provided that we have at least 1 block injected into DBS for a given dataset, the "concurrency error" should no longer happen, given that all the foundation information is already in the database. Correct?

I picked one of the blocks provided by Antonio and queried DBS Server for the blocks of its dataset: https://cmsweb.cern.ch/dbs/prod/global/DBSReader/blocks?dataset=/AlCaP0/Run2024H-v1/RAW
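The same query can be scripted with the Python standard library. This is a minimal sketch that assumes two things: the endpoint returns a JSON list of records with a `"block_name"` key, and cmsweb accepts a grid certificate/key pair passed as the hypothetical `cert`/`key` paths. Depending on the host, the CERN CA bundle may also need to be loaded into the SSL context.

```python
import json
import ssl
import urllib.parse
import urllib.request
from typing import List

DBS_READER = "https://cmsweb.cern.ch/dbs/prod/global/DBSReader"


def blocks_url(dataset: str, base: str = DBS_READER) -> str:
    """Build the DBSReader 'blocks' query URL for a dataset (slashes get percent-encoded)."""
    return f"{base}/blocks?" + urllib.parse.urlencode({"dataset": dataset})


def parse_block_names(payload: str) -> List[str]:
    """Extract block names from a JSON list of {"block_name": ...} records (assumed format)."""
    return [record["block_name"] for record in json.loads(payload)]


def fetch_block_names(dataset: str, cert: str, key: str) -> List[str]:
    """Query DBS Server for the blocks of `dataset` using a client certificate."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    with urllib.request.urlopen(blocks_url(dataset), context=ctx) as resp:
        return parse_block_names(resp.read().decode())
```

Usage would look like `fetch_block_names("/AlCaP0/Run2024H-v1/RAW", cert="usercert.pem", key="userkey.pem")`, where the file names are placeholders.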

As you can see, this dataset already has many blocks in the database. So how come we are getting a "concurrency error" here?

vkuznet commented 2 months ago

If you inspect the code [1], you will see that, in order to insert DBS blocks concurrently, we need to have the following in place:

So, if all of this information is present and consistent across all blocks in DBS, then the answer is yes: the concurrency error (based on database content) should not arise. In other words, the DBS server first acquires or inserts this information into the DBS tables, and if two or more HTTP calls arrive at the same time, this can cause a database error, which leads to the concurrency error from the DBS server. Whether this is the case for the blocks under discussion, I don't know. But it is possible that not all of the information is present in the DB for all blocks, if any of the items above differ among them.
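The "acquire or insert" race is easier to see with a concrete look-up. In a naive SELECT-then-INSERT, two requests can both miss the SELECT and both attempt the INSERT, and the second one then fails. The minimal sketch below uses sqlite3 as a stand-in for the DBS database and a hypothetical `processing_eras` table. It is not how dbs2go implements this. It only shows a race-tolerant variant in which the UNIQUE constraint, rather than a prior SELECT, decides who creates the row.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical look-up table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processing_eras (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
    return conn


def get_or_insert_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the id for `name`, creating the row if it does not exist yet.

    'INSERT OR IGNORE' is a no-op when another caller already created the
    row, so every caller ends up reading the same id instead of failing.
    """
    conn.execute("INSERT OR IGNORE INTO processing_eras (name) VALUES (?)", (name,))
    conn.commit()
    row = conn.execute("SELECT id FROM processing_eras WHERE name = ?", (name,)).fetchone()
    return row[0]
```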

You can look at the example bulkblocks JSON [2] to see how this information is actually structured and provided to DBS. In particular, the information in dataset_conf_list and file_conf_list is used to look up the aforementioned info, along with primds, processing_era, etc. So, if you inject multiple JSON documents, they need to carry identical info for those attributes; otherwise you may run into the race conditions described in https://github.com/dmwm/WMCore/issues/11106
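Before injecting several bulkblocks payloads concurrently, one could check that they agree on those shared attributes. This is a minimal sketch: the top-level key names come from the comment above, and treating every `file_conf_list` field except `"lfn"` as shared configuration is an assumption.

```python
import json
from typing import Any, Dict, List, Set

# Attributes named in the discussion; the full set DBS uses for look-ups is an assumption.
SHARED_KEYS = ("dataset_conf_list", "primds", "processing_era")


def _canonical(value: Any) -> str:
    """Stable string form of a JSON value (dict keys sorted) for comparison."""
    return json.dumps(value, sort_keys=True)


def file_conf_signature(payload: Dict[str, Any]) -> Set[str]:
    """Distinct file configurations in a payload, ignoring the per-file LFN."""
    return {
        _canonical({k: v for k, v in conf.items() if k != "lfn"})
        for conf in payload.get("file_conf_list", [])
    }


def inconsistent_attributes(payloads: List[Dict[str, Any]]) -> List[str]:
    """Return the shared attributes whose values differ across the payloads."""
    bad = [key for key in SHARED_KEYS
           if len({_canonical(p.get(key)) for p in payloads}) > 1]
    if len({frozenset(file_conf_signature(p)) for p in payloads}) > 1:
        bad.append("file_conf_list")
    return bad
```

An empty result means the payloads agree on these attributes. Any listed key would be a reason to inject one of the blocks serially first.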

[1] https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks2.go#L478 [2] https://github.com/dmwm/dbs2go/blob/master/test/bulkblocks.json

amaltaro commented 2 months ago

Valentin, unless there is a bug in the (T0)WMAgent, all the blocks for the same dataset should carry exactly the same metadata. That means the same acquisition era, primary dataset, and so on.

Having said that, if a block exists in DBS Server, we can conclude that all of its metadata is already available as well. IF that metadata is already available and we are trying to inject more blocks for the same dataset, hence with the same metadata, there should be NO concurrency error.

Based on your explanation and on the data shared by Antonio, I fail to see how we would hit a "concurrency error". That means either there is more to this than we have discussed/understood so far, or the error message is misleading...

In any case, I would suggest having @todor-ivanov follow this up next week, comparing things against the DBS Server logs and the source code.

vkuznet commented 2 months ago

I looked further into the DBS code and I think I have identified the issue. According to the DBS code

Then I looked at one of the DBS logs and found:

[2024-09-24 00:45:17.228980109 +0000 UTC m=+2471302.202098481] fail to insert files chunks, trec &{IsFileValid:1 DatasetID:15071289 BlockID:37951592 CreationDate:1727138717 CreateBy:/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=cmst0/CN=658085/CN=Robot: CMS Tier0 FilesMap:{mu:{state:0 sema:0} read:{_:[] _:{} v:<nil>} dirty:map[] misses:0} NErrors:2}

So, indeed, the input file record DOES NOT contain the required file type attribute; see the File structure here: https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks.go#L65. The "file_type" must be present in the provided JSON; otherwise it is assigned the default value 0, which is what the file injection then tries to get from the database, whereas it should be a non-zero value.

To summarize, I suggest checking the JSON records that T0 provides and making sure they include "file_type" along with the other file attributes (all of them are defined in this struct: https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks.go#L65). Without it, the DBS code correctly fails, but it would probably be useful to adjust the error message so that it reports the actual error.
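The suggested check can be run against a dumped bulkblocks JSON. This is a minimal sketch that assumes the file records live under a `"files"` key and carry a `"logical_file_name"`. Only `"file_type"` is confirmed as required in this thread, so `REQUIRED_FILE_KEYS` is a placeholder to extend from the dbs2go File struct.

```python
import json
from typing import Any, Dict, List, Tuple

# Only "file_type" is confirmed here; extend with other File struct attributes as needed.
REQUIRED_FILE_KEYS: Tuple[str, ...] = ("logical_file_name", "file_type")


def files_missing_attributes(block_dump: Dict[str, Any],
                             required: Tuple[str, ...] = REQUIRED_FILE_KEYS) -> Dict[str, List[str]]:
    """Map each file (by LFN, or by position) to the required attributes it lacks or leaves empty."""
    problems: Dict[str, List[str]] = {}
    for index, entry in enumerate(block_dump.get("files", [])):
        missing = [key for key in required if entry.get(key) in (None, "")]
        if missing:
            lfn = entry.get("logical_file_name") or f"<file #{index}>"
            problems[lfn] = missing
    return problems


def check_dump(path: str) -> Dict[str, List[str]]:
    """Load a bulkblocks JSON dump from disk and check its file records."""
    with open(path) as handle:
        return files_missing_attributes(json.load(handle))
```

An empty dict from `check_dump(...)` would mean every file record carries the listed attributes.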

vkuznet commented 2 months ago

For the record, here is how the DBS error looks in the log:

[2024-09-24 00:45:17.228980109 +0000 UTC m=+2471302.202098481] fail to insert files chunks, trec &{IsFileValid:1 DatasetID:15071289 BlockID:37951592 CreationDate:1727138717 CreateBy:/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=cmst0/CN=658085/CN=Robot: CMS Tier0 FilesMap:{mu:{state:0 sema:0} read:{_:[] _:{} v:<nil>} dirty:map[] misses:0} NErrors:2}
[2024-09-24 00:45:17.229561212 +0000 UTC m=+2471302.202679583] 5ecdc2bdcd03492fd64efc269de332cdcf1c8a53c3e3cc07168b0c741f0270ba unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
[2024-09-24 00:45:17.232415539 +0000 UTC m=+2471302.205533911] DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:5ecdc2bdcd03492fd64efc269de332cdcf1c8a53c3e3cc07168b0c741f0270ba unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error Error: nested DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error Stacktrace:
goroutine 300475111 [running]:
github.com/dmwm/dbs2go/dbs.Error({0xb054e0?, 0xc0009f2410?}, 0x6e, {0xc0004f60f0, 0xe6}, {0xa3b23e, 0x2b})
        /go/src/github.com/vkuznet/dbs2go/dbs/errors.go:185 +0x99
github.com/dmwm/dbs2go/dbs.(*API).InsertBulkBlocksConcurrently(0xc00036c000)
        /go/src/github.com/vkuznet/dbs2go/dbs/bulkblocks2.go:743 +0x2546
github.com/dmwm/dbs2go/web.DBSPostHandler({0xb08290, 0xc000012cd8}, 0xc000616700, {0xa1d753, 0xa})
        /go/src/github.com/vkuznet/dbs2go/web/handlers.go:544 +0x1374
github.com/dmwm/dbs2go/web.BulkBlocksHandler({0xb08290?, 0xc000012cd8?}, 0xc000a9f460?)
        /go/src/github.com/vkuznet/dbs2go/web/handlers.go:960 +0x3b
net/http.HandlerFunc.ServeHTTP(0xc00055f1a0?, {0xb08290?, 0xc000012cd8?}, 0x95d5a0?)
        /usr/local/go/src/net/http/server.go:2136 +0x29
github.com/dmwm/dbs2go/web.limitMiddleware.func1({0xb08290?, 0xc000012cd8?}, 0xc00055f1a0?)
        /go/src/github.com/vkuznet/dbs2go/web/middlewares.go:110 +0x32
net/http.HandlerFunc.ServeHTTP(0x7f8c001964c0?, {0xb08290?, 0xc000012cd8?}, 0xc0003

So you have all the pointers needed to find which lines of code fail by inspecting the stack trace, and that is exactly what I did.
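Picking the relevant frames out of such a trace can be done mechanically. This is a small regex-based sketch, not an official tool. It pairs each Go function line with the `file.go:LINE +0xOFFSET` line that follows it and keeps only the frames whose function path contains `dbs2go`.

```python
import re
from typing import List, Tuple

# Location line of a Go frame, e.g. "\t/go/src/.../errors.go:185 +0x99".
_LOCATION = re.compile(r"^\s+(\S+\.go):(\d+)(?:\s+\+0x[0-9a-f]+)?\s*$")
# Trailing argument list of a Go function line, e.g. "(0xc00036c000)".
_ARGS = re.compile(r"\([^()]*\)$")


def go_frames(trace: str, module: str = "dbs2go") -> List[Tuple[str, str, int]]:
    """Return (function, file, line) for the frames of `module` in a Go stack trace."""
    frames: List[Tuple[str, str, int]] = []
    lines = trace.splitlines()
    # A frame is a function line immediately followed by its indented location line.
    for func_line, loc_line in zip(lines, lines[1:]):
        match = _LOCATION.match(loc_line)
        if match and module in func_line:
            func = _ARGS.sub("", func_line.strip())
            frames.append((func, match.group(1), int(match.group(2))))
    return frames
```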

amaltaro commented 2 months ago

As far as I can tell, it should always be set like:

      "file_type": "EDM",

@LinaresToine can you please change the component configuration, putting one of the block names that is failing to be inserted in the following line:

config.DBS3Upload.dumpBlockJsonFor = ""

Then restart DBS3Upload, and you should soon get a JSON dump of the content that the component is POSTing to the DBS Server. The output file should be under the component directory (e.g. install/DBS3Upload/).

LinaresToine commented 2 months ago

OK, I changed the config as suggested. I'm waiting for the loadFiles method to complete the cycle. I'll follow up.

LinaresToine commented 2 months ago

I have placed the output json file in /eos/home-c/cmst0/public/dbsError/dbsuploader_block.json.

Another error is showing up in the DBS3Upload component for all 4 pending blocks:

Hit a general exception while inserting block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd. Error: (52, 'Empty reply from server')
Traceback (most recent call last):
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/WMComponent/DBS3Buffer/DBSUploadPoller.py", line 94, in uploadWorker
    dbsApi.insertBulkBlock(blockDump=block)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 647, in insertBulkBlock
    result =  self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 474, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RestApi.py", line 42, in post
    return http_request(self._curl)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 56, in __call__
    curl_object.perform()
pycurl.error: (52, 'Empty reply from server')
germanfgv commented 1 month ago

An update from T0: here is a JSON dump of a successfully uploaded T0 DBS block:

/eos/home-c/cmst0/public/dbsError/dbsuploader_successful_block.json
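With both dumps available, a key-level comparison may help narrow down what differs between them. The failing dump is at /eos/home-c/cmst0/public/dbsError/dbsuploader_block.json and the successful one is just above. This is a minimal sketch that assumes file records sit under a `"files"` key. It compares attribute names only, not values.

```python
import json
from typing import Any, Dict, Set, Tuple


def file_keys(dump: Dict[str, Any]) -> Set[str]:
    """Union of the attribute names used by the file records of a block dump."""
    keys: Set[str] = set()
    for entry in dump.get("files", []):
        keys.update(entry)
    return keys


def compare_dumps(good: Dict[str, Any], bad: Dict[str, Any]) -> Dict[str, Tuple[str, ...]]:
    """Report keys present in one dump but not the other, at top level and per file."""
    return {
        "top_only_in_good": tuple(sorted(set(good) - set(bad))),
        "top_only_in_bad": tuple(sorted(set(bad) - set(good))),
        "file_only_in_good": tuple(sorted(file_keys(good) - file_keys(bad))),
        "file_only_in_bad": tuple(sorted(file_keys(bad) - file_keys(good))),
    }


def load(path: str) -> Dict[str, Any]:
    """Read a JSON dump from disk."""
    with open(path) as handle:
        return json.load(handle)
```

For example: `compare_dumps(load("/eos/home-c/cmst0/public/dbsError/dbsuploader_successful_block.json"), load("/eos/home-c/cmst0/public/dbsError/dbsuploader_block.json"))`.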

We now have a total of 276 blocks that we are unable to upload. We get the same error message for all of them. A list of these blocks can be found here:

/eos/home-c/cmst0/public/dbsError/failingBlocks.txt

Because of this, there are 121384 files in T0 that we have been unable to register in DBS. @todor-ivanov is trying to find a way for us to upload this information.

todor-ivanov commented 1 month ago

Here is a follow-up on the status of those blocks according to DBS. I had to write a script that queries the DBS database directly, LFN by LFN, for all those blocks, and here is the accumulated result, blockDBSRecords.json: /eos/home-c/cmst0/public/dbsError/blockDBSRecords.json

From what I can see in those results, we can identify at least 3 different cases:

I am going to filter out the ones we know are already there. On top of that, I am considering checking their Rucio status as well. FYI @germanfgv

P.S. DBSBlocksCheck.py is the script I used to accumulate those results.

P.S. And here is blockDBSRecords.json, an updated version of the DBS records that also includes Rucio information per block.
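DBSBlocksCheck.py itself is not reproduced here. The following is a hypothetical sketch of the per-LFN classification. It assumes the look-up results have already been collected into a mapping from each LFN to the block DBS reports for it (`None` if the file is absent). The meanings attached to MISSING and BLOCKMISMATCH below are an interpretation and are not taken from the script.

```python
from typing import Dict, Iterable, List, Optional, Set

# Order used when listing the statuses found among a block's files.
STATUS_ORDER = ("MISSING", "BLOCKMISMATCH", "OK")


def classify_block(block_name: str,
                   lfns: Iterable[str],
                   dbs_blocks: Set[str],
                   lfn_to_block: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Summarise the DBS state of one block and of its files.

    Assumed meanings:
      OK            - the file is registered in DBS under this very block
      MISSING       - the file (or block) is not in DBS at all
      BLOCKMISMATCH - the file is in DBS, but attached to a different block
    """
    file_statuses: Set[str] = set()
    for lfn in lfns:
        registered_in = lfn_to_block.get(lfn)
        if registered_in is None:
            file_statuses.add("MISSING")
        elif registered_in != block_name:
            file_statuses.add("BLOCKMISMATCH")
        else:
            file_statuses.add("OK")
    block_status = "OK" if block_name in dbs_blocks else "MISSING"
    return {"blockDBSStatus": [block_status],
            "filesDBSStatus": sorted(file_statuses, key=STATUS_ORDER.index)}
```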

todor-ivanov commented 1 month ago

Continuing to reduce the results to something more readable, here [1] is the final list of block and file statuses in DBS for all of them.

As one can see:

FYI: @germanfgv @LinaresToine

[1]

    blockName: /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4: 
        blockDBSStatus: ['MISSING']
        filesDBSStatus: ['BLOCKMISMATCH']
    blockName: /Muon0/Run2024H-v1/RAW#7369ccdf-3d3a-4d32-bad9-b04b02f279d4: 
        blockDBSStatus: ['MISSING']
        filesDBSStatus: ['MISSING', 'BLOCKMISMATCH']
    blockName: /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd: 
        blockDBSStatus: ['MISSING']
        filesDBSStatus: ['BLOCKMISMATCH']
    blockName: /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7: 
        blockDBSStatus: ['MISSING']
        filesDBSStatus: ['MISSING', 'BLOCKMISMATCH']
    blockName: /Commissioning/Run2024I-v1/RAW#4113bc20-c92b-43a3-a767-06bccfe4af56: OK
    blockName: /ParkingSingleMuon6/Run2024I-v1/RAW#75ce0f9d-2153-4520-b79f-bb0df5f19227: OK
    blockName: /ZeroBias/Run2024I-v1/RAW#50e30425-0a03-46c5-8da0-26719c266dbc: OK
    blockName: /ParkingSingleMuon1/Run2024I-v1/RAW#fae36d0d-cc36-4c5e-bfde-009aa38f9b7c: OK
    blockName: /ParkingSingleMuon2/Run2024I-v1/RAW#3c125d76-99c4-4f65-8141-0ae9abcd0e1a: OK
    blockName: /ParkingSingleMuon7/Run2024I-v1/RAW#0a25242b-c6fb-4eca-8382-826f4e878021: OK
    blockName: /ParkingSingleMuon5/Run2024I-v1/RAW#b01693d9-d3d9-4ab2-8760-0b019652f89e: OK
    blockName: /ParkingSingleMuon8/Run2024I-v1/RAW#a57df37e-4d9d-465e-93b2-54d40f892429: OK
    blockName: /ParkingSingleMuon10/Run2024I-v1/RAW#627f4dd1-4c31-4f4b-bb85-48d19996ba4f: OK
    blockName: /ParkingSingleMuon9/Run2024I-v1/RAW#1b4bc705-f6de-4f55-9c8b-7cd490457341: OK
    blockName: /Tau/Run2024I-v1/RAW#2da0c3fb-4b3c-47a2-ac49-99cca62c226d: OK
    blockName: /BTagMu/Run2024I-v1/RAW#280bd893-a382-496c-8e6f-366309493acc: OK
    blockName: /ParkingDoubleMuonLowMass1/Run2024I-v1/RAW#b4492da6-fa05-4484-a034-aa2a87354735: OK
    blockName: /AlCaLowPtJet/Run2024I-v1/RAW#0002be22-10d9-4200-9845-bf112ec9291a: OK
    blockName: /Muon1/Run2024I-v1/RAW#499938c6-8357-4095-99db-91c90e600f0e: OK
    blockName: /ParkingDoubleMuonLowMass4/Run2024I-PromptReco-v1/AOD#09d3e003-1ca7-459f-8089-1f1d95f5ba20: OK
    blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/DQMIO#777a6f7a-058d-46a4-bfb9-d905b141fbd2: OK
    blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/AOD#6fbca7c8-560f-4805-af21-55d424e9877a: OK
    blockName: /Muon0/Run2024I-MuAlCalIsolatedMu-PromptReco-v1/ALCARECO#06571035-b278-4998-a9d3-2b523bb4fd0e: OK
    blockName: /Muon1/Run2024I-HcalCalHO-PromptReco-v1/ALCARECO#65cfc43b-aca0-4600-8bb3-db4261856f3b: OK
    blockName: /Tau/Run2024I-LogError-PromptReco-v1/RAW-RECO#645686dd-1135-4866-9e53-6438aa17600d: OK
    blockName: /Muon1/Run2024I-PromptReco-v1/NANOAOD#d26b31f5-0371-4fe2-9420-e87a87925fdd: OK
    blockName: /ParkingSingleMuon8/Run2024I-PromptReco-v1/MINIAOD#86c82707-6992-4126-b86f-182c5f5aa7fc: OK
    blockName: /Tau/Run2024I-PromptReco-v1/AOD#6627bc34-c746-49a9-ab02-550710731e1b: OK
    blockName: /Muon0/Run2024I-PromptReco-v1/MINIAOD#ebab670f-0588-422e-8206-c406f948bb06: OK
    blockName: /Muon0/Run2024I-PromptReco-v1/DQMIO#35e3beac-bd5d-4ba4-82c0-a5372e89e5a6: OK
    blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/DQMIO#2e38060a-bf22-4135-897d-f7d93684dede: OK
    blockName: /DisplacedJet/Run2024I-EXOLLPJetHCAL-PromptReco-v1/AOD#ab2bd1a4-6f42-4d9a-853f-1a7a5aa5f2f4: OK
    blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/MINIAOD#0821e60b-12b6-4993-8a67-0952379c34bb: OK
    blockName: /Muon1/Run2024I-EXOCSCCluster-PromptReco-v1/USER#5197c2ba-2f13-48e9-bddd-9c1fd071cd33: OK
    blockName: /ParkingVBF6/Run2024I-PromptReco-v1/NANOAOD#a36dad4c-dac8-4a43-bde9-5ac38a0f8b7d: OK
    blockName: /EphemeralZeroBias1/Run2024I-PromptReco-v1/MINIAOD#0c7b63de-5a48-4d98-8e9b-9c52c714703a: OK
    blockName: /JetMET1/Run2024I-PromptReco-v1/DQMIO#6d3cc1c5-48c8-4f9e-9f8b-1cb5b481d550: OK
    blockName: /Tau/Run2024I-PromptReco-v1/NANOAOD#473eddda-612e-4306-91bf-9dfcb3b5d108: OK
    blockName: /ParkingVBF6/Run2024I-PromptReco-v1/MINIAOD#38f7630c-e015-4a97-a331-e33b0cfa3604: OK
    blockName: /ParkingSingleMuon6/Run2024I-PromptReco-v1/AOD#bd0eeb38-b1d0-4157-ade6-fb3b65f57995: OK
    blockName: /ParkingVBF1/Run2024I-PromptReco-v1/AOD#1298b211-43f2-49c6-8788-2bde6e2a9e62: OK
    blockName: /ParkingVBF0/Run2024I-PromptReco-v1/MINIAOD#63e59425-3ff1-406c-ae18-48bf9f239354: OK
    blockName: /ParkingSingleMuon8/Run2024I-PromptReco-v1/AOD#85d58a9d-29b0-4f98-99bc-9c201ed2c6a2: OK
    blockName: /Tau/Run2024I-EXODisappTrk-PromptReco-v1/USER#1a96c708-64a9-4f62-819f-a19633154b16: OK
    blockName: /SpecialZeroBias5/Run2024I-PromptReco-v1/AOD#59e655a4-2897-47d0-ba11-287332c4e6b5: OK
    blockName: /ParkingVBF1/Run2024I-PromptReco-v1/NANOAOD#aa47f11c-7080-492b-ab91-ad19e6299fff: OK
    blockName: /ParkingVBF3/Run2024I-PromptReco-v1/NANOAOD#4330d839-9985-4b36-9d5e-b5aa5c19175f: OK
    blockName: /ParkingHH/Run2024I-PromptReco-v1/MINIAOD#e8d162fa-f391-439c-a7f1-8a8d39dda120: OK
    blockName: /Tau/Run2024I-PromptReco-v1/DQMIO#1a0ac20a-1d60-4d89-8133-e8559f1e4c13: OK
    blockName: /ParkingSingleMuon0/Run2024I-PromptReco-v1/MINIAOD#d8995d51-e005-4757-8439-850c005cbd57: OK
    blockName: /ParkingVBF5/Run2024I-PromptReco-v1/MINIAOD#5bff218a-4895-4f13-8148-c9e0bcf820b7: OK
    blockName: /Muon0/Run2024I-PromptReco-v1/AOD#306f5950-5eec-43d3-96f2-8dfbe22d322c: OK
    blockName: /EGamma0/Run2024I-PromptReco-v1/AOD#e9814b10-2545-4a83-8a3d-2501f5679ecd: OK
    blockName: /JetMET1/Run2024I-PromptReco-v1/NANOAOD#039f5f67-3f70-4797-9b2a-c6d698e52efd: OK
    blockName: /ParkingVBF1/Run2024I-PromptReco-v1/DQMIO#b1f45558-5e9a-493c-afed-7e133bb4a7e7: OK
    blockName: /DisplacedJet/Run2024I-PromptReco-v1/AOD#0edffee0-8286-4658-943c-8efc45f23ea4: OK
    blockName: /Muon1/Run2024I-PromptReco-v1/DQMIO#edae199f-fb22-48d3-9a8a-cdb15703bcbe: OK
    blockName: /DisplacedJet/Run2024I-EXODelayedJet-PromptReco-v1/AOD#b46c0dcb-c26a-47c2-a4b0-fcac9b9d63be: OK
    blockName: /Muon0/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#6b39e513-27ac-4e54-ad1e-a343b9d064fc: OK
    blockName: /JetMET1/Run2024I-PromptReco-v1/AOD#fa6562fe-0d1d-4d06-9bf0-a135edbcf172: OK
    blockName: /ParkingVBF0/Run2024I-PromptReco-v1/DQMIO#24686495-80bc-44de-a3b2-f39cfa971760: OK
    blockName: /NoBPTX/Run2024I-PromptReco-v1/AOD#99fd2849-5b43-4927-9596-6e6a33683d9c: OK
    blockName: /HLTPhysics/Run2024I-LogError-PromptReco-v1/RAW-RECO#8abbaf67-41c7-452f-816a-f978dd14cc1b: OK
    blockName: /EGamma0/Run2024I-LogError-PromptReco-v1/RAW-RECO#ab07e175-a29f-4203-a3f3-dceb2938ae33: OK
    blockName: /JetMET1/Run2024I-EXODisappTrk-PromptReco-v1/USER#98870a35-c0d8-4ece-9731-1ac081143000: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#3b08c77d-8e97-4aca-be54-f95b7ab76465: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#2b3eefa5-923b-4b42-9c5c-cf162453d59b: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#1d77a290-571c-442d-be95-531e4168e94d: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#23d1c315-25d0-47a8-813e-caa7a5f2a0f1: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#80570ad8-4b6a-4e00-bcb5-63ff743504d5: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#5abdb7b2-4a4a-4a10-a4b3-9cfae99bdf83: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#adc72e9a-7410-493a-a327-1611b18a4106: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#e0c75028-64eb-480f-abb6-910505a92973: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#c1bc74d9-2c3b-415d-928f-7ec8395868ad: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#f1e8e065-2a65-4bd7-9279-d88c423c0ea0: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#8147c17b-4550-4a14-9747-ca696aa03408: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#4645a8b5-b1ef-4008-b1f5-dff3fadb1855: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a8e53c28-c9ad-40fa-88bf-b7c2f3e61a64: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#19e36aa2-2ec8-4974-891f-112279ec9393: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#ba34c69d-db70-455e-81cc-13a161727e80: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#b66d6bae-2647-44b9-8bad-320da54d0a29: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#33b16287-bc8d-421b-bd38-2059ad19dd87: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a1f20e02-b988-4e3a-bd6f-68610bde0b97: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#d3c7711f-b7ba-4e4d-9db8-999cd6383551: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a77f0bd9-bd98-418e-b39c-9bf859203fad: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#aff24402-3ead-4ab6-9d71-02b13721b7cf: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#aea1ab62-ffa2-447f-bede-dbd01a05708a: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#ebb35256-bdc6-40a7-ae6c-9de27a2094bf: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#36740b3b-31a6-4be6-a4e4-f76f5e1200ab: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#a95882ee-05b6-482a-bbf8-6f7ff8ab4354: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#0358032c-2997-440c-a658-461e011e87a0: OK
    blockName: /Cosmics/Run2024I-MuAlGlobalCosmics-PromptReco-v1/ALCARECO#f7f08dfb-6c23-441c-9137-09abad0a7d39: OK
    blockName: /ParkingDoubleMuonLowMass2/Run2024I-PromptReco-v1/DQMIO#e9d71494-b460-4680-a3b9-7a1c62fc4d01: OK
    blockName: /ParkingHH/Run2024I-PromptReco-v1/AOD#ba026985-4cfd-4a06-ba1b-bfb5af6cbb64: OK
    blockName: /MinimumBias/Run2024I-PromptReco-v1/NANOAOD#dfbf01f5-c43c-463f-b562-aee5a91da41e: OK
    blockName: /ParkingSingleMuon2/Run2024I-PromptReco-v1/AOD#f2995e54-af6b-456c-8b40-abb844b299a2: OK
    blockName: /EGamma0/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#0ad19bb8-d4fa-4565-9019-91ef6e7207ac: OK
    blockName: /EGamma1/Run2024I-EXODisappTrk-PromptReco-v1/USER#cf89035f-8d3d-4a95-9dd4-ae73b92cb865: OK
    blockName: /MinimumBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#d4fdf46b-db01-4df9-9eac-e23672e14f84: OK
    blockName: /Muon0/Run2024I-EXODisappTrk-PromptReco-v1/USER#ae7335f9-c7ae-42bd-8304-0805347446dd: OK
    blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/NANOAOD#0f9d8bfa-a7f7-44f9-8bec-e509f8334490: OK
    blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/DQMIO#a004890b-d6cf-4ff0-b715-f1ba374e3d97: OK
    blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/AOD#64b94383-cbaf-48a5-b194-3d15baa01adc: OK
    blockName: /Tau/Run2024I-PromptReco-v1/MINIAOD#23bc1b6d-3f75-4007-8618-52755f3fb1f3: OK
    blockName: /ParkingDoubleMuonLowMass3/Run2024I-PromptReco-v1/NANOAOD#3935aabb-be36-4e9d-a49f-afc523994fd5: OK
    blockName: /MinimumBias/Run2024I-SiStripCalZeroBias-PromptReco-v1/ALCARECO#d621c61d-52ea-4d11-b092-4068bfd61ddf: OK
    blockName: /EphemeralZeroBias7/Run2024I-PromptReco-v1/MINIAOD#f770ca99-fca5-4054-a377-9365c016069b: OK
    blockName: /SpecialZeroBias1/Run2024I-PromptReco-v1/MINIAOD#b8cc0194-13dd-4091-96fa-28f5be5c2134: OK
    blockName: /ParkingDoubleMuonLowMass4/Run2024I-TkAlJpsiMuMu-PromptReco-v1/ALCARECO#950e3d2c-5375-47c5-8f79-03df611b9422: OK
    blockName: /Commissioning/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#0a2b5651-fe2e-436e-af4d-488cb00acf68: OK
    blockName: /ScoutingPFMonitor/Run2024I-PromptReco-v1/NANOAOD#c2e3fd58-1bf5-4769-9230-6c6ac11bf75f: OK
    blockName: /SpecialZeroBias5/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#3ecd813c-ad88-4ced-acba-d78a9ebc9963: OK
    blockName: /SpecialZeroBias5/Run2024I-SiStripCalZeroBias-PromptReco-v1/ALCARECO#31c2c844-9601-4881-ba21-e999b89d7900: OK
    blockName: /JetMET1/Run2024I-HcalCalIsoTrkProducerFilter-PromptReco-v1/ALCARECO#0f1c47f1-3755-4a00-8d36-2cd47e42605c: OK
    blockName: /ParkingHH/Run2024I-PromptReco-v1/DQMIO#b3983892-1ab2-4d9b-a5da-c7e238846e1f: OK
    blockName: /SpecialZeroBias5/Run2024I-LogErrorMonitor-PromptReco-v1/USER#6ea22759-354e-49ba-a72b-2d29034979e2: OK
    blockName: /ParkingVBF6/Run2024I-PromptReco-v1/DQMIO#332ab512-bdef-44e0-a091-9615bdd417c6: OK
    blockName: /Muon0/Run2024I-TkAlZMuMu-PromptReco-v1/ALCARECO#6d4fa60b-d7c3-4280-ab08-c48d7cbf258d: OK
    blockName: /EphemeralZeroBias3/Run2024I-PromptReco-v1/NANOAOD#fde6778b-8361-427a-876f-e16b2e65978f: OK
    blockName: /TestEnablesEcalHcal/Run2024I-Express-v1/RAW#6a31d991-d964-4fca-9113-59d1c40d5759: OK
    blockName: /StreamExpressCosmics/Run2024I-SiPixelCalZeroBias-Express-v1/ALCARECO#13f46438-f23d-4121-85fb-896d224db127: OK
    blockName: /StreamExpressCosmics/Run2024I-SiStripCalCosmics-Express-v1/ALCARECO#9d90aa84-6f90-4ee3-8e33-a2718e9e59b2: OK
    blockName: /StreamALCAPPSExpress/Run2024I-PromptCalibProdPPSAlignment-Express-v1/ALCAPROMPT#939fa7d4-6ffa-4788-9b12-83436c0413b5: OK
    blockName: /StreamExpress/Run2024I-TkAlZMuMu-Express-v1/ALCARECO#6097a868-c5bd-43ae-a804-1e881fcf5bc4: OK
    blockName: /StreamExpress/Run2024I-SiPixelCalSingleMuonTight-Express-v1/ALCARECO#1b495798-f27a-49e4-8a21-87d8d2a236f6: OK
    blockName: /StreamExpress/Run2024I-PromptCalibProdSiPixelAliHGComb-Express-v1/ALCAPROMPT#aafb7697-c790-4d5e-9dac-4cf5fae1c4ce: OK
    blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/AOD#f35b9721-1e35-434b-9e06-28b6c88f64fe: OK
    blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/AOD#90186a44-9236-4665-b48d-a8dd37ef0ff1: OK
    blockName: /Cosmics/Run2024I-CosmicTP-PromptReco-v1/RAW-RECO#7c8c249d-9515-4d08-8217-418a269b1a2e: OK
    blockName: /ParkingSingleMuon1/Run2024I-PromptReco-v1/MINIAOD#51342b65-75e2-42eb-b868-4df8ee7809b8: OK
    blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/AOD#e8fd5b7b-24c4-49a1-8c26-7d7c2d37661a: OK
    blockName: /ParkingVBF5/Run2024I-PromptReco-v1/AOD#db278bd7-459c-4427-9c2f-b214640caaeb: OK
    blockName: /SpecialZeroBias1/Run2024I-PromptReco-v1/AOD#786cf920-b104-4e8f-bebb-30a30d090357: OK
    blockName: /EGamma0/Run2024I-EcalUncalWElectron-PromptReco-v1/ALCARECO#fd409f01-fba4-4d87-a1c5-04ccde5ee8ad: OK
    blockName: /Muon0/Run2024I-SiPixelCalSingleMuonLoose-PromptReco-v1/ALCARECO#5b6152f5-616f-4f55-ab65-eb8b1de0798b: OK
    blockName: /EGamma0/Run2024I-EGMJME-PromptReco-v1/RAW-RECO#55a6487b-d4f9-4d01-82c7-0f2ee35872d1: OK
    blockName: /EGamma1/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#67ac37a8-4d01-4a8b-a30f-4a714cfb2a0a: OK
    blockName: /HLTPhysics/Run2024I-LogErrorMonitor-PromptReco-v1/USER#1b6693e3-d43b-47e9-ae23-fd327e5af74e: OK
    blockName: /MuonShower/Run2024I-EXOCSCCluster-PromptReco-v1/USER#6d2d1137-e1d8-4ae0-bd81-065f8a050490: OK
    blockName: /ParkingVBF1/Run2024I-PromptReco-v1/MINIAOD#12935b72-988e-44ca-9c7c-9b9d8063d8b3: OK
    blockName: /Cosmics/Run2024I-LogError-PromptReco-v1/RAW-RECO#9b7ece4a-0f62-4b45-942a-e8366b905412: OK
    blockName: /EGamma1/Run2024I-WElectron-PromptReco-v1/USER#e8b3efe3-339d-4b5c-82df-bc78defb09ea: OK
    blockName: /NoBPTX/Run2024I-TkAlCosmicsInCollisions-PromptReco-v1/ALCARECO#c64e5941-7da4-47a8-ab32-284a5e059dca: OK
    blockName: /Commissioning/Run2024I-LogError-PromptReco-v1/RAW-RECO#af0433ba-b843-4b44-bfa3-70b5d7475863: OK
    blockName: /MinimumBias/Run2024I-PromptReco-v1/AOD#e45c0cff-39a4-440f-819a-2e522045618b: OK
    blockName: /ParkingSingleMuon7/Run2024I-PromptReco-v1/NANOAOD#9162a0ab-23e1-4fd1-870e-f55da0661a44: OK
    blockName: /Muon1/Run2024I-TkAlZMuMu-PromptReco-v1/ALCARECO#582d0da0-d896-46f3-b4de-e7001a1b4dea: OK
    blockName: /EphemeralZeroBias3/Run2024I-PromptReco-v1/MINIAOD#aeaed114-ebc8-4ef4-a625-5af2c3559120: OK
    blockName: /Commissioning/Run2024I-EcalActivity-PromptReco-v1/RAW-RECO#e65b7bb5-0732-432a-a181-4d9153161caa: OK
    blockName: /ParkingVBF3/Run2024I-PromptReco-v1/DQMIO#fecdfbd5-6a50-4100-92f7-90b05c452570: OK
    blockName: /ParkingVBF2/Run2024I-PromptReco-v1/DQMIO#43430e62-1bd6-4161-aa8d-3cf9b3da3e55: OK
    blockName: /ParkingDoubleMuonLowMass4/Run2024I-PromptReco-v1/NANOAOD#ceeadfe9-03f7-471f-93fb-a7d4ae3ab806: OK
    blockName: /EGamma1/Run2024I-LogErrorMonitor-PromptReco-v1/USER#2bd7cb69-577f-47a0-adaa-115b9df2e1b2: OK
    blockName: /Commissioning/Run2024I-PromptReco-v1/NANOAOD#2936bb45-005c-4f28-9ee0-c24b8e8b647e: OK
    blockName: /StreamExpress/Run2024I-TkAlMinBias-Express-v1/ALCARECO#c07f8345-698b-4a8d-87ac-39ef617874e0: OK
    blockName: /StreamCalibration/Run2024I-EcalTestPulsesRaw-Express-v1/ALCARECO#b05bf432-e2d5-4e77-8bc3-d51be7fadb5c: OK
    blockName: /StreamExpress/Run2024I-PromptCalibProdSiStripGains-Express-v1/ALCAPROMPT#82ba3ad1-7f7a-4a6b-ba44-e00632e53de5: OK
    blockName: /ExpressPhysics/Run2024I-Express-v1/FEVT#12d498f7-0eb0-4c0b-a124-f971dfab8ec8: OK
    blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/AOD#82f71a1c-0f5c-4a24-88e9-24d02619104d: OK
    blockName: /Muon0/Run2024I-ZMu-PromptReco-v1/RAW-RECO#a0c2d025-fe28-40fc-b695-64e3465954d8: OK
    blockName: /ParkingSingleMuon1/Run2024I-PromptReco-v1/AOD#2b73f5f0-a63f-402e-94bc-6231bc31d130: OK
    blockName: /ParkingSingleMuon1/Run2024I-PromptReco-v1/AOD#45da3ad2-609d-4ed7-a77c-4a212635ea47: OK
    blockName: /EGamma0/Run2024I-PromptReco-v1/DQMIO#5ba6d8d0-5af2-4e46-9028-f8a83a38bd22: OK
    blockName: /ParkingDoubleMuonLowMass2/Run2024I-PromptReco-v1/NANOAOD#179594fb-d82a-4194-b8c7-ef8636ccade9: OK
    blockName: /ZeroBias/Run2024I-HcalCalIsolatedBunchSelector-PromptReco-v1/ALCARECO#623d20fe-86d2-4650-9c32-76fa0c792d6b: OK
    blockName: /HcalNZS/Run2024I-LogError-PromptReco-v1/RAW-RECO#ae1000be-4a3e-4b4d-b9dc-362c2028b5a9: OK
    blockName: /EGamma0/Run2024I-EcalESAlign-PromptReco-v1/ALCARECO#982d6c67-32ba-4a10-9239-4f460bf1c002: OK
    blockName: /ScoutingPFMonitor/Run2024I-PromptReco-v1/MINIAOD#37708d28-236f-4554-91d3-e3617b2c2a22: OK
    blockName: /ParkingSingleMuon2/Run2024I-PromptReco-v1/MINIAOD#ee62eeb6-9a38-4a12-94b4-6e2e03c5bf6c: OK
    blockName: /MuonShower/Run2024I-PromptReco-v1/MINIAOD#ab4cb22d-8acc-4661-b955-84786cd695db: OK
    blockName: /ParkingSingleMuon0/Run2024I-PromptReco-v1/NANOAOD#b2b7410c-cae4-48d2-aa08-1a4473ca9fcb: OK
    blockName: /EGamma1/Run2024I-EcalESAlign-PromptReco-v1/ALCARECO#7fb030a6-ce2b-40ba-9cfa-633e914c6ed4: OK
    blockName: /ZeroBias/Run2024I-PromptReco-v1/DQMIO#59bf6b32-52ee-4b87-a5f9-0eaf70f4eb00: OK
    blockName: /ParkingVBF4/Run2024I-PromptReco-v1/NANOAOD#023f6a88-3b2b-4826-ab9b-76a146d5c6a0: OK
    blockName: /SpecialZeroBias1/Run2024I-LogError-PromptReco-v1/RAW-RECO#9d2492d8-6426-4ece-bdf3-2c91113f4286: OK
    blockName: /MinimumBias/Run2024I-PromptReco-v1/MINIAOD#9029265d-b4db-458c-afc1-405d680f07da: OK
    blockName: /SpecialZeroBias0/Run2024I-PromptReco-v1/AOD#9c8e602e-c2ba-49c4-9b96-4b2ff143d3a4: OK
    blockName: /ParkingDoubleMuonLowMass1/Run2024I-TkAlUpsilonMuMu-PromptReco-v1/ALCARECO#6cdb8ade-6f9b-4b24-ab89-027326e095be: OK
    blockName: /SpecialZeroBias5/Run2024I-LogError-PromptReco-v1/RAW-RECO#a1de2483-ce3e-425d-85d6-5fea33a6ea67: OK
    blockName: /StreamExpressCosmics/Run2024I-Express-v1/DQMIO#699d704b-19aa-480c-8a8d-0fbebb0b0cd9: OK
    blockName: /StreamExpressCosmics/Run2024I-PromptCalibProdSiStripLA-Express-v1/ALCAPROMPT#6420dcde-2fa4-45bd-9c32-d10682923317: OK
    blockName: /StreamExpressCosmics/Run2024I-PromptCalibProdSiStrip-Express-v1/ALCAPROMPT#2e69c51a-1dca-4088-8aac-3743f810ce56: OK
    blockName: /StreamALCAPPSExpress/Run2024I-PPSCalMaxTracks-Express-v1/ALCARECO#652af1be-cb9a-4a1c-bbf4-c88c1686d301: OK
    blockName: /StreamExpress/Run2024I-PromptCalibProd-Express-v1/ALCAPROMPT#0e377dc2-065e-4e15-8566-bd910470baad: OK
    blockName: /StreamExpress/Run2024I-SiPixelCalSingleMuon-Express-v1/ALCARECO#3a744512-b3a2-447d-85f5-d0ec6254af59: OK
    blockName: /ZeroBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#35f2cee8-79c7-4cab-9d86-dc50c003d893: OK
    blockName: /ZeroBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#56647210-b149-4fdc-800f-a3e9523b3ea3: OK
    blockName: /HcalNZS/Run2024I-PromptReco-v1/DQMIO#045c4ec2-d9a4-44ce-9fd5-717415778bf5: OK
    blockName: /SpecialZeroBias5/Run2024I-PromptReco-v1/NANOAOD#d533eae7-c987-4e63-9e0e-0edb3e0bd246: OK
    blockName: /ParkingSingleMuon0/Run2024I-PromptReco-v1/AOD#a154328f-5edf-46bb-a395-d3d45a8b1ca6: OK
    blockName: /EGamma1/Run2024I-ZElectron-PromptReco-v1/RAW-RECO#5f92c14b-c6c0-4b8e-8913-71a11b54f598: OK
    blockName: /EGamma1/Run2024I-EXOMONOPOLE-PromptReco-v1/USER#6dcd1342-723d-4844-87b9-8248bf5db833: OK
    blockName: /StreamExpress/Run2024I-PromptCalibProdSiPixel-Express-v1/ALCAPROMPT#c9543d48-4c8f-4936-bd43-3a63dd1c174f: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#ef15476a-bbad-4545-9f91-7e77de5d6034: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#0758fd50-931f-488d-8a1e-663e1c6174e4: OK
    blockName: /SpecialZeroBias1/Run2024I-PromptReco-v1/DQMIO#1596222f-a243-4dc3-b9fe-a9ec03ad9adb: OK
    blockName: /SpecialZeroBias2/Run2024I-SiStripCalZeroBias-PromptReco-v1/ALCARECO#a8b97986-e1e1-4d7a-9b0f-4ae31589c714: OK
    blockName: /EGamma1/Run2024I-PromptReco-v1/MINIAOD#ee18201c-58ce-42a7-a60b-d1364ca32653: OK
    blockName: /ParkingVBF5/Run2024I-PromptReco-v1/NANOAOD#36beaf6c-97b1-475b-aba4-e9e7e8e694c3: OK
    blockName: /StreamExpressCosmics/Run2024I-PromptCalibProdSiPixelLAMCS-Express-v1/ALCAPROMPT#99955f16-752e-4dd0-ad52-d6eaf8d0f509: OK
    blockName: /Muon1/Run2024I-SiPixelCalSingleMuonLoose-PromptReco-v1/ALCARECO#14d95293-a39a-4d2a-a1f1-956affdde47c: OK
    blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/MINIAOD#4223ff94-a955-424e-b245-13c337e5b17d: OK
    blockName: /Muon1/Run2024I-EXODisappMuon-PromptReco-v1/USER#55154291-7763-457d-a791-96c65ef849ea: OK
    blockName: /Muon1/Run2024I-MUOJME-PromptReco-v1/RAW-RECO#3b55bfa9-ba65-4454-bc98-e60db1916b28: OK
    blockName: /ParkingSingleMuon9/Run2024I-PromptReco-v1/NANOAOD#439d8642-eccc-4e50-85ec-ce0a084b7fee: OK
    blockName: /Muon1/Run2024I-MuAlCalIsolatedMu-PromptReco-v1/ALCARECO#b9479dcc-06fb-4dce-8b11-d3cb0e957900: OK
    blockName: /Muon1/Run2024I-TkAlMuonIsolated-PromptReco-v1/ALCARECO#56f3de3c-f985-466c-b405-36fb5ff57720: OK
    blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/AOD#04da00bb-4cec-4b87-a1d9-0747a3d4f02e: OK
    blockName: /Muon1/Run2024I-EXODisappTrk-PromptReco-v1/USER#f3e359cc-cb16-4020-8e84-4f083d1e9441: OK
    blockName: /JetMET0/Run2024I-JetHTJetPlusHOFilter-PromptReco-v1/RAW-RECO#13eb60ae-a6e7-4a47-b856-0f4f4e6e00d8: OK
    blockName: /Muon0/Run2024I-MUOJME-PromptReco-v1/RAW-RECO#a7967f2a-1035-4aea-afa0-26b9898938ce: OK
    blockName: /Muon0/Run2024I-ZMu-PromptReco-v1/RAW-RECO#966a0503-91c2-4e38-a002-bfac712cb168: OK
    blockName: /ZeroBias/Run2024I-SiStripCalMinBias-PromptReco-v1/ALCARECO#685c6155-a526-4509-bce9-812330416777: OK
    blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/MINIAOD#3b1b2f8c-a7da-4e53-902f-b4fa265774d2: OK
    blockName: /Muon1/Run2024I-PromptReco-v1/AOD#4f5fab34-198f-4eb0-81a4-09fc0c1a501c: OK
    blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/DQMIO#0fdb3015-38e1-47ef-bb71-237e7fbb1f08: OK
    blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/MINIAOD#ab102ad0-6081-42b6-9afe-9df7746af1d9: OK
    blockName: /Muon1/Run2024I-PromptReco-v1/MINIAOD#4adf338f-e98a-4e15-8e7f-a075cafbf918: OK
    blockName: /Muon1/Run2024I-HcalCalIterativePhiSym-PromptReco-v1/ALCARECO#a794ba12-a0d8-4c78-9d07-9b977908cb1f: OK
    blockName: /ParkingSingleMuon8/Run2024I-PromptReco-v1/NANOAOD#f594d1ee-148d-4b50-870b-a2f66c51efec: OK
    blockName: /ParkingSingleMuon9/Run2024I-PromptReco-v1/AOD#c60db50a-6e2e-4054-82d0-5d2f2622dd85: OK
    blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/MINIAOD#9fab21d5-31e2-426a-aa0d-5ae91119d468: OK
    blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/MINIAOD#ca5f6304-70f8-4504-b885-75bb293adf69: OK
    blockName: /ParkingSingleMuon3/Run2024I-PromptReco-v1/NANOAOD#7e049c44-3e1b-4e9d-92aa-8aa5a70db114: OK
    blockName: /Muon1/Run2024I-LogError-PromptReco-v1/RAW-RECO#65903046-1775-44ea-94bc-dfba5746ed0e: OK
    blockName: /Muon1/Run2024I-ZMu-PromptReco-v1/RAW-RECO#1fb3d4ad-ca60-4e64-a077-1437accb0f57: OK
    blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/NANOAOD#b83a6f99-b768-4588-8577-18436f17bd0c: OK
    blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/AOD#41bdafe6-cdc1-40e1-9edc-85ef3d18e0ae: OK
    blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/MINIAOD#b8ca91f4-608a-4612-8eeb-df6183a7b99f: OK
    blockName: /ParkingSingleMuon6/Run2024I-PromptReco-v1/MINIAOD#9b909863-233e-480c-9e3f-3850c0b67167: OK
    blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/AOD#b9849323-cc1a-445a-817f-c4e3212f74a2: OK
    blockName: /ParkingDoubleMuonLowMass7/Run2024I-PromptReco-v1/AOD#c0047424-6a0a-41c6-94ac-28aa41129d71: OK
    blockName: /ParkingVBF6/Run2024I-PromptReco-v1/AOD#d37b0146-4466-4ba8-a1ba-85c1ef4a902e: OK
    blockName: /Muon1/Run2024I-LogErrorMonitor-PromptReco-v1/USER#63a0eea8-35a0-4b70-adb5-39a6141c7bde: OK
    blockName: /ParkingVBF2/Run2024I-PromptReco-v1/NANOAOD#85077ea4-7bbd-45f7-a2df-2608128abe31: OK
    blockName: /ParkingVBF0/Run2024I-PromptReco-v1/AOD#8738e434-abdc-4d03-a60d-358ab2412188: OK
    blockName: /JetMET1/Run2024I-EXOSoftDisplacedVertices-PromptReco-v1/AOD#c2dd6617-dd8e-4bd8-9bd7-cdd81a858c36: OK
    blockName: /ParkingSingleMuon9/Run2024I-PromptReco-v1/MINIAOD#78af0c2f-e670-46e5-832a-d3828066fca7: OK
    blockName: /ParkingSingleMuon7/Run2024I-PromptReco-v1/AOD#f81d407a-8165-41c9-8a11-a5182d63d273: OK
    blockName: /Muon0/Run2024I-LogErrorMonitor-PromptReco-v1/USER#d7c8e5cc-0e23-424f-b000-55aa41070d63: OK
    blockName: /EphemeralZeroBias0/Run2024I-PromptReco-v1/MINIAOD#ad55216f-8224-404d-b8e9-daba73c85bb4: OK
    blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/NANOAOD#f590565c-1256-4f74-8e00-db331266d599: OK
    blockName: /Muon0/Run2024I-PromptReco-v1/NANOAOD#2ab3aa3d-a4ca-4773-8a58-85813617ea33: OK
    blockName: /Muon1/Run2024I-HcalCalHBHEMuonProducerFilter-PromptReco-v1/ALCARECO#455e3ec5-62d7-4f68-abbe-a34111a87076: OK
    blockName: /JetMET1/Run2024I-LogError-PromptReco-v1/RAW-RECO#1a77079a-29d9-4920-834c-eb523aeea080: OK
    blockName: /Muon0/Run2024I-EXODisappMuon-PromptReco-v1/USER#474dc48d-caf5-44d3-9a80-dcc2d5eec561: OK
    blockName: /JetMET1/Run2024I-JetHTJetPlusHOFilter-PromptReco-v1/RAW-RECO#f965d641-1a87-46b3-87a8-9d2a566fc604: OK
    blockName: /ParkingSingleMuon6/Run2024I-PromptReco-v1/NANOAOD#33455ccc-7ce1-4fb5-aa69-2048f8362f27: OK
    blockName: /ParkingSingleMuon10/Run2024I-PromptReco-v1/NANOAOD#fe088890-8bc3-416c-b263-408703b5efa4: OK
    blockName: /ParkingSingleMuon10/Run2024I-PromptReco-v1/AOD#df90c976-3437-46ee-af3e-fbc2221e48f4: OK
    blockName: /ParkingDoubleMuonLowMass6/Run2024I-PromptReco-v1/DQMIO#fe3a6889-bdb5-4959-a164-2db208d3e69b: OK
    blockName: /ParkingSingleMuon10/Run2024I-PromptReco-v1/MINIAOD#71008299-7fb4-4245-90d2-e980b9e195a1: OK
    blockName: /ParkingVBF2/Run2024I-PromptReco-v1/MINIAOD#7e7cb65b-c8db-4e36-b9a1-3475a031dedd: OK
    blockName: /AlCaP0/Run2024I-v1/RAW#0d3bd409-bcd8-4c59-bcd8-d7ecc3a1222b: OK
    blockName: /ParkingVBF3/Run2024I-PromptReco-v1/MINIAOD#2a6f15da-bd5c-4520-b70b-ab50ff65e04e: OK
    blockName: /ParkingSingleMuon4/Run2024I-PromptReco-v1/NANOAOD#f7f3e422-3bd6-4410-a955-9ed339b7219f: OK
    blockName: /ParkingVBF2/Run2024I-PromptReco-v1/AOD#ebaafffd-4020-46cf-9137-37c2832d3eac: OK
    blockName: /JetMET1/Run2024I-EXOMONOPOLE-PromptReco-v1/USER#77eabaa1-ead3-4534-a972-e788c3e7f050: OK
    blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/NANOAOD#8e5664a8-e3d1-4f92-b27e-3707ca3dc4df: OK
    blockName: /Muon0/Run2024I-EXOCSCCluster-PromptReco-v1/USER#700a9b24-dbc4-4bf9-8787-01da5ef26a06: OK
    blockName: /Muon0/Run2024I-LogError-PromptReco-v1/RAW-RECO#b5af0d19-9710-4757-af04-ba6f63ab4070: OK
    blockName: /ScoutingPFRun3/Run2024I-v1/HLTSCOUT#da913315-bb66-42ed-8c46-4f7f3714ef0c: OK
    blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/AOD#3d339afb-6be9-440d-802e-3572ab355d56: OK
    blockName: /ParkingDoubleMuonLowMass1/Run2024I-PromptReco-v1/NANOAOD#1c60e700-caf6-469a-9bb7-40122038ed33: OK
    blockName: /JetMET1/Run2024I-PromptReco-v1/MINIAOD#af8b556a-6dfa-43eb-890a-7f0cdea01f87: OK
    blockName: /JetMET1/Run2024I-EXOHighMET-PromptReco-v1/RAW-RECO#f09b2a46-61f1-4ae0-ac58-17d8c7db4fe3: OK
    blockName: /ParkingVBF3/Run2024I-PromptReco-v1/AOD#8a47209b-3d7a-4c13-a3d7-8a34397a9f94: OK
    blockName: /ScoutingPFRun3/Run2024I-PromptReco-v1/NANOAOD#2d01f47e-5f60-4115-b99a-3ccd5f843a71: OK
    blockName: /EphemeralZeroBias5/Run2024I-PromptReco-v1/MINIAOD#65830620-4305-4f9d-b084-555c17dc5610: OK
    blockName: /ParkingHH/Run2024I-PromptReco-v1/AOD#47f79776-ae5b-41ac-9965-d4981b9790d6: OK
    blockName: /ZeroBias/Run2024I-PromptReco-v1/AOD#d46e1e69-d0e3-4456-9bf1-060eb3731aec: OK
    blockName: /ParkingVBF0/Run2024I-PromptReco-v1/NANOAOD#40dc6115-2b40-43be-9e4e-cdd658e68dc7: OK
    blockName: /ParkingSingleMuon11/Run2024I-PromptReco-v1/AOD#0410c654-f369-40f4-858b-af27bbe4d94d: OK
    blockName: /JetMET0/Run2024I-PromptReco-v1/AOD#dc00eaf3-8f2c-4465-ba70-78335e4cb245: OK
    blockName: /ParkingDoubleMuonLowMass0/Run2024I-PromptReco-v1/NANOAOD#2c9e2ac2-b2aa-4868-9782-556e6193cbbb: OK
    blockName: /ParkingDoubleMuonLowMass5/Run2024I-PromptReco-v1/MINIAOD#3957f41c-be2a-4f09-b04c-6995d7c23eee: OK
todor-ivanov commented 1 month ago

@germanfgv @LinaresToine Could we check what is special about those 4 blocks reported in my previous comment as having BLOCKMISMATCH records in DBS: https://github.com/dmwm/WMCore/issues/11965#issuecomment-2454961698 . I am interested in finding out at least:

amaltaro commented 1 month ago

@todor-ivanov as we discussed in the WMCore meeting, DBS3Upload should have a mechanism to identify blocks that have already been injected into DBS Server but whose injection, for some reason, was never acknowledged.

If the component tries to inject a block that is already in the server, the server is supposed to return error code 128, and the component then marks the block as 'check' here: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L104

which will trigger the execution of this block of code: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L848

in the next cycle of the component.

I don't think anything has changed in the DBS Server codebase lately, so I expect this feature to still be functional. But you might want to review the error message/code that we are getting for the problematic blocks.
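The intended flow can be summarized as a small decision function. This is a minimal sketch assuming the DBS error code has already been extracted from the server response as an integer (or is `None` on success); `upload_outcome` and the returned dict layout are hypothetical, chosen only to mirror the "check" marking described above, and are not the actual `DBSUploadPoller` code.

```python
from typing import Optional

# DBS server error code for "block already exists", as discussed in this thread.
BLOCK_ALREADY_EXISTS = 128


def upload_outcome(block_name: str, dbs_error_code: Optional[int]) -> dict:
    """Map the result of one bulk-block insertion to a status record.

    Hypothetical helper: ``None`` means the insertion succeeded; any other
    value is the DBS error code reported by the server.
    """
    if dbs_error_code is None:
        return {"name": block_name, "success": True}
    if dbs_error_code == BLOCK_ALREADY_EXISTS:
        # Block is already in DBS: flag it so the next component cycle
        # verifies it instead of trying to insert it again.
        return {"name": block_name, "success": "check"}
    # Any other DBS error: report failure and keep the code for debugging.
    return {"name": block_name, "success": False, "code": dbs_error_code}
```

The key point is that the "check" path only works if the real DBS code (128) reaches this decision; anything that hides the code makes the block look like a plain failure.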

amaltaro commented 4 weeks ago

@todor-ivanov we are having similar problems with one agent that is ready to be shut down (after draining), but it still has one block that it fails to inject into DBS Server.

Could you please look into submit12 and try to understand what the problem is with:

2024-11-04 21:38:50,968:140011552839424:INFO:DBSUploadPoller:About to call insert block for: /XToYYprimeTo4Q_MX-2000_MY-30_MYprime-600_narrow_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM#fb282932-f32b-48de-82e5-a56cceb34cad
2024-11-04 21:38:51,654:140011552839424:ERROR:DBSUploadPoller:Error trying to process block /XToYYprimeTo4Q_MX-2000_MY-30_MYprime-600_narrow_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM#fb282932-f32b-48de-82e5-a56cceb34cad through DBS. Details: DBSError code: 131, message: fb20d909d3a86926e3d8d0498c1ebfc3f4ad617c6b5e5dcaeecde3662af8797b unable to find dataset_id for /XToYYprimeTo4Q_MX-2000_MY-30_MYprime-600_narrow_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM, error DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set, reason: DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set

It seems to have been failing injection since Oct 15th.

germanfgv commented 4 weeks ago
  • Almost all of those blocks are properly present in DBS, so for those I assume that the agent did not properly handle the initial return code from DBS and simply continues to retry.

Thanks @todor-ivanov! I tested this using my own script (DBSBlockCheck.py) and got the same result: all but 4 blocks are already available in DBS.

All the blocks listed in /eos/home-c/cmst0/public/dbsError/failingBlocks.txt belong to the same agent, including the 4 problematic ones.

I can check @amaltaro's idea tomorrow.

todor-ivanov commented 4 weeks ago

In reality, for some reason, the error code the agent unwraps from the HTTP header is 52 instead of 128 (see [1]). So the mechanism mentioned here: https://github.com/dmwm/WMCore/issues/11965#issuecomment-2455173659 will never trigger.

[1]

2024-11-05 10:07:49,084:139753276044864:ERROR:DBSUploadPoller:Hit a general exception while inserting block /Tau/Run2024I-PromptReco-v1/DQMIO#1a0ac20a-1d60-4d89-8133-e8559f1e4c13. Error: (52, 'Empty reply from server')
Traceback (most recent call last):
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/WMComponent/DBS3Buffer/DBSUploadPoller.py", line 94, in uploadWorker
    dbsApi.insertBulkBlock(blockDump=block)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 647, in insertBulkBlock
    result =  self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py", line 474, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RestApi.py", line 42, in post
    return http_request(self._curl)
  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 56, in __call__
    curl_object.perform()
pycurl.error: (52, 'Empty reply from server')
todor-ivanov commented 4 weeks ago

Actually, it never tries to read the HTTP header and resolve the true DBS error, which is supposed to be done through the dbsError class here: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L102

The reason it happens this way is that the exception raised by the pycurl client is not of type HTTPError, so this whole piece of code is never executed: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L96-L115

Instead, the exception is handled as a general exception, and this branch takes control: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L116-L119

It must have something to do with this line from the traceback:

  File "/data/tier0/WMAgent.venv3/lib64/python3.9/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 56, in __call__
    curl_object.perform()
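The control flow described above can be illustrated with a small, self-contained sketch. Since `pycurl` and `RestClient` may not be installed, it uses two hypothetical stand-in classes (`HTTPError`, `CurlError`) instead of the real ones. The extra `CurlError` branch is only one possible way to keep transport errors such as code 52 out of the generic handler; it is not what `DBSUploadPoller` currently does.

```python
class HTTPError(Exception):
    """Stand-in for RestClient's HTTPError (the response carries the DBS error)."""


class CurlError(Exception):
    """Stand-in for pycurl.error, raised as CurlError(errno, message)."""


def classify(exc: Exception) -> str:
    """Report which handler a given exception would reach."""
    try:
        raise exc
    except HTTPError:
        # Only this branch can parse the DBS error code from the server response.
        return "dbs-error-handler"
    except CurlError as ex:
        # Transport-level failure: no HTTP response, hence no DBS error code.
        errno = ex.args[0] if ex.args else None
        return f"transport-error:{errno}"
    except Exception:
        # Without the branch above, pycurl errors would end up here.
        return "general-exception"
```

With a dedicated branch, an "Empty reply from server" (52) could, for example, trigger an existence check in DBS on the next cycle instead of a blind retry.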
todor-ivanov commented 4 weeks ago

And just to add to the observation: the 4 blocks that I mentioned as experiencing BLOCKMISMATCH in DBS behave differently. All 4 of them fail with a proper DBS exception [1], and it is indeed the concurrency error, DBSError Code:110. So for them the actual HTTP header is parsed and the true DBS error encoded in it is received, and the exception is handled according to whatever logic is implemented by: https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L96-L115

But as we can see, DBS error code 110 is not handled by this logic. So I suspect the conversation on how to proceed with these cases needs to continue once we understand what exactly happened with those 4 blocks in the first place. It doesn't seem that a proper agreement has been reached on the actions required on both the client and the server side in situations like this.

[1]

2024-09-20 08:43:10,755:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd through DBS. Details: DBSError code: 110, message: 5ecdc2bdcd03492fd64efc269de332cdcf1c8a53c3e3cc07168b0c741f0270ba unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
2024-09-20 08:43:10,756:139632874354240:INFO:DBSUploadPoller:About to call insert block for: /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7
2024-09-20 08:43:10,757:139632874354240:INFO:DBSUploadPoller:Queueing block for insertion: /L1ScoutingSelection/Run2024H-v1/L1SCOUT#694f9058-382e-47d9-89cd-646541261cd7
2024-09-20 08:43:10,760:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 through DBS. Details: DBSError code: 110, message: 997071d9311e283887ce5e57b0b1800467986e1c57f620aff5a39d98b881fb6c unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
...

2024-09-20 08:43:10,799:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7 through DBS. Details: DBSError code: 110, message: d93d36f53eaf3097db5c9f50851359041c418a18727e6f363e6c18c37d3f25bb unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error

...
2024-09-20 08:43:11,854:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /Muon0/Run2024H-v1/RAW#7369ccdf-3d3a-4d32-bad9-b04b02f279d4 through DBS. Details: DBSError code: 110, message: e38e86de6869760af39faf5da584eceee0b0b9d1de48e57276593df8dd4c720e unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
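The messages above embed several DBS error codes in a single string (an outer code plus nested ones). A minimal sketch for pulling all of them out, assuming only the two spellings seen in these logs (`DBSError code: 110` and `DBSError Code:110`); `dbs_error_codes` is a hypothetical helper, not part of WMCore or DBSClient.

```python
import re
from typing import List

# Matches both "DBSError code: 110" and "DBSError Code:110".
_DBS_CODE = re.compile(r"DBSError\s+code:\s*(\d+)", re.IGNORECASE)


def dbs_error_codes(message: str) -> List[int]:
    """Return every DBSError code found in a message, in order of appearance."""
    return [int(code) for code in _DBS_CODE.findall(message)]
```

For example, `110 in dbs_error_codes(msg)` would flag the concurrency case, which the current handling logic does not cover, and the same helper exposes the nested code 103 behind the outer code 131 in the submit12 error reported earlier.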
germanfgv commented 4 weeks ago

@todor-ivanov actually, at some point, the 4 blocks started failing with the pycurl.error: (52, 'Empty reply from server'), before any other block had failed.

This is the last appearance of DBSError 110:

2024-09-27 14:31:30,711:140408533284416:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd through DBS. Details: DBSError code: 110, message: ec6dab1b1b8d8ba3bf018be816846d73e007b5049b93947a1e7472786c73ece6 unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error

This is the first appearance of pyCurl error 52.

2024-09-27 15:03:00,192:140408533284416:ERROR:DBSUploadPoller:Hit a general exception while inserting block /AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd. Error: (52, 'Empty reply from server')

This might be a clue to what caused the other 272 blocks to fail. It seems something changed in the DBS server at some point between 2024-09-27 14:31 and 2024-09-27 15:03. After that moment, the client is unable to parse the server's error codes. This coincides exactly with the deployment of the APS-based CMSWEB cluster.

@amaltaro @todor-ivanov @vkuznet
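To pin down this transition in the DBS3Upload ComponentLog, a minimal sketch that scans the log and returns the last DBSError 110 and the first pycurl error 52. It assumes the line format shown in the excerpts above (a `YYYY-MM-DD HH:MM:SS,mmm` timestamp at the start of each entry) and matches on the literal error strings quoted there; the function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. "2024-09-27 14:31:30,711"


def _timestamp(line: str) -> Optional[datetime]:
    """Parse the leading timestamp, or return None for continuation lines."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None  # wrapped message text or traceback line


def error_transition(lines: Iterable[str]) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (last DBSError 110 timestamp, first pycurl error 52 timestamp)."""
    last_110 = first_52 = None
    for line in lines:
        ts = _timestamp(line)
        if ts is None:
            continue
        if "DBSError code: 110" in line:
            last_110 = ts
        elif "Error: (52, 'Empty reply from server')" in line and first_52 is None:
            first_52 = ts
    return last_110, first_52
```

Usage: `with open("ComponentLog") as log: print(error_transition(log))`. If the last 110 consistently precedes the first 52, as in the excerpts above, the switch happened in that window.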

germanfgv commented 4 weeks ago

Here you have the DBS3Upload ComponentLog, in case you want to check these dates:

/eos/user/c/cmst0/public/dbsError/ComponentLog
vkuznet commented 4 weeks ago

@germanfgv I would like to mention that, according to the k8s production DBS cluster, our DBS pods have been running for 209 days. Therefore, nothing has changed on the DBS side, and I am not aware of any development, commits or PRs either. The concurrency error may seem misleading, since it is printed out by the concurrent call to file injection. But the file injection fails due to missing auxiliary metadata in the JSON payload. Please see this DBS code:

I have reported MANY times that the most likely issue is a missing file data type in the JSON payload, and I strongly suggest starting with your JSON payload and checking whether it is there. In particular, the files section of the payload should contain file_type; see the example here.

If the JSON payload is correct in terms of ALL required auxiliary metadata, I suggest that you move down the list, check the validity of the file(s), and finally look for an ORACLE insert error.
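As a sketch of the first check suggested above, assuming the block dump is a dict whose `files` entry is a list of per-file dicts, the following lists files that lack `file_type`. The `logical_file_name` key used to identify each file is an assumption about the payload layout, and the function name is hypothetical.

```python
from typing import Dict, List


def files_missing_file_type(block_dump: Dict) -> List[str]:
    """List files in the payload's ``files`` section that lack a ``file_type``."""
    missing = []
    for entry in block_dump.get("files", []):
        if not entry.get("file_type"):
            # Assumed key name for the file's LFN in the bulk-block payload.
            missing.append(entry.get("logical_file_name", "<no LFN>"))
    return missing
```

Run against a dump loaded with `json.load`, an empty result means this particular metadata check passes and the next step (file validity) applies.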

germanfgv commented 4 weeks ago

@vkuznet We have 2 separate problems here:

  1. 4 blocks showing concurrency errors.
  2. 272 blocks that are already properly uploaded to DBS, but the agent is unable to parse the response from the server.

I bring up the APS upgrade in reference to the issues parsing the response from the server, not as an explanation for the concurrency issues. After 2024-09-27 15:03, the DBS client is unable to distinguish DBSError 128: Block already exists, from DBSError 110: Concurrency error (or any other HTTP error). They all show up as pycurl error 52. As the timing coincides with the deployment of the APS server, it seems very likely to me that the issue is related to that upgrade, especially since, as you mentioned, there have not been any other changes in the code.

I would like to switch this agent temporarily to the cmsweb-prod.cern.ch version of DBSWriter, simply to check whether the 272 blocks without concurrency issues can move along. This will not put much pressure on the server, as this agent is no longer producing new data and simply needs to upload those 272 blocks. @vkuznet do you have anything against that plan?

Regarding the JSON payload, the dumps we obtained from the agent show "file_type": "EDM", as expected. This is why we moved on to checking the validity of the files, and Todor has already found issues there: there are indeed files appearing in more than one block. Fixing this will be more complicated, and we still need to understand what faulty agent logic caused it.
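To look for files that appear in more than one block, a minimal sketch that takes a mapping from block name to the LFNs it contains (however those lists were gathered, e.g. from the agent dumps or from DBS) and reports every LFN owned by more than one block. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def lfns_in_multiple_blocks(blocks: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each LFN found in more than one block to the sorted list of those blocks."""
    owners = defaultdict(list)
    for block_name, lfns in blocks.items():
        for lfn in set(lfns):  # ignore repeats within the same block
            owners[lfn].append(block_name)
    return {lfn: sorted(names) for lfn, names in owners.items() if len(names) > 1}
```

Feeding it both the original and the suspicious blocks of a dataset shows directly which files are shared between them.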

vkuznet commented 4 weeks ago

My suggestions would be the following:

vkuznet commented 4 weeks ago

@germanfgv, regarding switching to cmsweb-prod: if you are interested in understanding the error, I suggest using the manual curl approach I described before. Afterwards, you may switch to cmsweb-prod to see whether you are able to inject them using the Apache FE.
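The manual curl approach mentioned above is not reproduced in this thread. As a rough standard-library equivalent (an assumption, not the procedure vkuznet described), the sketch below POSTs a block dump directly, bypassing DBS3Upload, and returns the raw status, headers and body even for error responses. The endpoint URL, the certificate/key paths and the `Content-Type` header are assumptions; transport failures such as an empty reply still surface as exceptions.

```python
import json
import ssl
import urllib.error
import urllib.request


def build_bulkblock_request(url: str, block_dump: dict) -> urllib.request.Request:
    """Build a POST request carrying the block dump as a JSON body."""
    body = json.dumps(block_dump).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def post_block(url: str, block_dump: dict, certfile: str, keyfile: str):
    """POST a block dump and return (HTTP status, headers, body), also for HTTP errors."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile, keyfile)  # user certificate and key (assumed paths)
    request = build_bulkblock_request(url, block_dump)
    try:
        with urllib.request.urlopen(request, context=context) as response:
            return response.status, dict(response.headers), response.read().decode()
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry the server's headers and error body.
        return err.code, dict(err.headers), err.read().decode()
```

For example, `post_block("https://cmsweb-testbed.cern.ch/dbs/int/global/DBSWriter/bulkblocks", dump, "usercert.pem", "userkey.pem")`, with the host and path taken from the debugger output in this thread, would show whether the server answers with a DBS error body or with no reply at all.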

LinaresToine commented 4 weeks ago

About the 4 original blocks with the concurrency errors, specifically the AlCaP0 blocks:

I see all files in DBS, but they are distributed among two "impostor" blocks.

1. `/AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417`
2. `/AlCaP0/Run2024H-v1/RAW#0392f25d-8397-40b3-8f6f-46266d92583b`

I call them impostors because both contain files that belong to other blocks; number 2 is not even in Rucio, and all of its files belong to another block according to the database. Here is a summary of all 5 blocks, the 2 impostors and the 3 originals:

/AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417 (impostor 1)

/AlCaP0/Run2024H-v1/RAW#0392f25d-8397-40b3-8f6f-46266d92583b (impostor 2)

/AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 (original)

/AlCaP0/Run2024H-v1/RAW#92fab5d9-9a27-4a7c-a57e-4b2691c654cd (original)

/AlCaP0/Run2024H-v1/RAW#983e96f3-5dca-4919-a3b4-fa291f145fb7 (original)

todor-ivanov commented 4 weeks ago

About:

to do that, you should stop using DBS3Upload code as it hides many things and prevents debugging the issue

Just to put @vkuznet's words in perspective:

I tried to completely simulate the whole agent environment in preprod, connected to DBS integration, wrongly assuming that everything would go smoothly and that, after an initial successful upload of the block, I would be able to reproduce the duplication error on a second attempt. But:

https://github.com/dmwm/DBSClient/blob/1e6acbd55c55497cf747a2a0cf4539936138a04a/src/python/dbs/apis/dbsClient.py#L647:

    def insertBulkBlock(self, blockDump):
        ...
        result = self.__callServer("bulkblocks", data=blockDump, callmethod='POST')

The response to this call actually contains the true DBS error in the header, and one can spot the error message in the printout:

RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set

-- So those are two nested DBS errors:

https://github.com/dmwm/WMCore/blob/76fd3a93ab322a897d63a0b54aa7129c8588db16/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L108

And there is a plethora of DBS server errors we do not handle: https://github.com/dmwm/dbs2go/blob/8effd5a6bcb1c5b169348e3ac886891ad3aa1a2a/dbs/errors.go#L37-L81 : [2]

FYI: @germanfgv @LinaresToine

[1]

5643 │> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(477)__callServer()                                                                                                                                                                                                                           
5644 │-> self.__parseForException(http_error)                                         |                                                                                                                                                                                                                                      
5645 │(Pdb)                                                                           |                                                                                                                                                                                                                                      
5646 │DBS Server error: [{'error': {'reason': 'DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set', 'message': 'unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8df\
      a-1722884306c5.root', 'function': 'dbs.bulkblocks.InsertBulkBlocksConcurrently', 'code': 101, 'stacktrace': '\ngoroutine 7968287 [running]:\ngithub.com/dmwm/dbs2go/dbs.Error({0xb2ca20?, 0xc0006842d0?}, 0x65, {0xc000702000, 0x84}, {0xa5c044, 0x2b})\n\t/go/src/github.com/vkuznet/dbs2go/dbs/errors.go:185 +0x99\n\
      github.com/dmwm/dbs2go/dbs.(*API).InsertBulkBlocksConcurrently(0xc000236070)\n\t/go/src/github.com/vkuznet/dbs2go/dbs/bulkblocks2.go:508 +0x605\ngithub.com/dmwm/dbs2go/web.DBSPostHandler({0xb2f790, 0xc000aa01e0}, 0xc000686c60, {0xa3e07d, 0xa})\n\t/go/src/github.com/vkuznet/dbs2go/web/handlers.go:562 +0x109e\n\
      github.com/dmwm/dbs2go/web.BulkBlocksHandler({0xb2f790?, 0xc000aa01e0?}, 0xc000033f60?)\n\t/go/src/github.com/vkuznet/dbs2go/web/handlers.go:978 +0x3b\nnet/http.HandlerFunc.ServeHTTP(0x0?, {0xb2f790?, 0xc000aa01e0?}, 0x11?)\n\t/usr/local/go/src/net/http/server.go:2171 +0x29\ngithub.com/dmwm/dbs2go/web.limitMi\
      ddleware.func1({0xb2f790?, 0xc000aa01e0?}, 0xc0006c6650?)\n\t/go/src/github.com/vkuznet/dbs2go/web/middlewares.go:110 +0x32\nnet/http.HandlerFunc.ServeHTTP(0xc0003c0f30?, {0xb2f790?, 0xc000aa01e0?}, 0xc0003af450?)\n\t/usr/loca'}, 'http': {'method': 'POST', 'code': 400, 'timestamp': '2024-11-06 16:16:23.350982\
      889 +0000 UTC m=+5760929.544914892', 'path': '/dbs/int/global/DBSWriter/bulkblocks', 'user_agent': 'DBSClient/Unknown/', 'x_forwarded_host': 'cmsweb-testbed.cern.ch', 'x_forwarded_for': '188.184.96.94:20438, 188.184.96.94', 'remote_addr': '10.100.148.128:41393'}, 'exception': 400, 'type': 'HTTPError', 'messag\
      e': 'DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g.\
       mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set'}]                                                                                                                                                                                                                             
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(477)__callServer()
-> self.__parseForException(http_error)
(Pdb)
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(486)__callServer()
-> self.__parseForException(data)
(Pdb)
--Return--
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(486)__callServer()->None
-> self.__parseForException(data)
(Pdb)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(647)insertBulkBlock()
-> result =  self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
(Pdb) p result
*** NameError: name 'result' is not defined
(Pdb) n
--Return--
> /data/WMAgent.venv3/lib64/python3.9/site-packages/dbs/apis/dbsClient.py(647)insertBulkBlock()->None
-> result =  self.__callServer("bulkblocks", data=blockDump, callmethod='POST' )
(Pdb) p result
*** NameError: name 'result' is not defined
(Pdb) n
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:unable to find parent lfn /store/data/Run2024I/ParkingSingleMuon4/RAW/v1/000/386/640/00000/7c1b6c7b-a0bf-4f19-8dfa-1722884306c5.root Error: nested DBSError Code:103 Description:DBS DB query error, e.g. mailformed SQL statement Function:dbs.GetID Message: Error: sql: no rows in result set
> /data/WMAgent.venv3/srv/WMCore/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py(94)uploadWorker()
-> dbsApi.insertBulkBlock(blockDump=block)
(Pdb)
> /data/WMAgent.venv3/srv/WMCore/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py(96)uploadWorker()
-> except HTTPError as ex:
(Pdb)

[2]

// DBS Error codes provides static representation of DBS errors, they cover 1xx range
const (
    GenericErrorCode               = iota + 100 // generic DBS error
    DatabaseErrorCode                           // 101 database error
    TransactionErrorCode                        // 102 transaction error
    QueryErrorCode                              // 103 query error
    RowsScanErrorCode                           // 104 row scan error
    SessionErrorCode                            // 105 db session error
    CommitErrorCode                             // 106 db commit error
    ParseErrorCode                              // 107 parser error
    LoadErrorCode                               // 108 loading error, e.g. load template
    GetIDErrorCode                              // 109 get id db error
    InsertErrorCode                             // 110 db insert error
    UpdateErrorCode                             // 111 update error
    LastInsertErrorCode                         // 112 db last insert error
    ValidateErrorCode                           // 113 validation error
    PatternErrorCode                            // 114 pattern error
    DecodeErrorCode                             // 115 decode error
    EncodeErrorCode                             // 116 encode error
    ContentTypeErrorCode                        // 117 content type error
    ParametersErrorCode                         // 118 parameters error
    NotImplementedApiCode                       // 119 not implemented API error
    ReaderErrorCode                             // 120 io reader error
    WriterErrorCode                             // 121 io writer error
    UnmarshalErrorCode                          // 122 json unmarshal error
    MarshalErrorCode                            // 123 marshal error
    HttpRequestErrorCode                        // 124 HTTP request error
    MigrationErrorCode                          // 125 Migration error
    RemoveErrorCode                             // 126 remove error
    InvalidRequestErrorCode                     // 127 invalid request error
    BlockAlreadyExists                          // 128 block xxx already exists in DBS
    FileDataTypesDoesNotExist                   // 129 FileDataTypes does not exist in DBS
    FileParentDoesNotExist                      // 130 FileParent does not exist in DBS
    DatasetParentDoesNotExist                   // 131 DatasetParent does not exist in DBS
    ProcessedDatasetDoesNotExist                // 132 ProcessedDataset does not exist in DBS
    PrimaryDatasetTypeDoesNotExist              // 133 PrimaryDatasetType does not exist in DBS
    PrimaryDatasetDoesNotExist                  // 134 PrimaryDataset does not exist in DBS
    ProcessingEraDoesNotExist                   // 135 ProcessingEra does not exist in DBS
    AcquisitionEraDoesNotExist                  // 136 AcquisitionEra does not exist in DBS
    DataTierDoesNotExist                        // 137 DataTier does not exist in DBS
    PhysicsGroupDoesNotExist                    // 138 PhysicsGroup does not exist in DBS
    DatasetAccessTypeDoesNotExist               // 139 DatasetAccessType does not exist in DBS
    DatasetDoesNotExist                         // 140 Dataset does not exist in DBS
    LastAvailableErrorCode                      // last available DBS error code
)
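The server error strings embed these codes, sometimes nested (the traceback above shows Code:101 wrapping Code:103). A minimal sketch for pulling the whole chain out of such a message, assuming the `DBSError Code:NNN` text format seen in these logs stays stable; the helper name and the partial name table are hypothetical, not part of dbs2go or the DBS client:

```python
import re

# Partial, hypothetical lookup table built from the dbs2go const block above
# (values follow from iota + 100).
DBS_ERROR_NAMES = {
    100: "GenericErrorCode",
    101: "DatabaseErrorCode",
    103: "QueryErrorCode",
    109: "GetIDErrorCode",
    110: "InsertErrorCode",
    128: "BlockAlreadyExists",
    130: "FileParentDoesNotExist",
}

# Matches every "DBSError Code:NNN" occurrence, including nested ones.
_CODE_RE = re.compile(r"DBSError Code:(\d+)")


def dbs_error_chain(message):
    """Return the (code, name) pairs found in a DBS error string, outermost first."""
    chain = []
    for match in _CODE_RE.finditer(message):
        code = int(match.group(1))
        chain.append((code, DBS_ERROR_NAMES.get(code, "Unknown")))
    return chain
```

Note that the WMCore-side prefix (`DBSError code: 110, message: ...`) uses a different format and is deliberately not matched; only the server-side `Code:NNN` markers are.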
germanfgv commented 3 weeks ago

I changed the DBSWriter instance that the component is accessing from cmsweb.cern.ch to cmsweb-prod.cern.ch. As expected, we no longer get the pyCurl error 52 message. The 272 blocks that are already correct in the database were processed without issues, and this is allowing the agent to continue creating and uploading blocks.

Now we are left with the original 4 problematic blocks.

todor-ivanov commented 3 weeks ago

Here is a summary of the status and our findings on this issue, from the work with the T0 Team over the whole of last week.

The problem is threefold:

The above two mostly concern the huge pile of blocks that we kept accumulating without recognizing that their records were already in DBS, so that the agent could stop retrying. Once we switched back to the Apache frontend, all of those went through, and the subsequent steps for the other workflows depending on that data also started.

As a strategy, we decided to split the problem into 5 steps: 2 OPS and 3 DEV

(the latter we never imagined even existed)

so:

amaltaro commented 3 weeks ago

Thank you for summarizing everything that has been going on in here.

For the OPS2 issue above, I find deleting entries from the DBS Server database extremely dangerous. Even though it might require extra work, it would be much safer to actually recreate the lumis (or block) that are failing to get inserted into DBS. Did you and the T0 team discuss this possibility? @germanfgv

About DEV1, unless I am missing some context, I do not think we should replicate every single status code from the DBS Server on the client side. IMO, the client should only handle the status codes for which it can actually do something different. If there is no different execution flow, then reporting the error from the server is all we can do (which is already done in the generic exception, AFAICT).
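To illustrate that point: a minimal sketch in which the client branches on a single code, assuming that 128 (`BlockAlreadyExists` in the dbs2go list) is what the server returns for a block whose records are already in DBS, and that marking it as uploaded is the right reaction. The function and its return values are hypothetical and do not reflect the current DBSUploadPoller code.

```python
import re

# Code 128 in the dbs2go list: "block xxx already exists in DBS".
BLOCK_ALREADY_EXISTS = 128

_CODE_RE = re.compile(r"DBSError Code:(\d+)")


def upload_action(message):
    """Decide what the agent should do after a failed bulkblocks call.

    Hypothetical helper: only codes that lead to a different execution flow
    get a branch; everything else falls back to the generic handling
    (report the server message and retry in the next cycle).
    """
    codes = {int(code) for code in _CODE_RE.findall(message)}
    if BLOCK_ALREADY_EXISTS in codes:
        return "mark_uploaded"  # records already in DBS: stop retrying
    return "retry"
```

Any other code, including the 101/103 chain seen above, stays on the generic path, which is the behavior argued for here.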

germanfgv commented 3 weeks ago

@amaltaro we no longer have the streamer files for these runs/lumis, so it's not possible to recreate these blocks.

We could consider making the changes in Rucio, but it would require removing files from one block and adding them to the other. It would also require doing the same in the agent's DBSBUFFER database.

amaltaro commented 3 weeks ago

Given the criticality and amount of information in DBS, it would be the last system in which I would delete things manually. For dbsbuffer, do I understand correctly that we would only need to mark this block and its files as uploaded to DBS? For Rucio, what would have to be done? Remove files/replicas from a DATASET? Would it require creating a new DATASET + files/replicas?

germanfgv commented 3 weeks ago

In Rucio, we would need to remove 4 files from one block and add them to another. In the agent's database, we would need to change the block of the 4 problematic files and mark them as InDBS. I'm not sure how Rucio would react to this, but I think it will be OK, as all files belong to the same container.
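For reference, the agent-database half of that procedure amounts to a single transactional update. A sketch against an in-memory sqlite3 database with a deliberately simplified, hypothetical `files(lfn, block, status)` table; the real dbsbuffer schema and status values differ, and the Rucio side is not covered:

```python
import sqlite3


def move_files_and_mark_in_dbs(conn, lfns, target_block):
    """Reassign files to another block and flag them as uploaded, atomically.

    Hypothetical schema: files(lfn TEXT PRIMARY KEY, block TEXT, status TEXT).
    """
    placeholders = ",".join("?" for _ in lfns)
    with conn:  # commits on success, rolls back if anything below raises
        cur = conn.execute(
            f"UPDATE files SET block = ?, status = 'InDBS' WHERE lfn IN ({placeholders})",
            [target_block, *lfns],
        )
        # Refuse a partial move: every listed file must exist.
        if cur.rowcount != len(lfns):
            raise RuntimeError(f"expected {len(lfns)} rows, updated {cur.rowcount}")
```

Checking `rowcount` inside the transaction makes the change all-or-nothing: if any of the listed files is missing, nothing is modified.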

amaltaro commented 2 weeks ago

After some discussions during the Tier0 meeting, I decided to have a quick look at the logs to see if we can have a better understanding of this issue.

Some of this information is missing from this thread, so let me write my observations here: 1) before the problematic blocks were created in DBS3Upload, the component had a few Oracle issues like:

Exception Class: DBSUploadException
Message: Unhandled exception while loading uploadable files for DatasetPath.
(cx_Oracle.DatabaseError) ORA-25401: can not continue fetches

2) after these Oracle issues, I noticed many files being reported as duplicates in the logs:

2024-09-19 17:23:01,916:139632874354240:INFO:DBSUploadPoller:Executing loadFiles method...
2024-09-19 17:23:11,876:139632874354240:ERROR:DBSBufferBlock:Duplicate file inserted into DBSBufferBlock: 1077894
Ignoring this file for now!

3) based on Antonio's feedback above, the "impostor block" had the following timeline in the component:

### impostor block 1
2024-09-19 17:51:38,694:139632874354240:INFO:DBSUploadPoller:Queueing block for insertion: /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
2024-09-19 17:52:47,723:139632874354240:INFO:DBSUploadPoller:About to call insert block for: /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417

4) while the original block had this timeline (and has kept failing since then):

2024-09-19 17:51:38,698:139632874354240:INFO:DBSUploadPoller:Queueing block for insertion: /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4
2024-09-19 17:52:49,777:139632874354240:ERROR:DBSUploadPoller:Error trying to process block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 through DBS. Details: DBSError code: 110, message: 997071d9311e283887ce5e57b0b1800467986e1c57f620aff5a39d98b881fb6c unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error

5) looking into RucioInjector, these 2 blocks above had the following timeline:

### impostor block 1
2024-09-19 15:30:19,570:139632735942208:INFO:RucioInjectorPoller:Block /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417 inserted into Rucio
2024-09-19 15:30:29,385:139632735942208:INFO:RucioInjectorPoller:Successfully inserted 4 files on block /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
2024-09-19 17:57:20,982:139632735942208:INFO:RucioInjectorPoller:Closing block: /AlCaP0/Run2024H-v1/RAW#b51293e3-1563-47a3-a88f-1eb33790c417
### original block 1
2024-09-19 17:56:12,439:139632735942208:INFO:RucioInjectorPoller:Block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4 inserted into Rucio
2024-09-19 17:56:41,372:139632735942208:INFO:RucioInjectorPoller:Successfully inserted 5 files on block /AlCaP0/Run2024H-v1/RAW#3bbaf481-068c-4fda-8656-663fa9a987a4
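Timelines like the ones above can be assembled from the component logs by matching on the block name. A rough sketch, assuming the `timestamp:thread:LEVEL:Component:message` layout of the lines quoted here and block names of the form `/primary/processed/tier#uuid`; the regexes and the function name are illustrative, not taken from WMCore:

```python
import re
from collections import defaultdict
from datetime import datetime

# Layout assumed from the quoted logs: "<date> <time>,<ms>:<thread>:<LEVEL>:<Component>:<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):\d+:"
    r"(?P<level>[A-Z]+):(?P<component>\w+):(?P<msg>.*)$"
)
# Block names such as /AlCaP0/Run2024H-v1/RAW#<36-character uuid>
BLOCK_RE = re.compile(r"/[^/\s]+/[^/\s]+/[^/\s#]+#[0-9a-f-]{36}")


def block_timelines(lines):
    """Group log events by block name, sorted by timestamp across components."""
    timelines = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines, tracebacks, etc.
        block = BLOCK_RE.search(m.group("msg"))
        if not block:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        timelines[block.group(0)].append((ts, m.group("component"), m.group("level")))
    for events in timelines.values():
        events.sort()
    return dict(timelines)
```

Merging the DBSUploadPoller and RucioInjectorPoller logs before calling it puts both components on one time axis per block.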

Having said that, I have the following questions/comments: 1) it looks like we have not closed the original blocks in Rucio. AFAIK it is not a big deal and has no impact on anything else. It is, nonetheless, different from any other block created by WMAgent. 2) is it possible that the list of files returned from dbsbuffer was not unique? File ids are supposed to be unique (and sequential, AFAICT). How about lfns: do we have the same lfn under different file ids? Otherwise, how would we iterate over the same file id twice?
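Question 2) could be checked directly on the rows that the file-loading query returns. A small sketch, assuming those rows are available as `(fileid, lfn)` pairs (a hypothetical shape; the actual DAO returns richer records):

```python
from collections import Counter, defaultdict


def find_duplicates(rows):
    """Report repeated file ids and lfns that map to more than one file id.

    ``rows`` is assumed to be an iterable of (fileid, lfn) pairs.
    """
    rows = list(rows)
    id_counts = Counter(fileid for fileid, _ in rows)
    repeated_ids = sorted(fid for fid, n in id_counts.items() if n > 1)

    ids_per_lfn = defaultdict(set)
    for fileid, lfn in rows:
        ids_per_lfn[lfn].add(fileid)
    shared_lfns = {lfn: sorted(ids) for lfn, ids in ids_per_lfn.items() if len(ids) > 1}
    return repeated_ids, shared_lfns
```

A non-empty `repeated_ids` would point at the query (e.g. a join fanning out rows), while a non-empty `shared_lfns` would mean the same file was registered twice in dbsbuffer.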

Without investigating the codebase too much, it is possible that those duplicate file ids ("Ignoring this file for now!") actually triggered the misbehavior of the component. The duplicate file id is detected here: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSBufferBlock.py#L105 and one of the places where it is used (there is another one in the same module) is this block: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py#L487

vkuznet commented 2 weeks ago

@amaltaro , few observations:

Is it possible that the thread was killed because ORACLE timed out, or that the connection to ORACLE was lost and an error was thrown? How does DBSUploadPoller.py guarantee that transactions will be rolled back if the thread is killed? From what I read in the code, nothing protects against this use case, and the transaction will not be rolled back if the thread is killed. That may explain the weird behavior.

In other words, because of the polling cycle, if the thread is killed for whatever reason, there is no guarantee that the transaction can be rolled back in Python. But the polling cycle will start the poller again, and it may execute the same injection of objects into the database, which may not be protected (if there is no UNIQUE constraint on an injected object). That may explain the observed behavior.
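Both points can be illustrated with a small stand-in: sqlite3 instead of Oracle, and a hypothetical `block_files` table with a UNIQUE constraint on the lfn. The `with conn:` block rolls back everything if an exception escapes it, and the constraint turns a re-run that would duplicate a file into an `IntegrityError` instead of a silent second copy. This is not how DBSUploadPoller is structured, and it does not model a thread dying abruptly; in that case an uncommitted transaction is typically rolled back by the database when the session ends, as long as the connection is actually closed.

```python
import sqlite3


def insert_block(conn, block, lfns, fail_after=None):
    """Insert all files of a block in one transaction (hypothetical schema).

    ``fail_after`` simulates a failure after that many inserts.
    """
    with conn:  # commit on success, rollback if anything below raises
        for i, lfn in enumerate(lfns):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated failure mid-transaction")
            conn.execute(
                "INSERT INTO block_files (block, lfn) VALUES (?, ?)", (block, lfn)
            )


conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what makes a repeated injection fail loudly.
conn.execute("CREATE TABLE block_files (block TEXT NOT NULL, lfn TEXT NOT NULL UNIQUE)")
```

A failed attempt leaves no rows behind, a clean retry succeeds, and a later attempt to put an already-inserted file into a different block raises `sqlite3.IntegrityError` and is rolled back as a whole.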