dmwm / dbs2go

DBS server written in Go
MIT License

Allow numbers in the data_tier_name regular expression #108

Open todor-ivanov opened 4 months ago

todor-ivanov commented 4 months ago

This issue is a follow-up of https://github.com/dmwm/WMCore/issues/11931, tracking the change on the DBS side.
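For context on what allowing numbers means in practice: a tier-name pattern restricted to letters rejects a tier such as `L1SCOUT` (which appears in the T0 replay blocks below), while one that also allows digits accepts it. A minimal Go sketch with purely illustrative patterns (not the actual lexicon entries):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Illustrative patterns only: without digits the L1SCOUT tier used by
	// the T0 replays is rejected, with digits allowed it passes.
	withoutDigits := regexp.MustCompile(`^[A-Z\-_]+$`)
	withDigits := regexp.MustCompile(`^[A-Z0-9\-_]+$`)

	for _, tier := range []string{"AOD", "GEN-SIM-RAW", "L1SCOUT"} {
		fmt.Printf("%-12s without digits: %-5v with digits: %v\n",
			tier, withoutDigits.MatchString(tier), withDigits.MatchString(tier))
	}
}
```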

todor-ivanov commented 3 months ago

OK, since @germanfgv and I have already worked through several iterations on this issue in order to provide this functionality for the T0 replays, I'd like to record the activity here:

$ grep lexicon /etc/secrets/dbsconfig.json
    "lexicon_file": "/etc/secrets/lexicon.json"

&{BLOCK_ID:32917742 BLOCK_NAME:/HLTPhysics/Tier0_REPLAY_2024-PromptReco-v28184557/AOD#aaecec0b-d16d-4c6e-9027-5920685dc53e DATASET_ID:14839867 OPEN_FOR_WRITING:0 ORIGIN_SITE_NAME:T0_CH_CERN_Disk BLOCK_SIZE:2079405 FILE_COUNT:1 CREATION_DATE:1711650349 CREATE_BY:T0Replay LAST_MODIFICATION_DATE:1711650349 LAST_MODIFIED_BY:T0Replay}
[2024-04-03 00:01:17.853299409 +0000 UTC m=+456728.832936205] files.go:312: unable to insert files, error ORA-00001: unique constraint (CMS_DBS3_K8S_GLOBAL_OWNER.TUC_FL_LOGICAL_FILE_NAME) violated

* Strangely enough, after 24 hours all dataset records had been injected except one, and the only error we were left with was related to one particular block: `/L1Accept/Tier0_REPLAY_2024-v3182029/L1SCOUT#3a331615-f7c9-4823-91c9-c7a6bae15aac`. The log records from the agent side were:

2024-04-04 12:30:50,017:140283221800704:INFO:DBSUploadPoller:Executing retrieveBlocks method...
2024-04-04 12:30:50,018:140283221800704:INFO:DBSUploadPoller:About to call insert block for: /Cosmics/Tier0_REPLAY_2024-PromptReco-v3182029/DQMIO#38d97107-0bd6-4a61-881c-9bd08afb101b
2024-04-04 12:30:50,018:140283221800704:INFO:DBSUploadPoller:About to call insert block for: /Cosmics/Tier0_REPLAY_2024-TkAlCosmics0T-PromptReco-v3182029/ALCARECO#1bbe16a4-bcad-4f92-928b-a4fdad226bfa
2024-04-04 12:30:50,020:140283221800704:INFO:DBSUploadPoller:About to call insert block for: /MinimumBias/Tier0_REPLAY_2024-SiStripCalZeroBias-PromptReco-v3182029/ALCARECO#219c8a0f-3208-4314-86b6-092e8289c16e
2024-04-04 12:30:50,179:140283221800704:ERROR:DBSUploadPoller:Error trying to process block /L1Accept/Tier0_REPLAY_2024-v3182029/L1SCOUT#3a331615-f7c9-4823-91c9-c7a6bae15aac through DBS. Details: DBSError code: 110, message: 6d0f69d1208bcc4d2f65cdc5abafe8c5196e64d1e1d695a5348eb382403935a2 unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error, reason: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
2024-04-04 12:30:50,180:140283221800704:INFO:DBSUploadPoller:About to call insert block for: /NoBPTX/Tier0_REPLAY_2024-PromptReco-v3182029/NANOAOD#88aaca0d-5912-4a6d-a7bf-174374986b43
2024-04-04 12:30:50,339:140283221800704:INFO:DBSUploadPoller:About to call insert block for: /NoBPTX/Tier0_REPLAY_2024-LogErrorMonitor-PromptReco-v3182029/USER#c2a880cb-5d18-4839-ae85-5c197501bf66
...

2024-04-04 12:30:51,584:140283221800704:INFO:DBSUploadPoller:About to call insert block for: /NoBPTX/Tier0_REPLAY_2024-PromptReco-v3182029/DQMIO#66eb027f-bfe9-4a8e-b37b-cdbb07823043
2024-04-04 12:30:51,658:140283221800704:INFO:DBSUploadPoller:Updating database record with files injected into DBS
2024-04-04 12:30:51,663:140283221800704:INFO:DBSUploadPoller:Updating 40 blocks successfully injected into DBS


* Searching for the errors related to this particular block, I found that the file injection fails in this piece of code:
https://github.com/dmwm/dbs2go/blob/28e02bde209af797af7d59d4f7e3baba25a98605/dbs/files.go#L280
Later the field value is validated here: https://github.com/dmwm/dbs2go/blob/28e02bde209af797af7d59d4f7e3baba25a98605/dbs/files.go#L324C26-L324C43, and if one checks how the validation is done, here: https://github.com/dmwm/dbs2go/blob/28e02bde209af797af7d59d4f7e3baba25a98605/dbs/validator.go#L359, it becomes immediately obvious that the field consulted for validating the file-level injections is not taken/constructed from the `data_tier` (or other higher-level data field definitions), but directly from a separately defined field called `logical_file_name`, which comes from here: https://gitlab.cern.ch/cmsweb-k8s/services_config/-/blob/db98113bab95485c6b5f1ec96002e1ec00c7bae3/dbs2go-global-m/lexicon.json#L3-15 (see the sketch below for the idea behind this check).
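To illustrate the point, here is a small, self-contained sketch of the idea behind that check (it is not the real dbs2go validator): the value is matched against the regular expressions registered for its own lexicon key, in this case `logical_file_name`, and nothing derived from `data_tier` is involved. The pattern below is invented for the example; its tier segment forbids digits, so the `L1SCOUT` LFNs from this block fail with exactly an "invalid pattern for key=logical_file_name" error:

```go
package main

import (
	"fmt"
	"regexp"
)

// checkPattern mimics the idea of the validator referenced above (it is not
// the real dbs2go code): the value must match at least one of the patterns
// registered for the given lexicon key, otherwise validation fails.
func checkPattern(patterns map[string][]*regexp.Regexp, key, value string) error {
	for _, re := range patterns[key] {
		if re.MatchString(value) {
			return nil
		}
	}
	return fmt.Errorf("invalid pattern for key=%s", key)
}

func main() {
	// Purely illustrative pattern: the data-tier segment of the path only
	// allows capital letters and dashes, so a tier containing a digit breaks it.
	patterns := map[string][]*regexp.Regexp{
		"logical_file_name": {
			regexp.MustCompile(`^/store/backfill/1/data/[A-Za-z0-9_\-]+/[A-Za-z0-9_\-]+/[A-Z\-]+/.*\.root$`),
		},
	}
	lfn := "/store/backfill/1/data/Tier0_REPLAY_2024/L1Accept/L1SCOUT/v3182029/000/378/788/00000/7fa128e6-a29f-4c2d-8c25-2ebb81e54774.root"
	fmt.Println(checkPattern(patterns, "logical_file_name", lfn))
	// Prints: invalid pattern for key=logical_file_name
}
```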

* The error that I observed on the server side, which led me to that conclusion, was:

[2024-04-05 00:03:18.702591251 +0000 UTC m=+629646.970417431] dataset_output_mod_configs.go:68: Insert DatasetOutputModConfigs INSERT INTO cms_dbs3_k8s_global_owner.DATASET_OUTPUT_MOD_CONFIGS (ds_output_mod_conf_id,dataset_id,output_mod_config_id) VALUES (:ds_output_mod_conf_id,:dataset_id,:output_mod_config_id)

&{DS_OUTPUT_MOD_CONF_ID:38366926 DATASET_ID:14839992 OUTPUT_MOD_CONFIG_ID:25163930}
[2024-04-05 00:03:18.704100146 +0000 UTC m=+629646.971926341] dataset_output_mod_configs.go:72: unable to insert DatasetOutputModConfigs
[2024-04-05 00:03:18.706504728 +0000 UTC m=+629646.974331024] blocks.go:181: Insert Blocks INSERT INTO cms_dbs3_k8s_global_owner.BLOCKS (block_id,block_name,dataset_id, open_for_writing,origin_site_name, block_size,file_count, creation_date,create_by, last_modification_date,last_modified_by) VALUES (:block_id,:block_name,:dataset_id, :open_for_writing,:origin_site_name, :block_size,:file_count, :creation_date,:create_by, :last_modification_date,:last_modified_by)

&{BLOCK_ID:32919159 BLOCK_NAME:/L1Accept/Tier0_REPLAY_2024-v3182029/L1SCOUT#3a331615-f7c9-4823-91c9-c7a6bae15aac DATASET_ID:14839992 OPEN_FOR_WRITING:0 ORIGIN_SITE_NAME:T0_CH_CERN_Disk BLOCK_SIZE:691037 FILE_COUNT:2 CREATION_DATE:1712162019 CREATE_BY:T0Replay LAST_MODIFICATION_DATE:1712162019 LAST_MODIFIED_BY:T0Replay}
[2024-04-05 00:03:18.709809489 +0000 UTC m=+629646.977635686] bulkblocks2.go:995: insertFilesViaChunks processed 1 goroutines with ids [3084164837 3084164877], elapsed time 733.305µs
[2024-04-05 00:03:18.710679899 +0000 UTC m=+629646.978506084] files.go:282: unable to validate record DBSError Code:114 Description:DBS validation error when wrong pattern is provided Function:dbs.files.Validate Message: Error: nested DBSError Code:114 Description:DBS validation error when wrong pattern is provided Function:dbs.CheckPattern Message:invalid pattern for key=logical_file_name Error: invalid parameter(s)
[2024-04-05 00:03:18.711000322 +0000 UTC m=+629646.978826502] bulkblocks2.go:1002: fail to insert files chunks, trec &{IsFileValid:1 DatasetID:14839992 BlockID:32919159 CreationDate:1712275398 CreateBy:/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=cmst0/CN=658085/CN=Robot: CMS Tier0 FilesMap:{mu:{state:0 sema:0} read:{v:} dirty:map[] misses:0} NErrors:1}
[2024-04-05 00:03:18.711312571 +0000 UTC m=+629646.979138766] bulkblocks2.go:743: 02adabb6b1bebd9e8845ffe10e6cf08ea44515718fd44085a338cdf181bd5ec8 unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error
[2024-04-05 00:03:18.713443893 +0000 UTC m=+629646.981270092] handlers.go:89: DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:02adabb6b1bebd9e8845ffe10e6cf08ea44515718fd44085a338cdf181bd5ec8 unable to insert files, error DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error Error: nested DBSError Code:110 Description:DBS DB insert record error Function:dbs.bulkblocks.insertFilesViaChunks Message: Error: concurrency error Stacktrace:

goroutine 4083874 [running]:
github.com/dmwm/dbs2go/dbs.Error({0xae4c20?, 0xc0009e2b90?}, 0x6e, {0xc00097c000, 0xe6}, {0xa1ae29, 0x2b})
    /go/src/github.com/vkuznet/dbs2go/dbs/errors.go:185 +0x99
github.com/dmwm/dbs2go/dbs.(*API).InsertBulkBlocksConcurrently(0xc00035e000)
    /go/src/github.com/vkuznet/dbs2go/dbs/bulkblocks2.go:744 +0x16ef
github.com/dmwm/dbs2go/web.DBSPostHandler({0xae81b0, 0xc00060c090}, 0xc0006ee200, {0x9feab6, 0xa})
    /go/src/github.com/vkuznet/dbs2go/web/handlers.go:542 +0x1274
github.com/dmwm/dbs2go/web.BulkBlocksHandler({0xae81b0?, 0xc00060c090?}, 0x454134?)
    /go/src/github.com/vkuznet/dbs2go/web/handlers.go:957 +0x45
net/http.HandlerFunc.ServeHTTP(0xe10000c00043ca01?, {0xae81b0?, 0xc00060c090?}, 0x0?)
    /usr/local/go/src/net/http/server.go:2109 +0x2f
github.com/dmwm/dbs2go/web.limitMiddleware.func1({0xae81b0?, 0xc00060c090?}, 0x0?)
    /go/src/github.com/vkuznet/dbs2go/web/middlewares.go:111 +0x38
net/http.HandlerFunc.ServeHTTP(0x94c100?, {0xae81b0?, 0xc00060c090?}, 0x11?)
    /usr/local/go/src/n



* It turns out that the `Concurrency Error` handler constructed this way can mask not only the ORA-00001 error from the database, but also validation errors coming from the `dbs` server code base (see the sketch after this list).

* I am now going to reconfigure the `logical_file_name` field in the lexicon, and then we will retry the injections.
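Regarding the masking mentioned above, here is a minimal sketch (not the actual dbs2go code) of how collecting goroutine failures into a single generic "concurrency error" hides the root cause, whether that cause is an ORA-00001 from Oracle or a lexicon validation error:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// insertChunk stands in for one goroutine inserting a chunk of files; here it
// simply fails with whatever error it is given (e.g. a validation error or a
// unique-constraint error from the database).
func insertChunk(cause error) error { return cause }

// insertFilesViaChunks sketches the masking problem: the workers only record
// *that* something failed, so the caller reports a generic "concurrency error"
// instead of the underlying cause.
func insertFilesViaChunks(causes []error) error {
	var wg sync.WaitGroup
	var mu sync.Mutex
	nErrors := 0
	for _, c := range causes {
		wg.Add(1)
		go func(c error) {
			defer wg.Done()
			if err := insertChunk(c); err != nil {
				mu.Lock()
				nErrors++ // the original error is dropped here
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	if nErrors > 0 {
		return errors.New("concurrency error") // root cause is masked
	}
	return nil
}

func main() {
	validation := errors.New("invalid pattern for key=logical_file_name")
	fmt.Println(insertFilesViaChunks([]error{validation}))
	// Wrapping instead, e.g. fmt.Errorf("concurrency error: %w", err), would
	// let the caller recover the cause with errors.Is / errors.As.
}
```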

todor-ivanov commented 3 months ago

And the results are more than positive:

&{BLOCK_ID:32919750 BLOCK_NAME:/L1Accept/Tier0_REPLAY_2024-v3182029/L1SCOUT#3a331615-f7c9-4823-91c9-c7a6bae15aac DATASET_ID:14839992 OPEN_FOR_WRITING:0 ORIGIN_SITE_NAME:T0_CH_CERN_Disk BLOCK_SIZE:691037 FILE_COUNT:2 CREATION_DATE:1712162019 CREATE_BY:T0Replay LAST_MODIFICATION_DATE:1712162019 LAST_MODIFIED_BY:T0Replay}
[2024-04-05 16:31:39.260464202 +0000 UTC m=+61.265251713] bulkblocks2.go:995: insertFilesViaChunks processed 1 goroutines with ids [3084212117 3084212157], elapsed time 828.803µs
[2024-04-05 16:31:39.267165923 +0000 UTC m=+61.271953803] files.go:288: Insert Files file_id=3084212117 lfn=/store/backfill/1/data/Tier0_REPLAY_2024/L1Accept/L1SCOUT/v3182029/000/378/788/00000/7fa128e6-a29f-4c2d-8c25-2ebb81e54774.root
[2024-04-05 16:31:39.277333904 +0000 UTC m=+61.282121413] files.go:288: Insert Files file_id=3084212157 lfn=/store/backfill/1/data/Tier0_REPLAY_2024/L1Accept/L1SCOUT/v3182029/000/378/788/00000/bdebda18-d654-4731-8a9e-18dd0b47403f.root
[2024-04-05 16:31:39.279887456 +0000 UTC m=+61.284674964] bulkblocks2.go:808: 02adabb6b1bebd9e8845ffe10e6cf08ea44515718fd44085a338cdf181bd5ec8 insert FileLumi list sequentially 1 records
[2024-04-05 16:31:39.283216813 +0000 UTC m=+61.288004337] bulkblocks2.go:808: 02adabb6b1bebd9e8845ffe10e6cf08ea44515718fd44085a338cdf181bd5ec8 insert FileLumi list sequentially 1 records


* and DAS now also shows all the file details
vkuznet commented 3 months ago

Todor, what worries me is this line: [2024-04-05 16:31:39.279887456 +0000 UTC m=+61.284674964] bulkblocks2.go:808: 02adabb6b1bebd9e8845ffe10e6cf08ea44515718fd44085a338cdf181bd5ec8 insert FileLumi list sequentially 1 records, which claims that the FileLumi list was inserted sequentially. We should not degrade the DBS code back to sequential processing, and yet somehow that is the case here. It may be that the FileLumi list is small enough, i.e. it is the last chunk, and I would appreciate it if you could check that.
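For reference, the branch being discussed presumably looks something like the sketch below (this is an assumption, not the real bulkblocks2.go logic, and the chunk size of 500 is made up): a FileLumi list smaller than the chunk size naturally falls into the sequential path, which is what a tiny two-file replay block would hit.

```go
package main

import "fmt"

// insertFileLumiList is a sketch (not the real bulkblocks2.go code) of the
// kind of decision in question: when the list of FileLumi records is smaller
// than the chunk size, a sequential insert is the natural branch, so a tiny
// T0 replay block would legitimately end up there.
func insertFileLumiList(records, chunkSize int) string {
	if records < chunkSize {
		return fmt.Sprintf("insert FileLumi list sequentially %d records", records)
	}
	return fmt.Sprintf("insert FileLumi list via %d chunks", (records+chunkSize-1)/chunkSize)
}

func main() {
	fmt.Println(insertFileLumiList(1, 500))     // the replay case seen in the log
	fmt.Println(insertFileLumiList(10000, 500)) // a production-sized block
}
```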

todor-ivanov commented 3 months ago

Hi @vkuznet, it was a test replay from T0 and we know for sure it was a short one. The whole block consists of only two fairly short files.