vkuznet opened 2 years ago
Here is a brief plan of R&D activities we need to perform:

Move the `file_lumis` table to NoSQL DBs, per our discussion. The reasoning is that the HTTP front-end API has a 5 minute timeout: injection of FileLumis is limited to 2-3M records per block before the timeout is hit, and fetching also takes longer as the amount of data grows.
We need to first evaluate both MongoDB and ElasticSearch. This requires fetching FileLumis data from current deployments and performing a test injection.
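As a starting point for such an injection benchmark, the records could be split into bounded chunks so each bulk request stays well below the volume that hits the 5 minute timeout. This is only a sketch: the chunk size and the `bulk_write` callable (e.g. pymongo's `insert_many` or an ElasticSearch bulk helper) are assumptions to be tuned during the evaluation, not part of the current code.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Callable

# Assumed safe per-request size; to be tuned by the MongoDB/ES benchmarks.
CHUNK_SIZE = 500_000

def chunked(records: Iterable[Dict], size: int = CHUNK_SIZE) -> Iterator[List[Dict]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def inject(records: Iterable[Dict], bulk_write: Callable[[List[Dict]], None]) -> int:
    """Send FileLumis records in chunks via a caller-supplied bulk writer.

    `bulk_write` is a placeholder for whichever backend call is benchmarked,
    e.g. a MongoDB collection's insert_many or an ES bulk indexing helper.
    Returns the total number of records sent.
    """
    total = 0
    for batch in chunked(records):
        bulk_write(batch)
        total += len(batch)
    return total
```

Keeping the backend behind a plain callable makes it easy to run the same fetch-and-inject loop against both MongoDB and ElasticSearch and compare timings.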
The SQL, for reference:
@d-ylee @vkuznet based on the information above, should we try to limit block sizes, in terms of number of lumis, to 1M lumis at most? Maybe we even cap it at 0.5M lumis per block? Once we decide on the threshold, we should feed it back to this WMCore GH issue: https://github.com/dmwm/WMCore/issues/10264
It will certainly be helpful to put a limit on the number of lumis, since so far there is no limit and there is therefore a potential to exceed the timeout on the FEs. Based on an initial benchmark of the time taken by the bulkblocks injection API, it stays within 5 minutes as long as the number of lumis does not exceed a few million, e.g. 2-3M. Therefore, a limit of 1M is good to have in place. To improve performance it would be even better to limit it to 0.5M, but I do not know whether that would have any side effects on the DM side.
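Once a threshold is agreed on, enforcing it could look roughly like the validation sketch below. This is hypothetical code, not the current DBS implementation; the `files` / `file_lumi_list` payload layout follows the bulkblocks JSON format, and the 1M cap is the value proposed above.

```python
# Hypothetical server-side check for a per-block lumi cap in a bulkblocks payload.
MAX_LUMIS_PER_BLOCK = 1_000_000  # proposed cap; could be lowered to 500_000

class BlockTooLargeError(ValueError):
    """Raised when a block's total lumi count exceeds the configured cap."""

def validate_block(block: dict) -> int:
    """Count lumis across all files in the block and reject oversized payloads.

    Returns the lumi count when the block is acceptable.
    """
    nlumis = sum(len(f.get("file_lumi_list", [])) for f in block.get("files", []))
    if nlumis > MAX_LUMIS_PER_BLOCK:
        raise BlockTooLargeError(
            f"block has {nlumis} lumis, exceeding the cap of {MAX_LUMIS_PER_BLOCK}"
        )
    return nlumis
```

Rejecting oversized blocks at injection time would push the limit back to the producers (the WMCore side), which is why the threshold should be coordinated via the WMCore issue above.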
In addition to the reasoning @d-ylee mentioned, this R&D will explore the possibility of adding more unstructured meta-data to DBS information. Recently, we listened to I. Mandrichenko's talk on MetaCat, a meta-data catalog for the Rucio-based data management system, where he argued that Run conditions and File provenance meta-data can be stored as unstructured data in a NoSQL DB, which can provide better query performance than structured DBS information.
With the growth of DBS data, we need to perform R&D to address large tables.