Impact of the new feature
Reduce operational overhead of WM services.
Is your feature request related to a problem? Please describe.
The current WM architecture is based on a set of distributed data services that talk to different databases and hold overlapping data. The new system can eliminate many WMAgent components and replace them with a central high-availability service that holds and serves WM data from a single location. This can reduce operational cost and the maintenance burden of the various components, and improve WM services overall.
Describe the solution you'd like
I propose adopting a data-driven (event-driven) architecture with a central WMPayload service. The full proposal is available in this Google document. It consists of the following:
- Introduce a WMPayload service with the following characteristics:
  - High-availability service (run on k8s)
  - Horizontally scalable on client demand
  - High throughput for data I/O operations
  - Low-latency data I/O, in the O(ms) range
  - Support for JSON and NDJSON data formats
- If necessary, this architecture can be complemented with an event-driven Pub/Sub service for distributing messages among the various components.
- The database backend can be either a document-oriented database, e.g. MongoDB, or a relational database such as Oracle that supports storing and querying data in JSON format.
- Within such a database backend we may introduce GraphQL for more flexibility, or rely on the existing query language to support WMPayload queries.
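To illustrate the JSON and NDJSON formats listed above, here is a minimal Python sketch of writing and reading NDJSON (one JSON document per line). The function names and sample records are hypothetical and are not taken from the prototype. The reader is a generator, so only the record currently being processed is held in memory.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_ndjson(records: Iterable[dict], stream: TextIO) -> None:
    """Write one compact JSON document per line."""
    for record in records:
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield records one at a time instead of loading the whole payload."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical records, for illustration only.
docs = [{"uuid": "a1", "status": "new"}, {"uuid": "b2", "status": "running"}]

buffer = io.StringIO()  # stands in for a file or an HTTP response body
write_ndjson(docs, buffer)
buffer.seek(0)

statuses = [doc["status"] for doc in read_ndjson(buffer)]
```

With a plain JSON array the client would have to parse the full document before touching the first record. With NDJSON each line can be parsed and discarded independently.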
The benefits of the new architecture can be summarized as follows:
- No change of the underlying programming language within WM (Micro)Services, e.g. we can still use the same Python code
- Eliminate multiple databases (CouchDB, MongoDB) and converge on a single database backend
  - If MongoDB is chosen we can use its flexible QL
- Horizontally scalable (high availability and throughput)
- Uniform storage and APIs for managing unstructured data (JSON documents)
- Data streaming via NDJSON
- Eliminate the need to pass JSON records as payload across services; rely on UUIDs instead
- Memory reduction for all MS services
  - If we switch to **NDJSON**, the memory footprint should not greatly exceed the size of a single processed JSON record
  - To speed up a service, an asynchronous pattern can be applied, since records can be processed in parallel
- Common QL for the backend database, e.g. MongoDB with Mongo QL (JSON)
- Data accessibility via APIs rather than direct CouchDB views
  - Makes it easy to redesign WMStats, i.e. separate data presentation from the database
- Separation of the service from the database backend
  - We can provide a RESTful service and choose any document-oriented database behind it (CouchDB, MongoDB, ElasticSearch, etc.)
  - For additional speed-up, a cache layer, e.g. Redis, can be added between the service and the database
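Two of the points above, passing a UUID between services instead of the full JSON record and placing a cache between the service and the database, can be sketched roughly as follows. This is an illustrative sketch only: `PayloadStore` and `CachedPayloadService` are hypothetical names, and plain dictionaries stand in for the database backend and for a cache such as Redis.

```python
import uuid
from typing import Dict, Optional


class PayloadStore:
    """In-memory stand-in for the database backend (hypothetical)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        self.reads = 0  # number of lookups that reached the backend

    def put(self, doc: dict) -> str:
        key = str(uuid.uuid4())
        self._docs[key] = doc
        return key

    def get(self, key: str) -> Optional[dict]:
        self.reads += 1
        return self._docs.get(key)


class CachedPayloadService:
    """Read-through cache in front of the store; a dict stands in for Redis."""

    def __init__(self, store: PayloadStore) -> None:
        self._store = store
        self._cache: Dict[str, dict] = {}

    def write(self, doc: dict) -> str:
        # Callers keep only the returned UUID, not the document itself.
        return self._store.put(doc)

    def read(self, key: str) -> Optional[dict]:
        if key in self._cache:
            return self._cache[key]
        doc = self._store.get(key)
        if doc is not None:
            self._cache[key] = doc
        return doc


store = PayloadStore()
service = CachedPayloadService(store)

# Hypothetical document; a real record would come from e.g. ReqMgr2.
ref = service.write({"RequestName": "example", "Priority": 1})
first = service.read(ref)   # reaches the backend and fills the cache
second = service.read(ref)  # served from the cache
```

Other services would exchange only `ref` and fetch the document when they need it. A real deployment would also need cache invalidation on updates, which this sketch omits.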
Describe alternatives you've considered
Many iterations of existing architectures.
Additional context
There is a very simple but fully functional prototype of the WMPayload service that satisfies the desired functionality and requirements. The initial prototype shows the following performance using the JSON data format:
| operation | document | time/operation | bytes/operation | memory allocations |
|---|---|---|---|---|
| write single doc | auto-gen | 0.5ms | 12KB | 197 |
| write single doc | ReqMgr2 | 0.8ms | 60KB | 666 |
| read single doc | auto-gen | 0.2ms | 12KB | 124 |
| read single doc | ReqMgr2 | 0.5ms | 38KB | 201 |
| read all docs | ReqMgr2 | 75ms | 102MB | 238 |
Tests were performed on macOS (Apple M2, 8 cores) and used either auto-generated JSON docs or documents taken from the ReqMgr2 service. In total there were 1500 documents in MongoDB, indexed by UUID.