4paradigm / OpenMLDB

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
https://openmldb.ai
Apache License 2.0

Optimize concurrent access for aggregators #1559

Open nautaa opened 2 years ago

nautaa commented 2 years ago

Describe the feature you'd like

The aggregators may be accessed concurrently. Currently each aggregator is protected by its own mutex, which is costly in terms of memory usage.

We should design a more efficient thread-safe method to access the aggregators.

Some possible directions:

  1. Adopt a concurrent hash map implementation from a third-party library.
  2. Update the aggregators asynchronously.
  3. Use atomic operations to eliminate the lock.
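As a rough illustration of direction 3, here is a minimal sketch of an aggregate buffer updated with atomics instead of a mutex. `AtomicAggrBuffer` and its methods are hypothetical names for illustration only; they are not the classes in src/storage/aggregator.h, and the sketch assumes simple integer aggregates (sum, count, max) with no window rollover.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical aggregate buffer (illustration only, not OpenMLDB's actual class).
class AtomicAggrBuffer {
 public:
    // Lock-free update of sum, count and max for one incoming value.
    void Update(int64_t value) {
        sum_.fetch_add(value, std::memory_order_relaxed);
        count_.fetch_add(1, std::memory_order_relaxed);
        int64_t cur = max_.load(std::memory_order_relaxed);
        // Retry while our value is still larger than the stored maximum.
        // On failure, compare_exchange_weak reloads `cur` with the current value.
        while (value > cur &&
               !max_.compare_exchange_weak(cur, value, std::memory_order_relaxed)) {
        }
    }

    int64_t Sum() const { return sum_.load(std::memory_order_relaxed); }
    int64_t Count() const { return count_.load(std::memory_order_relaxed); }
    int64_t Max() const { return max_.load(std::memory_order_relaxed); }

    // Mean of the values seen so far; requires at least one update.
    double Mean() const {
        int64_t n = Count();
        assert(n > 0);
        return static_cast<double>(Sum()) / static_cast<double>(n);
    }

 private:
    std::atomic<int64_t> sum_{0};
    std::atomic<int64_t> count_{0};
    std::atomic<int64_t> max_{std::numeric_limits<int64_t>::min()};
};
```

Each field is updated atomically on its own, so a concurrent reader can briefly observe a sum that already includes a value whose count has not been added yet. Whether that is acceptable depends on how the aggregate is read and flushed, which this sketch does not address.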

Additional context

The pre-aggregators are updated during the Put request.

src/storage/aggregator.h

dl239 commented 2 years ago

For what purpose?

aceforeverd commented 2 years ago

folly has one: https://github.com/facebook/folly/blob/main/folly/concurrency/ConcurrentHashMap.h. Not sure whether ConcurrentHashMap is mature enough, though.

nautaa commented 2 years ago

> For what purpose?

For concurrent access to the aggregators. Right now we wrap std::unordered_map, but that is not an efficient implementation. See https://stackoverflow.com/questions/48987641/thread-safe-stdmap-locking-the-entire-map-and-individual-values
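One middle ground between a single lock around the whole std::unordered_map and one mutex per value is lock striping: a fixed number of shards, each with its own mutex and its own map. The sketch below is an assumption-laden illustration, not the existing wrapper. `StripedMap`, the `std::string` key type, and the shard count of 16 are all hypothetical choices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical striped map: kShards mutexes in total, regardless of key count.
template <typename V, std::size_t kShards = 16>
class StripedMap {
    static_assert(kShards > 0, "need at least one shard");

 public:
    // Apply fn to the value for key (value-initialized if absent)
    // while holding only that key's shard lock.
    template <typename Fn>
    void Update(const std::string& key, Fn&& fn) {
        Shard& shard = shards_[IndexOf(key)];
        std::lock_guard<std::mutex> guard(shard.mu);
        fn(shard.map[key]);
    }

    // Copy the value for key into *out; returns false if the key is absent.
    bool Get(const std::string& key, V* out) const {
        assert(out != nullptr);
        const Shard& shard = shards_[IndexOf(key)];
        std::lock_guard<std::mutex> guard(shard.mu);
        auto it = shard.map.find(key);
        if (it == shard.map.end()) return false;
        *out = it->second;
        return true;
    }

 private:
    struct Shard {
        mutable std::mutex mu;  // mutable so that Get() can lock in a const method
        std::unordered_map<std::string, V> map;
    };

    // Pick the shard for a key from its hash.
    static std::size_t IndexOf(const std::string& key) {
        return std::hash<std::string>{}(key) % kShards;
    }

    std::array<Shard, kShards> shards_;
};
```

With this layout the memory spent on mutexes is bounded by the shard count rather than the number of aggregators, at the price of occasional contention between keys that hash to the same shard.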

dl239 commented 2 years ago

How about updating the aggregator values asynchronously by reading the binlog? Synchronous computation adds more workload to Put.

zhanghaohit commented 2 years ago

> How about updating the aggregator values asynchronously by reading the binlog? Synchronous computation adds more workload to Put.

Yes. We may switch to asynchronous updates as the next step.
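To make the asynchronous idea concrete, here is a minimal sketch in which Put only enqueues the value and a background thread applies it to the aggregate. It is an illustration under stated assumptions: an in-memory queue stands in for the binlog, and `AsyncSumAggregator` is a hypothetical name, not an existing OpenMLDB class.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical async aggregator: the queue is a stand-in for reading the binlog.
class AsyncSumAggregator {
 public:
    AsyncSumAggregator() : worker_([this] { Run(); }) {}
    ~AsyncSumAggregator() { Stop(); }

    // Write path: enqueue only, no aggregation work is done here.
    void Put(int64_t value) {
        {
            std::lock_guard<std::mutex> guard(mu_);
            assert(!stopping_ && "Put after Stop");
            queue_.push_back(value);
        }
        cv_.notify_one();
    }

    // Let the worker drain what is queued, then join it. Safe to call twice.
    void Stop() {
        {
            std::lock_guard<std::mutex> guard(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // May lag behind Put() until the worker has caught up.
    int64_t Sum() const { return sum_.load(std::memory_order_relaxed); }

 private:
    void Run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stopping and nothing left to apply
            std::deque<int64_t> batch;
            batch.swap(queue_);
            lock.unlock();  // aggregate outside the lock so Put() is not blocked
            for (int64_t v : batch) sum_.fetch_add(v, std::memory_order_relaxed);
            lock.lock();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<int64_t> queue_;
    bool stopping_ = false;
    std::atomic<int64_t> sum_{0};
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

The trade-off is visible in `Sum()`: the write path gets cheaper, but readers can see a value that does not yet include the most recent Put calls, so queries over pre-aggregated data would need to tolerate or compensate for that lag.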