IBM / data-broker

The Data Broker (DBR) is a distributed, in-memory container of key-value stores enabling applications in a workflow to exchange data through one or more shared namespaces. Thanks to a small set of primitives, applications in a workflow deployed in a (possibly) shared nothing distributed cluster, can easily share and exchange data and messages with a minimum effort. In- spired by the Linda coordination and communication model, the Data Broker provides a unified shared namespace to applications, which is independent from applications’ programming and communication model.
Apache License 2.0
19 stars 23 forks source link

READONLY mode for Redis backend? #129

Open jmoon1506 opened 4 years ago

jmoon1506 commented 4 years ago

In the file "backend/redis/protocol.c" at line 104, it seems like Databroker calls the LINDEX command to read from a Redis server.

Is there any way to call the READONLY command before LINDEX, during the same transaction? This would let you read directly from a slave node in a cluster configuration.

Currently, Databroker closes the connection after each transaction is finished, so there's no way to modify the connection without changing these hard-coded commands.

lasch commented 4 years ago

Thanks for your suggestions. I'll try to explain the rationale behind the current implementation to hopefully address your comments/questions:

The library is only connecting to master nodes at the moment. Any connections to replicas will be dropped and only re-created for failover. Using redundant connections to replicas will require a substantial rework of the connection management. Some projects target large-scale installations where the number of connections to the master nodes can become a limiting factor. Adding more connections to replicas would further increase the number of active connections. So a connection management system would have to create and destroy connections based on application behavior - which is a beast in and of itself.

That said, adding the READONLY command before LINDEX could potentially improve performance by allowing reads from replicas. However, it requires a proper solution to the connection scaling issue first. And there's a second aspect that would have to be addressed: allowing reads from replicas assumes that master and replica are in sync at all times. With mostly static data, this would be fine. However, with dynamic data, this would force the write-side settings to require the replication to complete before returning the RPUSH to finish. This would cause a serious slowdown of the writes. Now again it depends on the requirements of the application and it will be a nice improvement to the flexibility of the Data Broker library when the different behavior could be set up through the tuple-persistence-flags. E.g. DBR_PERST_*_FT (requesting data to be stored fault tolerant) would then require the replica to be synced before writes return. This was the plan behind those flags and it would be great to get it implemented. Unfortunately, I personally don't see available time to get that done in a foreseeable future.

Once the connection management is improved, the READONLY and READWRITE could be added to the commands either in a hardcoded way or by adding more parameters to the base-command strings.

Regarding your comment about closing connections after each transaction: The library is only closing connections when the application detaches from a namespace and only if this was the last attached namespace. Recreating connections for each transaction would result in terrible performance otherwise.

I hope I addressed the things you were looking for. I'm happy to discuss suggestions. I'll also take another look at the replica connections to determine if there's a shorter path to replica access so that READONLY can be put to use and will report here if anything comes up.