atlassian-labs / db-replica

Automatically chooses between database connections to read-write main or read-only replica
Apache License 2.0
10 stars 12 forks source link

Expose API to controll lifecycle for db specfic operation like set #141

Open afaruga-atlassian opened 2 years ago

afaruga-atlassian commented 2 years ago

Currently, db-replica implements special logic for Postgres specific commands set. It contains a bug- the connections released back to the pool don't have cleared set operations. E.g set session timeout isn't rolled back, and as a result, it leaks between MAIN and REPLICA connections, which results in different than expected timeouts on connections.

The special logic should be reverted. The new API should allow customers to define connection lifecycle logic on db-specific operations. It has to sustain current db-time effectiveness (% of traffic moved from MAIN to REPLICA) Specific operations should pass them between connections according to whatever db-replica is doing(connection state) to achieve an effective split. Such operations should be set per dual-connection lifecycle for all connections taken from provider, which have been actually used during this lifecycle.

Workaround: Currently if you use Postgress set commands, you have to clean set operations while releasing it to the pool.

wyrzyk commented 2 years ago

The special logic should be reverted.

I'd be careful with that. In our product, the change would cause a drastic efficiency drop and could lead to an incident. In other words, customers may rely on this feature even if it's buggy and never should be shipped.

Exposing a new API is one thing, and deprecating + unshipping buggy feature is another.

I see the whole change as:

afaruga-atlassian commented 2 years ago

I totally agree that change should be properly tested, rolled out carefully, and be backward compatible.

It could be reproduced and initially, even an extra API with observability could be added to gather more data about leaked connections and set operations.