wrobell opened this issue 7 years ago
Since asyncpg does not support Python's DB API 2.0 (see, for instance, its placeholders), there is no reason for it to support the DB API's 2PC functions either (they are little more than syntactic sugar).
Psycopg2 internally sends the corresponding statements as plain SQL, and nothing prevents asyncpg users from doing the same.
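To make that concrete, the whole 2PC vocabulary is three plain statements that any driver can send through its ordinary execute path. A minimal sketch (the helper names below are hypothetical, not asyncpg or psycopg2 API):

```python
def quote_gid(gid: str) -> str:
    """Render a transaction id as a SQL string literal, doubling any quotes."""
    return "'" + gid.replace("'", "''") + "'"

def prepare_sql(gid: str) -> str:
    # First phase: persist the transaction on the server without committing.
    return "PREPARE TRANSACTION " + quote_gid(gid)

def commit_prepared_sql(gid: str) -> str:
    # Second phase, positive outcome.
    return "COMMIT PREPARED " + quote_gid(gid)

def rollback_prepared_sql(gid: str) -> str:
    # Second phase, negative outcome.
    return "ROLLBACK PREPARED " + quote_gid(gid)
```

Each of these strings could be passed to `Connection.execute()` like any other statement.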
The goal of 2PC is to allow committing or rolling back transactions in a distributed system even if a connection is reset, or if your application or even a database fails. So there is no reason to write, in a low-level library, a wrapper similar to the one for regular transactions, because there are many high-level nuances in managing 2PC over an instance's life cycle (e.g. attempting COMMIT PREPARED/ROLLBACK PREPARED for 2PC transactions from a local WAL-log at application start -- see the "Notes" part of the documentation [1]).
[1] https://www.postgresql.org/docs/current/static/sql-commit-prepared.html
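The recovery-at-application-start step mentioned above could be sketched as follows (all names here are hypothetical; the code relies only on asyncpg's public fetch()/execute() methods and Postgres's pg_prepared_xacts system view):

```python
async def recover_prepared(conn, decide):
    """At application start, resolve transactions left in the prepared state.

    `conn` is an asyncpg connection; `decide(gid)` consults the app's own
    log (e.g. a local WAL) and returns True if the global transaction
    committed, False otherwise.
    """
    # pg_prepared_xacts lists transactions that were prepared but never
    # committed or rolled back, e.g. because the application crashed.
    rows = await conn.fetch('SELECT gid FROM pg_prepared_xacts')
    for row in rows:
        gid = row['gid']
        stmt = 'COMMIT PREPARED' if decide(gid) else 'ROLLBACK PREPARED'
        await conn.execute("{} '{}'".format(stmt, gid.replace("'", "''")))
```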
IMHO, it is a matter of allowing the transaction context manager and the Transaction class to accept a custom id, plus some if statements in the code of the class (see also the parts of the code related to the savepoint feature, a.k.a. nested transactions).
Could you please provide a snippet using 2PC with the Transaction class?
Just to clarify -- which statements should a context manager send to Postgres in its magic methods? (I bet you know how 2PC works and that you have a transaction manager to interact with.)
Please consider the following:

async with conn.transaction(id='tx1'):  # or prepared='tx1'?
    await conn.execute(...)
# on success: executes PREPARE TRANSACTION 'tx1'
# on error: executes ROLLBACK
# NOTE: COMMIT is not executed
Also allow:

await conn.commit(id='tx1')    # executes: COMMIT PREPARED 'tx1'
await conn.rollback(id='tx1')  # executes: ROLLBACK PREPARED 'tx1'
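Under those semantics the whole proposal can already be prototyped outside asyncpg against any object exposing an async execute() method (all names here -- prepared_transaction, commit_prepared, rollback_prepared -- are hypothetical sketches, not asyncpg API):

```python
import contextlib

@contextlib.asynccontextmanager
async def prepared_transaction(conn, gid):
    """Run a block inside a transaction and PREPARE it on success."""
    quoted = "'" + gid.replace("'", "''") + "'"
    await conn.execute('BEGIN')
    try:
        yield
    except BaseException:
        # On error: roll back, exactly like a regular transaction CM.
        await conn.execute('ROLLBACK')
        raise
    else:
        # On success: first phase only -- COMMIT is deliberately not sent.
        # The transaction now survives disconnects and must be resolved
        # later with COMMIT PREPARED or ROLLBACK PREPARED.
        await conn.execute('PREPARE TRANSACTION ' + quoted)

async def commit_prepared(conn, gid):
    await conn.execute("COMMIT PREPARED '" + gid.replace("'", "''") + "'")

async def rollback_prepared(conn, gid):
    await conn.execute("ROLLBACK PREPARED '" + gid.replace("'", "''") + "'")
```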
Yes, I arrived at the same design (as the nearly preferred way) while writing a long answer several hours ago. It still seems strange to me, because CMs are designed to free (unlock) resources on exit from a "scope", but that is not true with your proposal: exiting the CM does not free the resource if the transaction succeeded.
Moreover, a ConnectionResetError raised before sending PREPARE TRANSACTION and one raised after it mean different things: in the first case the transaction is undoubtedly cancelled; in the second, we have to check its state on the server side. But outside of the CM your code has no way to tell which case occurred.
The CM would still have to be wrapped in a try/except block to catch possible exceptions and check which transactions have (possibly) been prepared for 2PC and which have already been rolled back. IMHO this makes the CM hard to use "as is" in real cases (even with your proposal), because it is neither fully manual nor fully automatic 2PC handling.
I think the best way is to use a different API to deal with it correctly. Something like this (very first thoughts):

async with conn.Transaction2PC(xact_id='tx1', manager=...) as mgr2pc:  # stage (1)
    async with mgr2pc.xact():  # stage (2)
        await conn.execute(...)
    # stage (3)
# stage (4)
Notes:
(1) checks the "ready for query" state and the ability to use 2PC (fetches the max_prepared_transactions setting);
(2) knows it is a 2PC transaction and saves the mgr2pc object;
(3) a. on exception: set "rolling back", send ROLLBACK (ignoring exceptions), set "rolled back" (to prevent replacing the original exception with a ConnectionResetError);
(3) b.1. calls a callback via mgr2pc (like "we are ready for the 1st phase" -- maybe it already knows the distributed transaction has failed);
(3) b.2. sends PREPARE TRANSACTION to the connection;
(3) b.3. if the exception is a ConnectionResetError, set the "prepared unknown" state, else "rolled back";
(4) a. sends the result of the 1st phase (OK / rolled back / unknown) to the transaction manager;
(4) b. waits for the status of the global transaction from it;
(4) c. sends COMMIT PREPARED/ROLLBACK PREPARED to the connection;
(4) d. sends the result of COMMIT/ROLLBACK PREPARED to the transaction manager.
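Step (3) in particular hinges on distinguishing the three possible outcomes of the first phase. A minimal sketch (PhaseOneResult and first_phase are hypothetical names, not part of any library):

```python
import enum

class PhaseOneResult(enum.Enum):
    OK = 'prepared'               # PREPARE TRANSACTION succeeded
    ROLLED_BACK = 'rolled back'   # server reported an error; the tx is gone
    UNKNOWN = 'prepared unknown'  # connection lost: server state unknown

async def first_phase(conn, gid):
    try:
        await conn.execute(
            "PREPARE TRANSACTION '" + gid.replace("'", "''") + "'")
    except ConnectionResetError:
        # (3) b.3: the statement may or may not have reached the server,
        # so the transaction's fate must be checked on the server side.
        return PhaseOneResult.UNKNOWN
    except Exception:
        return PhaseOneResult.ROLLED_BACK
    return PhaseOneResult.OK
```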
Many corner cases are not covered above. In addition, there should be an API for the class whose instance is passed as the manager= parameter.
I would really keep it simple. If max_prepared_transactions is set to zero, the server will raise an error -- no need to check anything.

> I would really keep it simple.

I would too, but it leads to errors at the wrong time.
> no need to check anything.

The check is cheap and can be done once per connection, since changing the setting requires a Postgres restart. Otherwise people can get "prepared transactions are disabled" only after a long query has already run.
Later the value can be passed to a transaction manager to be used for app-level locking at the 3rd stage (to avoid "maximum number of prepared transactions reached").
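The once-per-connection check could look like this (check_2pc_enabled is a hypothetical name; it uses only asyncpg's public fetchval() method):

```python
async def check_2pc_enabled(conn):
    """Fail fast if the server cannot hold prepared transactions.

    Returns the max_prepared_transactions value so it can later be handed
    to a transaction manager for app-level limiting.
    """
    # SHOW returns the setting as text, e.g. '0' or '10'.
    value = int(await conn.fetchval('SHOW max_prepared_transactions'))
    if value == 0:
        raise RuntimeError('prepared transactions are disabled on the server')
    return value
```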
> Do not implement transaction manager functionality
Oops, I did not mean a real transaction manager -- just a class with an API for interacting with an external one. That class can be simple and merely implement a local "WAL" of xact_ids, or it can be arbitrarily complex. But without it a bare CM should not be used, because after exiting the CM the transaction still holds its locks. That breaks the very idea of context managers.
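Such a class could start as little more than a map from xact_id to decision. An in-memory sketch (all names hypothetical; a real implementation must persist every state change durably before acting on it):

```python
class LocalTxLog:
    """Tracks each 2PC transaction's last known state (in memory only)."""

    def __init__(self):
        self._state = {}

    def prepared(self, gid):
        # Recorded *before* PREPARE TRANSACTION is sent, so a crash leaves
        # a trace that the transaction may exist on the server.
        self._state[gid] = 'prepared'

    def decided(self, gid, commit):
        # The global outcome chosen by the transaction manager.
        self._state[gid] = 'commit' if commit else 'rollback'

    def resolved(self, gid):
        # COMMIT/ROLLBACK PREPARED succeeded; nothing left to recover.
        self._state.pop(gid, None)

    def pending(self):
        """Transactions to re-examine at application start."""
        return dict(self._state)
```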
This looks like something that can be implemented outside of asyncpg, no?
It would be nice if asyncpg supported a basic API; see https://github.com/MagicStack/asyncpg/issues/86#issuecomment-286390961. The rest, i.e. transaction id generation, communication between the entities performing transactions, and the logic of committing transactions, is probably out of scope.
@wrobell We try to keep asyncpg as simple as possible, and this feature doesn't seem like something that is going to be used by a significant number of asyncpg users. Moreover, transaction.py is just 200 lines long and is built with only the public API of the Connection object, meaning that it is possible to implement two-phase commit on top of asyncpg.
You're the first one who's requesting this feature, so let's keep this issue open and see if more people are interested in it.
It would be great if asyncpg supported two-phase commit.