arangodb / python-arango

The official ArangoDB Python driver.
https://docs.python-arango.com
MIT License

feature request: Continue an existing transaction #329

Closed Moortiii closed 8 months ago

Moortiii commented 8 months ago

I've come across a case where a transaction needs to be shared across multiple systems. If we wrap the REST API we can easily achieve this by setting the x-arango-trx-id header. However, we would like to be able to receive transaction IDs on both ends and continue the transaction seamlessly using the python-arango interface, instead of crudely performing raw queries against /_cursor.
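For reference, the raw header approach could look roughly like the sketch below. It only builds the request and does not send it. The helper name, base URL, and database name are placeholders of mine, authentication is left out, and I'm assuming the standard /_api/cursor endpoint.

```python
import json
import urllib.request


def build_cursor_request(base_url, database, query, transaction_id):
    """Build (but do not send) an AQL cursor request tied to an existing transaction."""
    # Hypothetical helper: the URL layout assumes the standard /_api/cursor endpoint.
    url = f"{base_url}/_db/{database}/_api/cursor"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # This header is what attaches the query to the running transaction.
            "x-arango-trx-id": transaction_id,
        },
    )


# Placeholder values; a real call would also need credentials.
request = build_cursor_request("http://localhost:8529", "_system", "RETURN 1", "1234")
```

Note that urllib normalizes the header name's capitalization internally; HTTP header names are case-insensitive, so that should not matter to the server.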

I've come up with the following hack, which does work, but given that _executor is private, and _executor.id specifically doesn't have a setter, I'm guessing there may be a reason it's discouraged:

from copy import deepcopy

from arango.client import ArangoClient
from arango.database import StandardDatabase, TransactionDatabase

def continue_transaction(db: StandardDatabase, transaction_id):
    trx = TransactionDatabase(connection=deepcopy(db.conn))
    trx._executor._id = transaction_id
    return trx

db = ArangoClient(...).db(...)
trx = continue_transaction(db=db, transaction_id="1234")
trx.collection("vertex").insert({"_key": "test"})
trx.commit_transaction() # Alternatively, don't commit here and let the client who provided the transaction commit it themselves.

Would it make sense to support something like this directly? It seems to me like a reasonable use-case. If so, I'm happy to take a stab at developing a PR for this myself.

apetenchea commented 8 months ago

Hi @Moortiii,

I understand your proposal, and I think it is quite sensible. Updating the same transaction concurrently can cause some uncertainty due to timing issues, but when done carefully, I can imagine some valid use-cases.

As you pointed out, the _executor.id is indeed private. While adding a setter would be the easy way out of this, it would potentially allow users to write code like this:

trx = db.begin_transaction()
col1 = trx.collection("col1")
trx._executor._id = another_transaction
col2 = trx.collection("col2")

Not only can the transaction ID easily get lost, preventing you from ever accessing the initial transaction again, but the problem is also easy to overlook, as it is hidden in a single line of code. Frankly, I believe even the x-arango-trx-id header trick is far better: it may look weird, but it's "loud and clear", and there will be no problem figuring out what you wrote there and why.
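To illustrate the point about keeping the ID read-only, here is a toy sketch. ExecutorSketch is a made-up stand-in, not python-arango's executor class; it only shows that a property without a setter rejects reassignment, so the ID chosen at construction stays put.

```python
class ExecutorSketch:
    """Hypothetical stand-in for a transaction executor; not python-arango's class."""

    def __init__(self, transaction_id: str) -> None:
        self._id = transaction_id  # fixed once, at construction time

    @property
    def id(self) -> str:
        # No setter is defined, so `executor.id = ...` raises AttributeError.
        return self._id


executor = ExecutorSketch("1234")
try:
    executor.id = "5678"  # attempt to swap the transaction mid-flight
    reassigned = True
except AttributeError:
    reassigned = False
```

Writing to the private `_id` attribute directly still works, of course, which is exactly the hack being discussed.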

Following up on what I would consider a reasonable solution:

Testing: Introduce a test case in test_transaction.py, something simple, just to check that we're able to use both the initial transaction and the "continued" object.

def test_transaction_fetch(db, col, docs):
    txn_db = db.begin_transaction(write=col.name)
    txn_col = txn_db.collection(col.name)
    txn_db2 = db.fetch_transaction(txn_db.transaction_id)
    # insert some documents using both txn's
    # ...

Docs: A small edit in transaction.rst would be great to showcase how fetch_transaction is supposed to be used.

I'm ready to implement the above. Or, if you want to give it a go, I'm perfectly fine with that, but don't feel pressured, I'm just mentioning since you offered. Let me know how you want to proceed.
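On the receiving system, usage might look something like the sketch below. It assumes fetch_transaction ends up with the name and signature used in the test case above; the helper name is made up, and `db` is expected to be a database object obtained from ArangoClient.

```python
def insert_in_shared_transaction(db, transaction_id, collection_name, document):
    """Join a transaction begun elsewhere and insert one document into it.

    Hypothetical helper: `db` is assumed to expose the fetch_transaction
    method proposed in this thread.
    """
    txn_db = db.fetch_transaction(transaction_id)
    txn_db.collection(collection_name).insert(document)
    # Deliberately no commit here: the system that began the transaction
    # decides when to commit or abort it.
    return txn_db
```

Returning the transaction database leaves the caller free to commit it locally instead, if that is what the two systems have agreed on.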

Moortiii commented 8 months ago

These seem like sensible changes that should be straightforward enough to implement. I'll give it a shot later today and report back. Thanks!

I agree with your comment about continue which could imply the ability to "pause" a transaction. I hadn't thought about it that way, but you're probably right that it would cause some confusion, especially for new users of ArangoDB.

Moortiii commented 8 months ago

I've opened a PR @apetenchea.

I did consider something like this as well:

request = Request(
    method="get",
    endpoint=f"/_api/transaction/{transaction_id}",
)
resp = self._conn.send_request(request)

if not resp.is_success:
    raise TransactionInitError(resp, request)

result = resp.body["result"]

if result["status"] != "running":
    raise TransactionInitError(resp, request)

self._id = transaction_id

My intention was to prevent a user from 'continuing' a transaction that is already committed or aborted, which would be mostly pointless. However, since the response from the API when fetching the status is 200 OK, raising a TransactionInitError and feeding it the response produced results that would be confusing to the end user. I also realized that perhaps checking the status of a transaction in an external system (that may be committed already) could be useful in some niche cases.
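One way around the confusing error would be to keep the status check separate from the request/response error types. The sketch below is only that: TransactionNotRunningError and ensure_running are names I made up, and the body shape is assumed to match what the snippet above reads (a "result" object with a "status" field).

```python
class TransactionNotRunningError(Exception):
    """Hypothetical exception for this sketch; not part of python-arango."""


def ensure_running(transaction_id, body):
    """Return the ID if the status body reports a running transaction.

    `body` is assumed to be the parsed JSON of the status response,
    i.e. a dict with a "result" object carrying a "status" field.
    """
    status = body["result"]["status"]
    if status != "running":
        # The HTTP call itself succeeded, so report the state rather than
        # wrapping a 200 OK response in a request error.
        raise TransactionNotRunningError(
            f"transaction {transaction_id} is {status}, not running"
        )
    return transaction_id
```

Callers that only want to inspect the status of an external transaction could read the body directly and skip this check.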

As a side note, I noticed that the contributing docs in the Sphinx documentation appear to be outdated. I had to follow the contribution guidelines directly in the repository to get anywhere.