apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

Support passing a custom `TransactionalSystem` in `TDB2StorageBuilder` #2200

Open lolgab opened 9 months ago

lolgab commented 9 months ago

Version

4.10.0

Feature

I want to use Jena in a cats-effect Scala application. Cats Effect uses an asynchronous runtime that implements many concurrency constructs in a "semantic" fashion (for example, blocking on a lock is implemented by waiting to execute a callback, instead of using an actual Java lock).

Cats Effect programs are split into many consecutive callbacks, which are then executed by the runtime. The runtime uses many threads, each of which picks up the next action available for execution. With Cats Effect I can implement transactions, but they don't work with Jena, since Jena assumes that your code runs sequentially on the same thread. This assumption is reflected in many places. In TransactionalBase, Jena keeps the Transaction in a ThreadLocal, so only one can exist per thread. In TransactionCoordinator, Jena uses a ReentrantReadWriteLock, which cannot be locked on one thread and unlocked on another. In normal Java code this makes sense, since locking on one thread and unlocking on another is highly unsafe. But in Cats Effect code there are other safeguards to make sure that no race conditions happen.
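To make the two thread-affinity constraints concrete, here is a minimal sketch using only the JDK. It is not Jena code: `ThreadAffinityDemo`, `CURRENT_TXN` and `onOtherThread` are hypothetical names made up for illustration, and the `ThreadLocal<String>` merely stands in for a per-thread "current transaction" slot.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

final class ThreadAffinityDemo {
    // Hypothetical stand-in for a per-thread "current transaction" slot.
    private static final ThreadLocal<String> CURRENT_TXN = new ThreadLocal<>();

    /** Runs the task on a fresh thread, waits for it, and returns its result. */
    static <T> T onOtherThread(Supplier<T> task) {
        AtomicReference<T> result = new AtomicReference<>();
        Thread worker = new Thread(() -> result.set(task.get()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result.get();
    }

    /** True if a value stored in a ThreadLocal here is not visible on another thread. */
    static boolean threadLocalIsInvisibleElsewhere() {
        CURRENT_TXN.set("txn-1");
        try {
            Supplier<String> read = CURRENT_TXN::get;
            return onOtherThread(read) == null;
        } finally {
            CURRENT_TXN.remove();
        }
    }

    /** True if unlocking from a thread that does not own the lock is rejected. */
    static boolean unlockFromOtherThreadFails() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            Supplier<Boolean> attempt = () -> {
                try {
                    lock.writeLock().unlock();
                    return false;
                } catch (IllegalMonitorStateException e) {
                    return true; // the JDK lock checks the owning thread
                }
            };
            return onOtherThread(attempt);
        } finally {
            lock.writeLock().unlock(); // released by the owner, which is allowed
        }
    }

    public static void main(String[] args) {
        System.out.println(threadLocalIsInvisibleElsewhere());
        System.out.println(unlockFromOtherThreadFails());
    }
}
```

A callback-based runtime that resumes a computation on a different thread runs into both effects: the continuation no longer sees the thread-local state, and it is not allowed to release a lock taken by the earlier step.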

I would like to customize TDB2 to use my own TransactionalSystem, but this is impossible at the moment: the only way to construct a TDB2StorageBuilder is via its static build methods, which hardcode the TransactionalSystem to TransactionalBase.

If I can implement my own TransactionalSystem I think I can disable the built-in transactionality and enforce it in other ways.

Are you interested in contributing a solution yourself?

None

afs commented 9 months ago

Interesting! This style of usage wasn't in the design space so, as a note of caution, the current usage patterns may go deep inside the code. For example, there isn't a per-transaction state object passed to every operation; it's implicit via ThreadLocals. This then influences any assumed code sequences.

I see no problem with providing for other TransactionalSystem implementations. Such implementations will need to take responsibility for correct behaviour.

The usual TransactionalSystem is connected to the transaction coordinator.

Are you planning on using any of Jena's transaction system implementation, or having a completely separate implementation?

See also TransactionalSystem.detach / TransactionalSystem.attach.

The ReentrantReadWriteLock in TransactionCoordinator controls exclusive/non-exclusive mode. "read-write" is not in terms of transaction read-write; it means "non-exclusive" and "exclusive", which happen to map to the words "read" and "write" for a lock provided by the JRE.

The "non-exclusive" mode is the normal mode: multiple-reader and single-writer (MR+SW) transactions all run non-exclusive. The "exclusive" mode is system-wide exclusive access for special system operations across the whole transaction system, e.g. the parallel bulk loader locks out all transactions and directly manipulates the TDB2 data structures.
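As a rough illustration of that mapping (not the actual TransactionCoordinator code), the sketch below uses the "read" side of a JDK `ReentrantReadWriteLock` for every ordinary transaction, whether it reads or writes data, and the "write" side for system-wide exclusive access. `ExclusivityGate` and all of its method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ExclusivityGate {
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();

    // Every ordinary transaction (reader or writer) takes the shared side,
    // so many of them can be active at once: this is "non-exclusive" mode.
    void startTransaction() {
        gate.readLock().lock();
    }

    void finishTransaction() {
        gate.readLock().unlock();
    }

    // Exclusive mode is only granted when no transaction holds the shared side.
    // Returns false immediately instead of waiting if transactions are active.
    boolean tryExclusive() {
        return gate.writeLock().tryLock();
    }

    void finishExclusive() {
        gate.writeLock().unlock();
    }

    // Number of shared ("non-exclusive") holds currently outstanding.
    int activeTransactions() {
        return gate.getReadLockCount();
    }
}
```

With a gate like this, an exclusive operation such as a bulk load would wait (or, here, be refused) until all transactions have finished, and new transactions cannot start while it runs. Which data a transaction reads or writes is a separate concern handled elsewhere.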

lolgab commented 9 months ago

Thank you for your answer! You are right in saying that the usage pattern goes deep inside the code. I tried to write a custom TDBStorageBuilder that receives a TransactionalSystem, and it's not enough: I had a failure in TransactionalComponentLifecycle, which is deep inside the implementation of TDB2. It would be nice if Jena stopped relying on ThreadLocals and used parameter passing to access the current Transaction, but I understand this is a big change and I'm not sure it's worth it. For my use case I can work around the problem by using a thread pool where I assign one thread to the whole transaction, which works correctly.
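The workaround can be sketched roughly as follows, using only the JDK. This is an assumption about its general shape, not the code actually used: `PinnedTransactionRunner` and `inTransaction` are hypothetical names. The whole transaction body is submitted to the pool as a single task, so begin, work and commit all run on one pool thread and thread-local state stays consistent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class PinnedTransactionRunner implements AutoCloseable {
    private final ExecutorService pool;

    PinnedTransactionRunner(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /**
     * Runs the whole body as one task on one pool thread and blocks the
     * caller until it finishes, returning the body's result.
     */
    <T> T inTransaction(Supplier<T> body) {
        Callable<T> task = body::get;
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Surface the failure from the body to the caller.
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

In a real application the body would wrap the Jena transaction itself, for example a call to `Txn.executeWrite` or an explicit begin/commit pair on the dataset, and an asynchronous runtime would await the submitted task rather than block on `get()`.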