apavlo / h-store

H-Store Distributed Main Memory OLTP Database System
https://hstore.cs.brown.edu
GNU General Public License v3.0

Supporting Nested Transactions in S-Store #177

Open apavlo opened 10 years ago

apavlo commented 10 years ago

The following is a proposal on how to allow a stored procedure to invoke another stored procedure at runtime. The basic setup is that a parent distributed txn runs at one partition and holds the locks for other partitions. Instead of sending a query request to the remote partitions, we want to send a request to execute a "child" stored procedure at those partitions. We assume that each child stored procedure invocation will execute as a single-partition txn.

The basic mechanism that we're going to use to make this work is the PartitionExecutor's support for speculative txns, except that the child txns will reuse the parent txn's id instead of getting a new one.

  1. The first step is to extend the VoltProcedure API to allow a stored procedure to queue invocations of other stored procedures. This should mimic the VoltClient API, where you just pass in a string with the name of the procedure that you want and an array of input parameters. You will then need to add a voltExecProcs() method to VoltProcedure that blocks until all of the child txns return their results. I would assume that each child txn returns a single VoltTable, so your new voltExecProcs() should return a VoltTable[].
  2. We then need to be able to package the txn requests in a ProtocolBuffer message and send them to their respective partitions. There are a couple of things that need to happen here. First, you need to use TransactionInitializer.calculateBasePartition() to figure out where each request should go. You can use a bogus client_handle parameter; it's not used for anything important. You will then need to extend the TransactionInitRequest message type in hstoreservice.proto with a new field that marks the request as one for a nested txn.
  3. You will then want to send this request through the HStoreCoordinator using a PartitionCountingCallback. This will allow you to block the thread until all of the child txns return their TransactionInitResponse to the parent txn.
  4. When the special TransactionInitRequest with the child txn flag arrives at a remote partition, it will be passed to TransactionInitHandler.remoteHandler(). Note that this happens even if the remote partition is at the same HStoreSite as the parent txn's base partition. Since we are re-using the parent's txnId, you need to make sure that you don't hit the assert in that method that complains about getting an init request with an existing txnId. You also need to make sure that you don't invoke HStoreSite.transactionInit() with the request, since that will cause the txn to get queued. Instead, send it directly to the target PartitionExecutor using queueStartTransaction(). To do this, you will have to create a temporary LocalTransaction handle for the child txn so that you can stuff in the proper Procedure, ProcParameter, and RpcCallback for the child invocation. Otherwise, if we use the parent txn's AbstractTransaction handle, it will carry the invocation information for the parent procedure.
  5. Now this is where things get tricky. In PartitionExecutor.run(), we poll our work queue and look for something to do. When specexec is enabled, the PartitionExecutor can process new messages that are added to its work queue while the dtxn holds the partition. The problem is that it's now going to get a txn invocation request with the same id as the current dtxn at that partition. There are likely a bunch of asserts that perform sanity checks during execution, and we will probably have to take care of them one by one. You should also mark the child txn as speculative so that the PartitionExecutor does not try to commit it immediately when it finishes.
  6. When the child txn finishes, you need to invoke its RpcCallback to send a TransactionInitResponse back to the parent txn. Packaging up the message and getting it back to the right location will happen automatically when you invoke RpcCallback.run().
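For step 1, the API shape could look roughly like the following minimal sketch. Only voltExecProcs() comes from the proposal; `voltQueueProc()`, the `Table` stand-in for VoltTable, and the `ChildDispatcher` hook (whatever actually ships a child request) are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of step 1. Table stands in for VoltTable, ChildDispatcher
// is an assumed hook for whatever ships a child request, and voltQueueProc() is
// an illustrative name; only voltExecProcs() comes from the proposal.
public class NestedProcApiSketch {

    // Stand-in for VoltTable: each child returns exactly one of these.
    public static final class Table {
        public final String procName;
        public final List<Object[]> rows;

        public Table(String procName, List<Object[]> rows) {
            this.procName = procName;
            this.rows = rows;
        }
    }

    // Assumed hook: sends one child invocation and completes when it returns.
    public interface ChildDispatcher {
        CompletableFuture<Table> dispatch(String procName, Object[] params);
    }

    private static final class Invocation {
        final String procName;
        final Object[] params;

        Invocation(String procName, Object[] params) {
            this.procName = procName;
            this.params = params;
        }
    }

    private final ChildDispatcher dispatcher;
    private final List<Invocation> queued = new ArrayList<>();

    public NestedProcApiSketch(ChildDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    // Mirrors the client API: a procedure name plus an array of input parameters.
    public void voltQueueProc(String procName, Object... params) {
        queued.add(new Invocation(procName, params));
    }

    // Sends every queued child, then blocks until all of them have returned.
    // Results come back in the order the children were queued.
    public Table[] voltExecProcs() {
        List<CompletableFuture<Table>> pending = new ArrayList<>();
        for (Invocation inv : queued) {
            pending.add(dispatcher.dispatch(inv.procName, inv.params));
        }
        queued.clear();
        Table[] results = new Table[pending.size()];
        for (int i = 0; i < results.length; i++) {
            results[i] = pending.get(i).join(); // blocks until child i finishes
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy dispatcher: each child "returns" a one-row table echoing its first parameter.
        NestedProcApiSketch proc = new NestedProcApiSketch((name, params) ->
                CompletableFuture.supplyAsync(() -> {
                    List<Object[]> rows = new ArrayList<>();
                    rows.add(new Object[] {params[0]});
                    return new Table(name, rows);
                }));
        proc.voltQueueProc("UpdateStock", 42L, 7);
        proc.voltQueueProc("InsertOrder", 99L);
        Table[] out = proc.voltExecProcs();
        System.out.println(out.length + " " + out[0].procName + " " + out[1].rows.get(0)[0]);
    }
}
```

Returning results in queue order (rather than completion order) keeps the VoltTable[] index aligned with the order in which the parent queued its children, which is how the existing voltExecuteSQL() pattern is typically consumed.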
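Step 2 boils down to "compute a base partition per child, build one init message per child, and group them by destination". The sketch below assumes a lot: `InitRequest` is a plain Java stand-in for TransactionInitRequest with the proposed nested-txn field, and `basePartition()` is a toy hash on the first parameter standing in for TransactionInitializer.calculateBasePartition(), whose real logic depends on the procedure's partitioning column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of step 2. InitRequest stands in for TransactionInitRequest
// with the proposed nested-txn field; basePartition() stands in for
// TransactionInitializer.calculateBasePartition() and assumes the first
// parameter is the partitioning value.
public class ChildRequestRouting {

    public static final long BOGUS_CLIENT_HANDLE = -1L; // not used for anything important

    public static final class InitRequest {
        public final long txnId;        // the parent's txn id, reused by the child
        public final int partition;     // base partition of the child
        public final String procName;
        public final Object[] params;
        public final long clientHandle;
        public final boolean nestedTxn; // the proposed new field

        InitRequest(long txnId, int partition, String procName, Object[] params) {
            this.txnId = txnId;
            this.partition = partition;
            this.procName = procName;
            this.params = params;
            this.clientHandle = BOGUS_CLIENT_HANDLE;
            this.nestedTxn = true;
        }
    }

    // Assumption: route on the hash of the first parameter.
    static int basePartition(Object[] params, int numPartitions) {
        return Math.floorMod(params[0].hashCode(), numPartitions);
    }

    // Builds one request per child invocation and groups them by destination.
    public static Map<Integer, List<InitRequest>> route(long parentTxnId,
            List<String> procNames, List<Object[]> paramsList, int numPartitions) {
        Map<Integer, List<InitRequest>> byPartition = new TreeMap<>();
        for (int i = 0; i < procNames.size(); i++) {
            Object[] params = paramsList.get(i);
            int p = basePartition(params, numPartitions);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>())
                       .add(new InitRequest(parentTxnId, p, procNames.get(i), params));
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("UpdateStock", "UpdateStock", "InsertOrder");
        List<Object[]> params = Arrays.asList(new Object[] {5}, new Object[] {6}, new Object[] {9});
        Map<Integer, List<InitRequest>> routed = route(1001L, names, params, 4);
        routed.forEach((p, reqs) -> System.out.println("partition " + p + ": " + reqs.size()));
    }
}
```

Since the proposal assumes each child is single-partition, one base partition per child is enough; the grouping only matters if several children land on the same partition.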
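The blocking behavior in step 3 is a counting pattern: expect one response per partition and release the parent once all have arrived. The class below is not H-Store's PartitionCountingCallback; it is a hypothetical stand-in that shows the same idea with a CountDownLatch, ignoring duplicate responses so that each partition is counted only once.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the counting-callback idea in step 3 (not H-Store's
// PartitionCountingCallback): one expected response per partition, and the
// parent thread blocks until all of them have arrived.
public class CountingCallbackSketch<R> {
    private final Map<Integer, R> responses = new ConcurrentHashMap<>();
    private final Collection<Integer> expected;
    private final CountDownLatch latch;

    public CountingCallbackSketch(Collection<Integer> expectedPartitions) {
        this.expected = expectedPartitions;
        this.latch = new CountDownLatch(expectedPartitions.size());
    }

    // Called once per partition when that partition's child txn responds.
    public void run(int partition, R response) {
        if (!expected.contains(partition)) {
            throw new IllegalArgumentException("unexpected partition " + partition);
        }
        // putIfAbsent ignores duplicates, so each partition counts down only once.
        if (responses.putIfAbsent(partition, response) == null) {
            latch.countDown();
        }
    }

    // Blocks the parent until every partition responded or the timeout expires.
    public boolean await(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public Map<Integer, R> responses() {
        return responses;
    }

    public static void main(String[] args) {
        CountingCallbackSketch<String> cb = new CountingCallbackSketch<>(Arrays.asList(1, 2, 3));
        for (int p = 1; p <= 3; p++) {
            final int partition = p;
            new Thread(() -> cb.run(partition, "ok@" + partition)).start();
        }
        boolean done = cb.await(5, TimeUnit.SECONDS);
        System.out.println(done + " " + cb.responses().size());
    }
}
```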
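The branch described in step 4 might look something like this sketch. Every type here is a hypothetical stand-in: `InitRequest` for the extended TransactionInitRequest, `ChildHandle` for the temporary LocalTransaction, and `executorQueue` for the target PartitionExecutor's work queue that queueStartTransaction() would feed. The point is that the nested flag short-circuits both the duplicate-txnId check and the regular init path.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the step-4 branch. InitRequest stands in for the
// extended TransactionInitRequest, ChildHandle for the temporary
// LocalTransaction, and executorQueue for the target PartitionExecutor's
// work queue (what queueStartTransaction() would feed).
public class RemoteInitHandlerSketch {

    public interface Callback {
        void run(String response);
    }

    public static final class InitRequest {
        final long txnId;
        final boolean nestedTxn;
        final String procName;
        final Object[] params;

        public InitRequest(long txnId, boolean nestedTxn, String procName, Object... params) {
            this.txnId = txnId;
            this.nestedTxn = nestedTxn;
            this.procName = procName;
            this.params = params;
        }
    }

    // Same txn id as the parent, but carries the child's own invocation info.
    public static final class ChildHandle {
        public final long txnId;
        public final String procName;
        public final Object[] params;
        public final Callback callback;

        ChildHandle(long txnId, String procName, Object[] params, Callback callback) {
            this.txnId = txnId;
            this.procName = procName;
            this.params = params;
            this.callback = callback;
        }
    }

    private final Set<Long> knownTxnIds = new HashSet<>();
    public final BlockingQueue<ChildHandle> executorQueue = new LinkedBlockingQueue<>();
    public int normalInits = 0; // stand-in for calls to HStoreSite.transactionInit()

    public void remoteHandler(InitRequest req, Callback callback) {
        if (req.nestedTxn) {
            // The parent already owns this txn id: skip the duplicate-id check and
            // bypass transactionInit(), which would cause the txn to get queued.
            executorQueue.add(new ChildHandle(req.txnId, req.procName, req.params, callback));
            return;
        }
        // Regular path: seeing an existing txn id here is still an error.
        if (!knownTxnIds.add(req.txnId)) {
            throw new IllegalStateException("duplicate init request for txn " + req.txnId);
        }
        normalInits++;
    }

    public static void main(String[] args) {
        RemoteInitHandlerSketch handler = new RemoteInitHandlerSketch();
        handler.remoteHandler(new InitRequest(5L, false, "ParentProc"), r -> { });
        handler.remoteHandler(new InitRequest(5L, true, "ChildProc", 42), r -> { });
        System.out.println(handler.normalInits + " normal, " + handler.executorQueue.size() + " child queued");
    }
}
```

Keeping the child's invocation details in a separate handle (rather than on the parent's) mirrors the concern in step 4: the parent's handle would otherwise report the parent procedure's Procedure and parameters.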
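Step 5's core decision, "same txn id as the current dtxn means nested child, not a protocol error", can be sketched as a simplified work loop. This is not PartitionExecutor.run(); it is a hypothetical model in which deferring each child's commit until the parent commits or aborts is an assumption, and unrelated work is simply left waiting rather than speculatively executed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of step 5: a partition's work loop that, while a dtxn
// holds the partition, accepts work tagged with that same txn id as a nested
// child, runs it, and marks it speculative instead of committing it. Deferring
// the child's commit until the parent finishes is an assumption of this sketch.
public class PartitionLoopSketch {

    public static final class Work {
        public final long txnId;
        public final String procName;
        public boolean speculative;

        public Work(long txnId, String procName) {
            this.txnId = txnId;
            this.procName = procName;
        }
    }

    private final BlockingQueue<Work> queue = new LinkedBlockingQueue<>();
    private final long currentDtxnId;                       // dtxn holding this partition
    public final List<String> executed = new ArrayList<>(); // children run so far
    public final List<String> deferred = new ArrayList<>(); // other work left waiting
    private final List<Work> uncommitted = new ArrayList<>();

    public PartitionLoopSketch(long currentDtxnId) {
        this.currentDtxnId = currentDtxnId;
    }

    public void offer(Work w) {
        queue.add(w);
    }

    // Drains the queue once; a real executor would keep polling in run().
    public void runOnce() {
        Work w;
        while ((w = queue.poll()) != null) {
            if (w.txnId == currentDtxnId) {
                // Same id as the current dtxn: a nested child, not a protocol error.
                w.speculative = true; // do not commit as soon as it finishes
                executed.add(w.procName);
                uncommitted.add(w);
            } else {
                // Simplification: real specexec may run unrelated single-partition
                // txns here too; this sketch just leaves them waiting.
                deferred.add(w.procName);
            }
        }
    }

    // When the parent commits or aborts, its speculative children follow it.
    public List<String> finishParent(boolean commit) {
        List<String> outcome = new ArrayList<>();
        for (Work w : uncommitted) {
            outcome.add((commit ? "commit " : "abort ") + w.procName);
        }
        uncommitted.clear();
        return outcome;
    }

    public static void main(String[] args) {
        PartitionLoopSketch ex = new PartitionLoopSketch(9L);
        ex.offer(new Work(9L, "ChildA"));
        ex.offer(new Work(3L, "OtherTxn"));
        ex.runOnce();
        System.out.println(ex.executed + " " + ex.deferred + " " + ex.finishParent(true));
    }
}
```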
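Step 6 is the simplest part: when the child's body completes, build a response and hand it to the callback. In the sketch below, `RpcCallback` is a local stand-in with the same single-method shape as protobuf's `RpcCallback<T>`, and `InitResponse` stands in for TransactionInitResponse; its `result` field, as a way to carry the child's single result table, is an assumption, as is reporting a failure flag when the child throws (so the parent's counting callback is never left waiting).

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of step 6. RpcCallback is a local stand-in with the same
// single-method shape as protobuf's RpcCallback<T>; InitResponse stands in for
// TransactionInitResponse, and its result field is an assumed way to carry the
// child's single result table back to the parent.
public class ChildCompletionSketch {

    public interface RpcCallback<T> {
        void run(T parameter);
    }

    public static final class InitResponse {
        public final long txnId;
        public final int partition;
        public final boolean ok;
        public final List<Object[]> result;

        public InitResponse(long txnId, int partition, boolean ok, List<Object[]> result) {
            this.txnId = txnId;
            this.partition = partition;
            this.ok = ok;
            this.result = result;
        }
    }

    // Runs the child's body and answers the parent exactly once.
    public static void finishChild(long txnId, int partition,
            Supplier<List<Object[]>> body, RpcCallback<InitResponse> callback) {
        InitResponse response;
        try {
            response = new InitResponse(txnId, partition, true, body.get());
        } catch (RuntimeException e) {
            // Assumption: a failed child reports back instead of leaving the parent blocked.
            response = new InitResponse(txnId, partition, false, Collections.emptyList());
        }
        // Per the proposal, run() is what delivers the reply to the parent.
        callback.run(response);
    }

    public static void main(String[] args) {
        finishChild(1001L, 2, () -> Collections.singletonList(new Object[] {"row"}),
                r -> System.out.println("txn " + r.txnId + " partition " + r.partition + " ok=" + r.ok));
    }
}
```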
apavlo commented 10 years ago

Note: Abusing TransactionInitRequest as I describe here may cause other problems. The other option is to create special queries that actually invoke procedures. I think this is more of a hack, but it is a second option if this one doesn't work out.