YahooArchive / omid

Transactional Support for HBase (Mirror of https://github.com/apache/incubator-omid)
http://omid.incubator.apache.org/
Apache License 2.0

Create an Omid-like transaction layer on top of Phoenix #28

Open jtaylor-sfdc opened 11 years ago

jtaylor-sfdc commented 11 years ago

I think the combination of Omid plus Phoenix (https://github.com/forcedotcom/phoenix) would be pretty powerful. Any interest?

We've got a related issue over on our GitHub repo here: https://github.com/forcedotcom/phoenix/issues/209#issuecomment-18183268

Or feel free to contact me off-list: jtaylor@salesforce.com

Thanks.

fpj commented 11 years ago

Hi James,

It sounds like a good call, at least I'm interested. Do you have any thoughts already on how to do the integration? Could you provide some pointers in the Phoenix code where we would have to hook in Omid?

-Flavio

On May 21, 2013, at 2:39 AM, James Taylor notifications@github.com wrote:

I think the combination of Omid plus Phoenix (https://github.com/forcedotcom/phoenix) would be pretty powerful. Any interest?

We've got a related issue over on our GitHub repo here: forcedotcom/phoenix#209

Or feel free to contact me off-list: jtaylor@salesforce.com

Thanks.


jtaylor-sfdc commented 11 years ago

Hi Flavio, glad to hear you're interested. There are a couple of ways to come at this that I can think of, but you guys over there would be a better judge of which way to go:

  1. Phoenix supports setting the timestamp at connection time via its CurrentSCN connection property. For an example of setting this connection property, take a look at QueryExecTest#testScan. If this property is set, then this is the timestamp that will be used for Puts/Deletes, as well as being the upper bound for the time range of a scan. Since the Phoenix JDBC driver is embedded, creating a new connection is not an expensive operation.
  2. Otherwise, if this property is not set, we bind the max time range on a scan when we resolve a table reference in the FROM clause. See FromCompiler.SelectClauseVisitor#createTableRef, around line 121. The return value of the MetaDataClient#updateCache call gives us back the server-side time. This is returned from the MetaDataProtocol#getTable EndPoint coprocessor invocation which is implemented by MetaDataEndpointImpl. On the write/update side of things we make the same MetaDataClient#updateCache call to get back a time stamp from the server to use for our Puts/Deletes in MutationState#validate. So in this case, if Omid could be hooked into MetaDataEndpointImpl to get the current timestamp, that might work.
  3. For the case of UPSERT SELECT (our version of an INSERT SELECT) where you're reading and writing, we handle that in UpsertCompiler. The behavior is different depending on whether you have auto commit on or not. This is kind of a corner case that we can talk more about down the road, but it's kind of a hybrid of (1) and (2) above.
  4. You'd likely need to add a new EndPoint coprocessor to send row updates for a commit so that you could correctly order the transactions (does Omid do something along these lines?). These come through the MutationState#commit call now, except for the auto commit case I mentioned in (3).
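
To make option (1) a bit more concrete, here is a rough Java sketch of pinning a connection to a timestamp through the CurrentSCN property. Only the property name comes from the description above. The `TimestampSource` interface, the class and method names, and the JDBC URL are hypothetical placeholders, not part of Phoenix or Omid, and `openAt` would only work with the Phoenix driver on the classpath and a reachable cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

class ScnConnectionSketch {
    /** Hypothetical stand-in for a transaction timestamp source such as Omid's; not a real Omid API. */
    interface TimestampSource {
        long nextStartTimestamp();
    }

    /** Name of the connection property mentioned in (1). */
    static final String CURRENT_SCN = "CurrentSCN";

    /** Builds connection properties that pin reads and writes to the given timestamp. */
    static Properties propsForSnapshot(long timestamp) {
        Properties props = new Properties();
        props.setProperty(CURRENT_SCN, Long.toString(timestamp));
        return props;
    }

    /**
     * Opens a connection pinned to the given timestamp.
     * Assumption: jdbcUrl is a placeholder such as "jdbc:phoenix:localhost".
     */
    static Connection openAt(String jdbcUrl, long timestamp) throws SQLException {
        return DriverManager.getConnection(jdbcUrl, propsForSnapshot(timestamp));
    }

    public static void main(String[] args) {
        // A local counter stands in for the transaction manager in this sketch.
        AtomicLong counter = new AtomicLong(100);
        TimestampSource source = counter::incrementAndGet;

        long startTs = source.nextStartTimestamp();
        Properties props = propsForSnapshot(startTs);
        System.out.println(CURRENT_SCN + "=" + props.getProperty(CURRENT_SCN));
    }
}
```

Since creating a connection is cheap, one connection per transaction start timestamp seems workable under this approach. How commit ordering would be handled, as raised in (4), is not covered by this sketch.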

Let me know if this is enough detail - I'm happy to provide more.

adenysenko commented 10 years ago

Have a look at Haeinsa as well: https://github.com/VCNC/haeinsa. I'm not sure what the pros and cons are compared to Omid.

satoshi75nakamoto commented 10 years ago

@adenysenko take a look at: https://github.com/VCNC/haeinsa/issues/4

ikatkov commented 8 years ago

In progress...