forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License
558 stars 227 forks source link

Faster binary search for getColumnLatest #664

Closed lhofhansl closed 10 years ago

lhofhansl commented 10 years ago

Ran all Phoenix tests. (TestPhoenixSink hangs, but that is not due to this).

Please have a look. This only helps performance if we do not have to instanceof checks, etc. Both JDK7 and JDK6 only pas the search term to the right side of the comparator.

With this getColumnLatest no longer shows up in the profiler and I see a 20-30% performance improvement even for small KVs.

Removed the older methods, they weren't used anywhere else. If we need to keep these for backward compatibility we, could add them back.

jtaylor-sfdc commented 10 years ago

Awesome, @lhofhansl! That's a great trick - didn't know about that one. We don't need to keep the old methods.

jtaylor-sfdc commented 10 years ago

@lhofhansl - do you think it's worth pursuing a Map-based implementation of this?

lhofhansl commented 10 years ago

That's hard to say. The binary search as we have it now does not need to copy anything, it just moves some references. It makes use of the fact that the array is already sorted.

The problem would be what to use as key. For a HashMap, we'd probably need to concatenate the family and qualifier into a single byte[] array, wrap that into hbase.util.HashedBytes, not sure that'll be faster. Or do a two level hash. Would need to try.

If we wanted to use a TreeMap we'd have O(log(n)), same as the binary search.

TL;DR: I doubt we'd see an improvement.