I "discovered" some issues when implementing the happybase functionality on top of the Bigtable API. (I put discovered in quotes, because some of the issues may just be that I don't grok how to do the same thing with the Bigtable API).
These were mostly discovered because I wrote a system test for happybase that can run against both HBase and the Bigtable backend. It can be switched from one to the other by changing the USING_HBASE boolean.
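For illustration, a minimal sketch of what that toggle looks like; the import path for the Bigtable-backed package and the connection arguments are placeholders, not the test's actual code:

    USING_HBASE = False

    if USING_HBASE:
        import happybase
        connection = happybase.Connection(host='localhost', port=9090)
    else:
        # Hypothetical import path for the Bigtable-backed happybase package.
        from google.cloud import happybase
        connection = happybase.Connection()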
Many other differences have been enumerated in the documentation for our custom Bigtable happybase package.
Issues / Differences
When committing a batch of mutations, the happybase method Batch.send() uses Thrift/HBase's mutateRows / mutateRowsTs method to send all mutations at once. With the Bigtable API this is not possible; we have to commit row-by-row. (This comes up in the system test as well.)
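Roughly, Batch.send() on top of Bigtable degrades into a loop. A minimal sketch assuming the google-cloud-bigtable Python client, with placeholder project/instance/table names:

    from google.cloud import bigtable

    client = bigtable.Client(project='my-project', admin=True)  # placeholder names
    table = client.instance('my-instance').table('my-table')

    # Accumulated happybase-style mutations: row key -> {b'family:qualifier': value}.
    mutations = {
        b'row-key-1': {b'cf1:qual1': b'value1'},
        b'row-key-2': {b'cf1:qual1': b'value2'},
    }

    # Thrift's mutateRows ships all of this in one call; against Bigtable
    # each row key needs its own commit, i.e. one RPC per row.
    for row_key, columns in mutations.items():
        row = table.row(row_key)
        for column, value in columns.items():
            family, qualifier = column.split(b':', 1)
            row.set_cell(family.decode('utf-8'), qualifier, value)
        row.commit()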
Bigtable garbage collection is not as immediate as HBase's. In HBase, a column with one max_version immediately evicts the old value when a new one is added; similarly, with a TTL of 3 seconds, the value has been evicted after sleeping for 3.5 seconds. Neither of these happens (at least not consistently) in Bigtable. (I don't really see this as a problem, but users coming from HBase may have different expectations.)
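A sketch of the expectation gap in happybase terms, assuming a table whose cf1 family was created with max_versions=1:

    # `table` is a happybase Table whose 'cf1' family keeps one version.
    table.put(b'row1', {b'cf1:qual1': b'old'})
    table.put(b'row1', {b'cf1:qual1': b'new'})

    # HBase evicts b'old' immediately; Bigtable removes excess versions
    # lazily, so a versioned read may still return both cells for a while.
    cells = table.cells(b'row1', b'cf1:qual1', versions=2)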
A row scan with sorted_columns is not possible in Bigtable.
Scanning with an HBase filter string is not supported either (e.g. a KeyOnlyFilter).
The Bigtable Mutation.DeleteFromRow mutation does not support timestamps either. Even attempting to send one conditionally (via CheckAndMutateRowRequest) deletes the entire row.
Bigtable can't use a timestamp when deleting a column family, since Mutation.DeleteFromFamily does not include a timestamp range.
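The asymmetry is visible in the mutation protos themselves. A sketch using the Python client's generated types (the import path varies across client versions):

    from google.cloud.bigtable_v2.types import Mutation, TimestampRange

    # Column-level deletes accept a timestamp range...
    delete_column = Mutation.DeleteFromColumn(
        family_name='cf1',
        column_qualifier=b'qual1',
        time_range=TimestampRange(end_timestamp_micros=1000000),
    )
    # ...but family- and row-level deletes have no time_range field at all.
    delete_family = Mutation.DeleteFromFamily(family_name='cf1')
    delete_row = Mutation.DeleteFromRow()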
Differences that are Upgrades
Writes to HBase (via Thrift) with a timestamp just drop the timestamp, whereas the Bigtable API respects it.
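In happybase terms the difference is a single call; the Thrift backend silently ignores the timestamp argument, while the Bigtable backend stores the cell at exactly that timestamp:

    # Thrift drops timestamp=1000 on the floor; Bigtable honors it.
    table.put(b'row1', {b'cf1:qual1': b'value1'}, timestamp=1000)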
The Thrift API fails to retrieve the TTL information from a column family, while the Bigtable API succeeds in returning this information. (We have to work around this in a few system tests.)
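With the Bigtable API the TTL can be read back off the column family's GC rule. A sketch with the Python client (attribute names per recent client versions; `table` here is a Bigtable Table, not a happybase one):

    from google.cloud.bigtable.column_family import MaxAgeGCRule

    families = table.list_column_families()
    gc_rule = families['cf1'].gc_rule
    if isinstance(gc_rule, MaxAgeGCRule):
        # max_age is a datetime.timedelta; Thrift has no equivalent read-back.
        ttl_seconds = int(gc_rule.max_age.total_seconds())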
When the Thrift API does a row read with columns cf1 and cf1:qual1 (in that order), only the results from cf1:qual1 are returned (even though they are a subset of all the columns in the column family cf1). If the columns are given in the opposite order (cf1:qual1 then cf1), the correct results are returned. In Cloud Bigtable, it works as expected in either order. (We use a union filter: one branch has only family_name_regex_filter='cf1' and the other combines that with column_qualifier_regex_filter='qual1'.) (This happens for both single-row and multi-row reads.)
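A sketch of that union filter using the Python client's row_filters helpers (module path may differ by version):

    from google.cloud.bigtable.row_filters import (
        ColumnQualifierRegexFilter,
        FamilyNameRegexFilter,
        RowFilterChain,
        RowFilterUnion,
    )

    # One branch matches the whole 'cf1' family; the other narrows the same
    # family to the 'qual1' qualifier. The union returns the correct cells
    # regardless of the order the columns were requested in.
    whole_family = FamilyNameRegexFilter('cf1')
    single_column = RowFilterChain(filters=[
        FamilyNameRegexFilter('cf1'),
        ColumnQualifierRegexFilter(b'qual1'),
    ])
    row_filter = RowFilterUnion(filters=[whole_family, single_column])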
HBase counter_get doesn't actually populate the data even though the docstring says: "This method retrieves the current value of a counter column. If the counter column does not exist, this function initialises it to 0."
Neither Good/Bad
HBase reads (via Table.row, Table.rows, Table.cells, Table.scan) all use exclusive end timestamps, which matches the behavior of a Bigtable TimestampRange. On the other hand, HBase deletes use inclusive end timestamps, while Bigtable deletes still use a TimestampRange (only for deleting specific columns, though, since column family and row deletes can't send a timestamp range, as noted above). We address this by incrementing the passed-in timestamp by 1 millisecond (the lowest allowed granularity).
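A sketch of that adjustment when building a delete's timestamp range; the helper name and its arguments are mine, not the package's:

    import datetime

    from google.cloud.bigtable.row_filters import TimestampRange

    def delete_up_to(row, family, column, end_inclusive_ms):
        # happybase passes an inclusive end in milliseconds; Bigtable's
        # TimestampRange end is exclusive, so bump by 1 ms (the finest
        # granularity Bigtable supports) before converting to a datetime.
        end = datetime.datetime.utcfromtimestamp((end_inclusive_ms + 1) / 1000.0)
        row.delete_cell(family, column, time_range=TimestampRange(end=end))
        row.commit()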
I "discovered" some issues when implementing the
happybase
functionality on top of the Bigtable API. (I put discovered in quotes, because some of the issues may just be that I don't grok how to do the same thing with the Bigtable API).These were mostly discovered because I wrote a system test for
happybase
that could work both with HBase and with the Bigtable backend. It can be switched from one to another by changing theUSING_HBASE
boolean.Many other differences have been enumerated in the documentation for our custom Bigtable
happybase
package.Issues / Differences
happybase
methodBatch.send()
uses Thrift/HBase'smutateRows
/mutateRowsTs
method to send all mutations at once. With the Bigtable API, this is not possible, we have to commit row-by-row. (This comes up in the system test as well.)max_version
immediately evicts the old value when a new one is added. Similarly, with a TTL of 3 seconds, after sleeping for 3.5 seconds, the value has been evicted. Neither of these occur (at least consistently in Bigtable). (I don't really see this as a problem, but users from HBase may have different expectations)sorted_columns
is not possible in Bigtable.KeyOnlyFilter
)Mutation.DeleteFromRow
mutation does not support timestamps (also). Even attempting to send one conditionally (viaCheckAndMutateRowRequest
) deletes the entire row.Mutation.DeleteFromFamily
does not include a timestamp range.Differences that are Upgrades
cf1
andcf1:qual1
(in that order) only the results fromcf1:qual1
are returned (even though they are a subset of all the columns in the column familycf1
). If the columns are given in the opposite order (cf1:qual1
thencf1
) the correct results are returned. In Cloud Bigtable, it works as expected in either order. (We use a union filter, one which has onlyfamily_name_regex_filter='cf1'
and another which has that combined withcolumn_qualifier_regex_filter='qual1'
.) (This happen for a single row read and multiple rows.)HBase
counter_get
doesn't actually populate the data even though the docstring says:Table.row
,Table.rows
,Table.cells
,Table.scan
) all use exclusive end timestamps, which makes the behavior of a BigtableTimestampRange
. On the other hand, HBase deletes use inclusive end timestamps, while Bigtable deletes are still using aTimestampRange
(only for deleting specific columns those, as column family or row deletes can't send a timestamp range, as referenced above). We address this just by incrementing the passed in timestamp by 1 millisecond (which is the lowest allowed granularity).