Huawei-Spark / Spark-SQL-on-HBase

Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces
Apache License 2.0

[WIP] Support for update/delete #6

Open scwf opened 9 years ago

scwf commented 9 years ago

todos:

yzhou2001 commented 9 years ago

Thanks for the PR. It's a very nice implementation. A few comments:

1) For DELETE, there seems to be no guarantee that the child query plan returns all row key columns. To guarantee it, add every row key column to the child's projection if it is not already present.
2) closeHTable should not be called inside HBaseRelation.deleteFromHBase: the HTable handle is a cached entity managed by HBaseRelation.
3) HBaseRelation.insert currently assumes "overwrite" is always false, when it should actually be the opposite. This is a legacy issue, so you can leave it as is and defer it to a future fix if you wish.
4) We need reasonably comprehensive test coverage of these two important features.
5) Please also update the design doc to describe the two features, and add yourself as an updater in its "revision history".
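Point 1 can be illustrated with a minimal sketch, assuming the projection is modelled as a plain list of column names. The names `RowKeyProjection` and `ensureRowKeys` are hypothetical and not part of Spark-SQL-on-HBase; a real fix would operate on the child's logical plan attributes. The idea is the same: keep the child's columns in order and append any row key column that is missing, so DELETE always has the full key for each row.

```scala
// Hypothetical sketch: column names stand in for the child plan's output attributes.
object RowKeyProjection {
  /** Returns the child's projection with every missing row key column appended,
    * preserving the original column order. Assumes rowKeyColumns has no duplicates. */
  def ensureRowKeys(childColumns: Seq[String], rowKeyColumns: Seq[String]): Seq[String] = {
    val present = childColumns.toSet
    childColumns ++ rowKeyColumns.filterNot(present.contains)
  }

  def main(args: Array[String]): Unit = {
    // "key2" is already projected, so only "key1" is appended.
    val projected = ensureRowKeys(Seq("col1", "key2"), Seq("key1", "key2"))
    println(projected.mkString(", "))
  }
}
```

Appending instead of re-sorting leaves the child's existing output positions untouched, which keeps any downstream references to those columns valid.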

yzhou2001 commented 9 years ago

Regarding the call to closeHTable, I think you can leave it as is for now to guarantee a flush. Thanks.
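The trade-off between points 2 and this follow-up can be sketched with a hypothetical stand-in: `TableHandle` is not the HBase client API, and `Relation`, `deleteRows`, and `shutdown` are illustrative names only. The sketch assumes one lazily created handle cached for the relation's lifetime. It shows one way to get the same "sure flush" without closing the cached handle: flush after the deletes and close only when the relation itself shuts down.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an HBase table client that buffers writes.
class TableHandle {
  val buffered = ArrayBuffer.empty[String] // pending deletes, not yet sent
  val flushed  = ArrayBuffer.empty[String] // deletes sent to the server
  var closed   = false

  def delete(rowKey: String): Unit = {
    require(!closed, "handle already closed")
    buffered += rowKey
  }
  def flush(): Unit = { flushed ++= buffered; buffered.clear() }
  def close(): Unit = { flush(); closed = true }
}

// Hypothetical relation that owns and caches a single handle.
class Relation {
  private var cached: Option[TableHandle] = None

  def handle: TableHandle = cached.getOrElse {
    val h = new TableHandle
    cached = Some(h)
    h
  }

  // Flush so the deletes are sent, but keep the cached handle open for reuse.
  def deleteRows(rowKeys: Seq[String]): Unit = {
    val h = handle
    rowKeys.foreach(k => h.delete(k))
    h.flush()
  }

  // Only the owner closes the handle, when the relation is no longer needed.
  def shutdown(): Unit = cached.foreach(_.close())
}

object HandleDemo {
  def main(args: Array[String]): Unit = {
    val r = new Relation
    r.deleteRows(Seq("a", "b"))
    r.deleteRows(Seq("c")) // still works: the handle was flushed, not closed
    println(r.handle.flushed.mkString(", "))
    r.shutdown()
  }
}
```

If the second `deleteRows` call had been preceded by a `close()` inside the first, the `require` check would fail, which mirrors the reviewer's concern about closing a cached handle.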

scwf commented 9 years ago

I have tested this patch on my own cluster, and the functionality works as expected.