ODSKLM / salsa-beach

Should we have `HBase.withConnection`? #4

Open SvenvDam opened 4 years ago

SvenvDam commented 4 years ago

Creating an HBase connection is a fairly heavyweight operation. Connection objects also internally cache things like region locations, which is why it's generally recommended to create just one Connection per application instance. They are thread-safe, so they can be shared without issues. This is in contrast to Table objects, which are lightweight and cannot be shared safely between threads.

`HBase.withConnection` promotes frequent creation of Connections, which I don't think we should do. We could either get rid of this method or do some internal caching with a lazy val.
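
To make the lazy val option concrete, here is a rough sketch. It is not the library's actual code: `FakeConnection` is a hypothetical stand-in for the real HBase Connection so that the example runs without a cluster, and `HBaseSketch`, `sharedConnection` and `withSharedConnection` are names made up for illustration. The counter only exists to show how often a connection gets created.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive, thread-safe connection.
// In real code this role would be played by the HBase Connection.
final class FakeConnection extends AutoCloseable {
  // Count every construction so the two approaches can be compared.
  FakeConnection.created.incrementAndGet()
  def close(): Unit = ()
}

object FakeConnection {
  val created = new AtomicInteger(0)
}

object HBaseSketch {
  // Cached variant: the connection is created on first access only,
  // and lazy val initialization is thread-safe in Scala.
  lazy val sharedConnection: FakeConnection = new FakeConnection

  // Scoped variant: creates and closes a connection on every call.
  def withConnection[A](f: FakeConnection => A): A = {
    val conn = new FakeConnection
    try f(conn) finally conn.close()
  }

  // Same call shape as the scoped variant, but reuses the cached connection
  // and deliberately does not close it.
  def withSharedConnection[A](f: FakeConnection => A): A =
    f(sharedConnection)
}
```

With this shape, any number of `withSharedConnection` calls construct a single connection, while each `withConnection` call constructs a new one. One open question with the cached variant is who closes the shared connection at shutdown.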

DonDebonair commented 4 years ago

The original Salsa Beach was created for Flight720. In Spark jobs, it's hard to share things like connections between executors, so I think it's valid to have an easy, scoped way of creating connections. As an alternative to your proposal, we could also make it clear in the documentation what the recommended usage patterns are (i.e. withConnection vs. createConnection).

SvenvDam commented 4 years ago

I see, but I'm not sure we should be targeting Spark with this library. There is an Apache-maintained Spark connector that does things like sharing connections with executors: https://hbase.apache.org/book.html#_basic_spark, https://github.com/apache/hbase-connectors/tree/master/spark

I think that should actually be the recommended way to deal with HBase if you are running Spark jobs.

DonDebonair commented 4 years ago

Does Flight720 use the Spark HBase connector though?

SvenvDam commented 4 years ago

In most places we don't, so I guess we should keep this if we want F720 to be able to integrate this without large refactorings. Let's indeed just update the docs to warn against creating many connections.