Open joint-song opened 1 week ago
That seems to be the case as per your tests. gohive is just using the thrift libraries, and they appear to not be thread safe.
I'd be surprised the bottleneck of the meta store would be the number of sockets it's serving. If you can think of an optimization I'll be happy to accept it.
I think Establishing/Closing connection has time and resource cost, it's unnecessary. And concurrent calls over single connection will result in unexpected behaviors. I found some solutions:
Have you investigated the possibility of http transport? It will panic when it's not binary in meta.go. It looks like the trino also did related works here.
Thank you for investigating possible solutions.
Introducing a Client pool to reuse the underlying tcp connection till it's timeout. The lifecycle of each tcp connection is guaranteed by this pool.
What would happen if there are more queries than thrift connections? Would these block until connections become available? Ideally I think the pool would wait and if there are queries waiting for too long it would temporarily grow. But I don't like the complexity of dealing with this in gohive. Also other users may prefer different pooling behaviour.
I think if there's going to be a pool somewhere it's better that it's handled at the application layer. That handles the gohive connections as pleased. This would increase the complexity of TestConcurrentQueryOnSingleConn
, but it'd keep gohive simple.
Have you investigated the possibility of http transport?
I haven't tried this, but my assumption is that it's going to work (I just didn't add the code while working on binary format). Possibly adding the tests would be the hardest part. We would need to make this setup work with kerberos, http and the metastore.
Also, why do you think http transport will support concurrency?
When I query metastore with the single connection, there will be some errors. Test code is below:
Errors like below:
But if I changed to using an exclusive connection for each query, all queries will be successful. Test code like this:
I am not certain about that does it impact the performance of hive-metastore while eastablishing a new connection once an operation?