beltran / gohive

Go driver for Apache Hive and the Hive Metastore
MIT License
230 stars 91 forks source link

Should I connect to metastore once an operation? Is it necessary to use connection pool? #242

Open joint-song opened 1 week ago

joint-song commented 1 week ago

When I query metastore with the single connection, there will be some errors. Test code is below:

func TestConcurrentQueryOnSingleConn(t *testing.T) {
    options := gohive.NewMetastoreConnectConfiguration()
    conn, err := gohive.ConnectToMetastore("SOME_HOST", 9083, "NOSASL", options)
    assert.NoError(t, err)

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _, err := conn.Client.GetAllDatabases()
            assert.NoError(t, err)
        }()
    }
    wg.Wait()
}

Errors like below:

=== RUN   TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            short write
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            short write
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            short write
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            get_all_databases failed: wrong method name
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            EOF
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            EOF
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            EOF
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            EOF
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            EOF
            Test:           TestConcurrentQueryOnSingleConn
    /hiveclient/internal/metastore_test.go:56:
            Error Trace:    /hiveclient/internal/metastore_test.go:56
                                        /usr/local/go/src/runtime/asm_arm64.s:1172
            Error:          Received unexpected error:
                            EOF
            Test:           TestConcurrentQueryOnSingleConn
--- FAIL: TestConcurrentQueryOnSingleConn (0.16s)

But if I changed to using an exclusive connection for each query, all queries will be successful. Test code like this:

func TestConcurrentQueryOnRespectiveConn(t *testing.T) {
    options := gohive.NewMetastoreConnectConfiguration()

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            conn, err := gohive.ConnectToMetastore("SOME_HOST", 9083, "NOSASL", options)
            assert.NoError(t, err)
            _, err = conn.Client.GetAllDatabases()
            assert.NoError(t, err)
        }()
    }
    wg.Wait()
}

I am not certain about that does it impact the performance of hive-metastore while eastablishing a new connection once an operation?

beltran commented 1 week ago

That seems to be the case as per your tests. gohive is just using the thrift libraries, and they appear to not be thread safe.

I'd be surprised the bottleneck of the meta store would be the number of sockets it's serving. If you can think of an optimization I'll be happy to accept it.

joint-song commented 1 week ago

I think Establishing/Closing connection has time and resource cost, it's unnecessary. And concurrent calls over single connection will result in unexpected behaviors. I found some solutions:

Have you investigated the possibility of http transport? It will panic when it's not binary in meta.go. It looks like the trino also did related works here.

beltran commented 1 week ago

Thank you for investigating possible solutions.

Introducing a Client pool to reuse the underlying tcp connection till it's timeout. The lifecycle of each tcp connection is guaranteed by this pool.

What would happen if there are more queries than thrift connections? Would these block until connections become available? Ideally I think the pool would wait and if there are queries waiting for too long it would temporarily grow. But I don't like the complexity of dealing with this in gohive. Also other users may prefer different pooling behaviour.

I think if there's going to be a pool somewhere it's better that it's handled at the application layer. That handles the gohive connections as pleased. This would increase the complexity of TestConcurrentQueryOnSingleConn, but it'd keep gohive simple.

Have you investigated the possibility of http transport?

I haven't tried this, but my assumption is that it's going to work (I just didn't add the code while working on binary format). Possibly adding the tests would be the hardest part. We would need to make this setup work with kerberos, http and the metastore.

beltran commented 1 week ago

Also, why do you think http transport will support concurrency?