bippio / go-impala

Golang Driver for Apache Impala
MIT License

Issues when query returns lots of records/rows #5

Closed vishjosh closed 5 years ago

vishjosh commented 5 years ago

impala.go.txt

We are running the attached client program to iterate over all the DBs and then over the tables in each DB.

We found that there are 2 issues:

@timarmstrong - I would appreciate your suggestions on both issues.

Both issues seem to stem from the driver's incorrect understanding of how to use the thrift interface/services. Any help pointing us to a more robust way of doing this, without changing the current dependency on beeswax, would be appreciated.

@intamyuto FYI

timarmstrong commented 5 years ago

I tried to reproduce this on my dev environment (master Impala, no security) and it seems to work fine for a database with many tables. https://github.com/timarmstrong/go-impala-repro

If you can reproduce this on a non-production environment with verbose logging set to 3, I'd be interested in seeing the impalad INFO logs, since those will log most of the RPCs sent from the client to the server and the responses.

timarmstrong commented 5 years ago

E.g. here's the log output I see locally: impala.INFO.excerpt.txt

vishjosh commented 5 years ago

@paribhasha: Can you please try Tim's script against our server that has LDAP and SSL enabled? That is the closest we can come to reproducing the problem outside of the enterprise's cluster.

@timarmstrong, we do not have access to the server on which we are seeing the error. We will try to reproduce it in a local setup as best we can.

vishjosh commented 5 years ago

@paribhasha Can you please create a pull request for the changes we have made so far? (You can also replace the sample client program with the one you have.) Vsevolod can incorporate the changes into his fix.

shaloi commented 5 years ago

@vishjosh @timarmstrong

I was able to repro the issue with a DB containing 2000 tables (created using https://github.com/timarmstrong/go-impala-repro) on a server running impalad version 2.10.0-cdh5.13.0 RELEASE (build 2511805f1eaa991df1460276c7e9f19d819cd4e4). This is the quickstart.cloudera VM.

Earlier, the issue was seen on a server running impalad version 2.8.0-cdh5.11.2 RELEASE (build f89269c4b96da14a841e94bdf6d4d48821b0d658).

I bumped up the logging by setting GLOG_v=3 and collected the impalad INFO logs: go-impala-repro-impalad.INFO.log. The relevant query is around line 1330 of this file.

Client output:

go_impala_repro is the database with 2000 tables.

$ go run impala.go
2019/02/12 06:35:41 List of Databases [_impala_builtins bipp default go_impala_repro nyc_yellow_taxi]
2019/02/12 06:35:41 List of Tables in Database _impala_builtins: []
2019/02/12 06:35:41 List of Tables in Database bipp: [store_info uscrimes]
2019/02/12 06:35:42 List of Tables in Database default: []
2019/02/12 06:35:42 GetResultsMetadata failed: get_results_metadata: wrong method name
2019/02/12 06:35:42 List of Tables in Database go_impala_repro: []
2019/02/12 06:35:42 error in querying database nyc_yellow_taxi: query: wrong method name

timarmstrong commented 5 years ago

Hrm, it's strange that it looks like the calling pattern is the same for the earlier databases and go_impala_repro. This is a wild guess, but it could somehow be related to https://issues.apache.org/jira/browse/IMPALA-7101, where get_results_metadata() on beeswax throws an exception that it doesn't declare.

timarmstrong commented 5 years ago

I tried running against the latest CDH5 without security and couldn't reproduce.

With the quickstart VM, were you running with security enabled? I'm interested if this is only reproducible with security or if it's somehow related to the version of Impala that's running.

It looks like the error is bubbling up from the go thrift client - https://github.com/apache/thrift/blob/af7ecd6a2b15efe5c6b742cf4a9ccb31bcc1f362/lib/go/thrift/client.go#L46. I'm wondering if somehow the client-server communication is getting out of whack at the thrift layer. It's hard to explain why subsequent queries fail otherwise.

vishjosh commented 5 years ago

@timarmstrong It is most likely with LDAP and SSL enabled.
@paribhasha can confirm once he is online (in about 7 hours).

shaloi commented 5 years ago

Yes, my tests have now confirmed that the problem arises only with security enabled. After disabling SSL and LDAP authentication, I could pass any arbitrary 32-bit integer as fetch_size to the thrift client without hitting the issue.

intamyuto commented 5 years ago

This is a transport issue. TSaslTransport (custom wrapper around thrift transport) reads big frames incorrectly. I am working on the patch.