vishjosh closed 5 years ago
I tried to reproduce this on my dev environment (master Impala, no security) and it seems to work fine for a database with many tables. https://github.com/timarmstrong/go-impala-repro
If you can reproduce this on a non-production environment with verbose logging set to 3, I'd be interested in seeing the impalad INFO logs, since those will log most of the RPCs sent from the client to the server and the responses.
E.g. here's the log output I see locally: impala.INFO.excerpt.txt
@paribhasha: Can you please try Tim's script against the server where we have LDAP and SSL enabled? That is the closest we can come to reproducing the problem outside of the enterprise's cluster.
@timarmstrong we do not have access to the server on which we are seeing the error. We will try to reproduce it in a local setup as best we can.
@paribhasha Can you please create a pull request for the changes we have made so far? (You can also replace the sample client program with the one you have.) Vsevolod can incorporate the changes in his fix.
@vishjosh @timarmstrong
I was able to repro the issue with a DB with 2000 tables (created using https://github.com/timarmstrong/go-impala-repro) against server version: impalad version 2.10.0-cdh5.13.0 RELEASE (build 2511805f1eaa991df1460276c7e9f19d819cd4e4). This is the quickstart.cloudera VM.
Earlier, the issue was seen on server version: impalad version 2.8.0-cdh5.11.2 RELEASE (build f89269c4b96da14a841e94bdf6d4d48821b0d658).
I bumped up the logging by setting GLOG_v=3 and collected the impalad INFO logs: go-impala-repro-impalad.INFO.log. The relevant query is around line 1330 in this file.
Client output:
go_impala_repro is the database with 2000 tables.
$ go run impala.go
2019/02/12 06:35:41 List of Databases [_impala_builtins bipp default go_impala_repro nyc_yellow_taxi]
2019/02/12 06:35:41 List of Tables in Database _impala_builtins: []
2019/02/12 06:35:41 List of Tables in Database bipp: [store_info uscrimes]
2019/02/12 06:35:42 List of Tables in Database default: []
2019/02/12 06:35:42 GetResultsMetadata failed: get_results_metadata: wrong method name
2019/02/12 06:35:42 List of Tables in Database go_impala_repro: []
2019/02/12 06:35:42 error in querying database nyc_yellow_taxi: query: wrong method name
Hrm, it's strange that it looks like the calling pattern is the same for the earlier databases and go_impala_repro. This is a wild guess, but it could somehow be related to https://issues.apache.org/jira/browse/IMPALA-7101, where get_results_metadata() on beeswax throws an exception that it doesn't declare.
I tried running against the latest CDH5 without security and couldn't reproduce.
With the quickstart VM, were you running with security enabled? I'm interested if this is only reproducible with security or if it's somehow related to the version of Impala that's running.
It looks like the error is bubbling up from the go thrift client - https://github.com/apache/thrift/blob/af7ecd6a2b15efe5c6b742cf4a9ccb31bcc1f362/lib/go/thrift/client.go#L46. I'm wondering if somehow the client-server communication is getting out of whack at the thrift layer. It's hard to explain why subsequent queries fail otherwise.
@timarmstrong It is most likely with LDAP and SSL enabled.
@paribhasha can confirm once he is online (in about 7 hours).
Yes, my tests have now confirmed that the problem arises only with security enabled. After disabling SSL and LDAP authentication, I could pass any arbitrary 32-bit integer as fetch_size to the thrift client and not hit the issue.
This is a transport issue. TSaslTransport (a custom wrapper around the thrift transport) reads big frames incorrectly. I am working on a patch.
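To illustrate the kind of bug this can be (an assumption on my part, not the actual patch): a framed transport that issues a single Read per frame will see only part of a large frame, because Read may return fewer bytes than requested. The sketch below assumes a 4-byte big-endian length prefix followed by the payload; readFrame, encodeFrame and chunkedReader are hypothetical names, not part of the driver or of thrift.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readFrame reads one frame: a 4-byte big-endian length, then that many
// payload bytes. io.ReadFull keeps reading until the buffer is full, so
// short reads from the underlying connection are handled.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint32(hdr[:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// encodeFrame prepends the length prefix (illustration only).
func encodeFrame(payload []byte) []byte {
	buf := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(buf[:4], uint32(len(payload)))
	copy(buf[4:], payload)
	return buf
}

// chunkedReader returns at most max bytes per Read call, imitating a
// TLS or network connection that delivers a big frame in pieces.
type chunkedReader struct {
	r   io.Reader
	max int
}

func (c *chunkedReader) Read(p []byte) (int, error) {
	if len(p) > c.max {
		p = p[:c.max]
	}
	return c.r.Read(p)
}

func newChunkedReader(data []byte, max int) io.Reader {
	return &chunkedReader{r: bytes.NewReader(data), max: max}
}

func main() {
	payload := make([]byte, 1<<20) // 1 MiB frame, delivered 4 KiB at a time
	got, err := readFrame(newChunkedReader(encodeFrame(payload), 4096))
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(len(got) == len(payload)) // prints true
}
```

If the reader stops after a partial frame, the leftover bytes get parsed as the start of the next thrift message, which would be consistent with later calls failing with "wrong method name".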
impala.go.txt
We are running the attached client program to iterate all the DBs and then iterate the tables for each DB.
We realized there are 2 issues:
Issue 1: Some DBs had more than 1K tables.
Issue 2: Interestingly, after the Next() call errors out, it just returns false and the client code moves on to the next query. However, all subsequent queries on that connection then fail with an error. This is extremely problematic; ideally we should be able to reset the state and start a new query even if earlier queries error out.
@timarmstrong - I would appreciate suggestions from you on both issues.
Both issues seem to stem from the driver's incorrect understanding of how to use the thrift interface/services. Any help pointing us to a more robust way of doing this, without changing the current dependency on beeswax, would be appreciated.
@intamyuto FYI
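For Issue 2, one possible client-side workaround until the driver is fixed is to never reuse a connection after it has returned an error, and to dial a fresh one for the next query. This is only a rough sketch: Conn, Session, fakeConn and newFakeConn are hypothetical stand-ins, not the driver's real API, and the fake connection just imitates the "first error poisons the connection" behavior we are seeing.

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a hypothetical stand-in for a driver connection.
type Conn interface {
	Query(q string) ([]string, error)
	Close() error
}

// Session drops its connection after any error so that a connection in
// an unknown state is never reused for the next query.
type Session struct {
	dial func() (Conn, error)
	conn Conn
}

func (s *Session) Query(q string) ([]string, error) {
	if s.conn == nil {
		c, err := s.dial()
		if err != nil {
			return nil, err
		}
		s.conn = c
	}
	rows, err := s.conn.Query(q)
	if err != nil {
		s.conn.Close() // best effort; the connection is discarded either way
		s.conn = nil
		return nil, err
	}
	return rows, nil
}

// fakeConn imitates the observed behavior: one failing query leaves the
// connection broken for every later query.
type fakeConn struct{ broken bool }

func (f *fakeConn) Query(q string) ([]string, error) {
	if f.broken {
		return nil, errors.New("wrong method name")
	}
	if q == "SHOW TABLES IN go_impala_repro" {
		f.broken = true
		return nil, errors.New("fetch failed")
	}
	return []string{"t1"}, nil
}

func (f *fakeConn) Close() error { return nil }

func newFakeConn() (Conn, error) { return &fakeConn{}, nil }

func main() {
	s := &Session{dial: newFakeConn}
	for _, q := range []string{
		"SHOW TABLES IN go_impala_repro",
		"SHOW TABLES IN nyc_yellow_taxi",
	} {
		rows, err := s.Query(q)
		fmt.Println(q, rows, err)
	}
}
```

With the fake connection, the first query fails and the second succeeds on a fresh connection. This only hides the symptom; the failing query itself still needs the transport fix.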