Closed komamitsu closed 10 years ago
ping
sorry, I totally forgot! What are these used for? I've never seen either option.
Thanks for the reply.
What are these used for? I've never seen either option.
When using impala-shell.sh, "hadoop_user" and "configuration" are passed to Impala implicitly.
In Impala log:
I1031 10:13:02.235416 6460 impala-beeswax-server.cc:133] query(): query=select count(1) from sample
I1031 10:13:02.235528 6460 impala-beeswax-server.cc:447] query: Query {
01: query (string) = "select count(1) from sample",
03: configuration (list) = list<string>[0] {
},
04: hadoop_user (string) = "cloudera",
}
Regarding "hadoop_user", it's used as an username in metastore.
For "configuration", it's used as a query option like this:
Impala-shell:
[localhost.localdomain:21000] > set mem_limit=700000000000000000;
MEM_LIMIT set to 700000000000000000
[localhost.localdomain:21000] > select * from sample;
:
log:
I1031 10:26:10.191804 6461 impala-beeswax-server.cc:133] query(): query=select * from sample
I1031 10:26:10.191920 6461 impala-beeswax-server.cc:447] query: Query {
01: query (string) = "select * from sample",
03: configuration (list) = list<string>[1] {
[0] = "MEM_LIMIT=700000000000000000",
},
04: hadoop_user (string) = "cloudera",
}
I1031 10:26:10.191988 6461 impala-beeswax-server.cc:460] TClientRequest.queryOptions: TQueryOptions {
01: abort_on_error (bool) = false,
02: max_errors (i32) = 0,
03: disable_codegen (bool) = false,
04: batch_size (i32) = 0,
05: num_nodes (i32) = 0,
06: max_scan_range_length (i64) = 0,
07: num_scanner_threads (i32) = 0,
08: max_io_buffers (i32) = 0,
09: allow_unsupported_formats (bool) = false,
10: default_order_by_limit (i64) = -1,
11: debug_action (string) = "",
12: mem_limit (i64) = 700000000000000000,
13: abort_on_default_limit_exceeded (bool) = false,
14: parquet_compression_codec (i32) = 5,
15: hbase_caching (i32) = 0,
16: hbase_cache_blocks (bool) = false,
}
Sometimes, I need to change QueryOption and User with impala-ruby library.
Thanks
Oh, fascinating. How does it work with just a list<string>
?
I'll leave some comments on the diff - also, would you mind adding some tests for the interface and one or two that actually hit the server?
Ohhh, I see why it's a list<string>
. I think I'd like to expose a better interface. Maybe any key other than hadoop_user
could be passed into config? so something like:
query.hadoop_user = options.delete(:hadoop_user) if options[:hadoop_user]
query.configuration = options.map do |key, value|
"#{key}=#{value}"
end
Also, it'd be really nice if we could verify against some list of valid options, but I don't think we can do that without asking the server.
query.configuration = options.map do |key, value|
"#{key}=#{value}"
end
looks good. I'll change it like this.
query.hadoop_user = options.delete(:hadoop_user) if options[:hadoop_user]
I think we might as well separate hadoop_user and configuration according to beeswax.thrift.
also, would you mind adding some tests for the interface and one or two that actually hit the server?
okay. I'll do it this weekend.
I'd rather not have configuration
be a separate param - I know it is separate in the thrift, but it makes a bad interface.
Hi @colinmarc , just refactored my pull request based on your comments. Can you look at it?
Awesome - this is looking great. Can you write one or two connected tests that actually hit the server, so we can be sure we're setting this correctly? If it's hard to check the result of some configuration, don't worry about doing that, but I'd like to at least make sure everything still works if you pass in configuration.
I added a test for query options into test/test_impala_connected.rb and saw that Impalad received the query options properly as follows:
I1105 09:35:36.021066 17227 impala-beeswax-server.cc:133] query(): query=select "foo" AS foo
I1105 09:35:36.021133 17227 impala-beeswax-server.cc:447] query: Query {
01: query (string) = "select \"foo\" AS foo",
03: configuration (list) = list<string>[2] {
[0] = "MEM_LIMIT=1234567890",
[1] = "MAX_SCAN_RANGE_LENGTH=1073741824",
},
04: hadoop_user (string) = "someoneelse",
}
I1105 09:35:36.021191 17227 impala-beeswax-server.cc:460] TClientRequest.queryOptions: TQueryOptions {
01: abort_on_error (bool) = false,
02: max_errors (i32) = 0,
03: disable_codegen (bool) = false,
04: batch_size (i32) = 0,
05: num_nodes (i32) = 0,
06: max_scan_range_length (i64) = 1073741824,
07: num_scanner_threads (i32) = 0,
08: max_io_buffers (i32) = 0,
09: allow_unsupported_formats (bool) = false,
10: default_order_by_limit (i64) = -1,
11: debug_action (string) = "",
12: mem_limit (i64) = 1234567890,
13: abort_on_default_limit_exceeded (bool) = false,
14: parquet_compression_codec (i32) = 5,
15: hbase_caching (i32) = 0,
16: hbase_cache_blocks (bool) = false,
}
But I think it's impossible to check if Impalad received the expected query options in the test code.
Yeah, that was my suspicion. It's still a useful test, though, because at some point in the future Impala could choke or the thrift classes could change or something, and we want to know if everything works smoothly.
Well, I think so, too. Then can you merge the branch? Actually, I tied up now and it's not easy to take time for this branch right now. Thanks.
Sorry that I keep dropping the ball on this. This looks great! Thanks so much.
Thanks!
I want to set "hadoop_user" and "configuration" to a query and created this pull request. Please see it :)
Thanks