Closed: FANNG1 closed this PR 1 year ago
The latest updates on your projects. Learn more about Vercel for Git ↗️
Name | Status | Preview | Updated |
---|---|---|---|
databend | ✅ Ready (Inspect) | Visit Preview | Jun 20, 2022 at 1:34AM (UTC) |
Thanks for the contribution! I have applied any labels matching special text in your PR Changelog.
Please review the labels and make any necessary changes.
@mergify update
update
Hi @sandflee, really appreciate your contribution!!
partition_keys and location have been introduced to the struct TableMeta. It would be better to put them into TableMeta::engine_options, since they are hive-table-specific options; the RepoIssuesTable may be a good reference:
About the stateful test case of the GH action test_stateful_hive_standalone:
I hope I get it right that in your local test setup, the databend-query process can access the FS used by Hive/Hadoop directly (the "... user/hive/warehouse/pokes/..."), but for CI, the warehouse data path is "inside" the docker, thus the integration test failed.
We can ignore the above integration test failure in this PR, since later the hive table should use the HDFS storage DAL to access the data anyway; we need not consolidate all the functionality in one PR :)
@dantengsky, thanks for your advice!
1. HiveTableOptions is a good idea; added to the latest code.
2. I could query data successfully in a non-docker env, and it also failed in the docker env, but the error reads like "connect to datanode failed", which is not the same as the test error; not sure whether it's the same problem.
Hi @sandflee
It's great to see you working on hive. Thank you! It would be great if we added some SQL tests to the hive stateful suite: https://github.com/datafuselabs/databend/blob/main/tests/suites/2_stateful_hive/00_basics/00_0000_hms_basics.sql
yes, adding sql tests is very useful and necessary. I tried some methods to make this work with the docker env but failed on my local machine. The main reasons are: 1. hdfs is deployed in the docker env on docker's internal network, so the datanode ip is a docker-internal ip; 2. databend is deployed on the host machine and couldn't communicate with the datanode.
@sandflee Roger
please feel free to modify the docker script at
https://github.com/datafuselabs/databend/tree/main/docker/it-hive
if necessary, e.g. export the necessary ports in the script: https://github.com/datafuselabs/databend/blob/main/docker/it-hive/hive-docker-compose.yml.
but it looks like the datanode and namenode ports are accessible (I might be completely wrong, since I do not know the exact setup of your local dev):
home-dev:~/workspace/fuse-query$ docker-compose -f "./docker/it-hive/hive-docker-compose.yml" up -d
Recreating it-hive_namenode_1 ... done
Recreating it-hive_datanode_1 ... done
Recreating it-hive_hive-metastore-postgresql_1 ... done
Recreating it-hive_hive-server_1 ... done
Recreating it-hive_hive-metastore_1 ... done
home-dev:~/workspace/fuse-query$ sudo netstat -ntlp | grep -e 50070 -e 50075
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 189119/docker-proxy
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 189437/docker-proxy
tcp6 0 0 :::50075 :::* LISTEN 189125/docker-proxy
tcp6 0 0 :::50070 :::* LISTEN 189444/docker-proxy
home-dev:~/workspace/fuse-query$ telnet localhost 50075
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> Connection closed.
home-dev:~/workspace/fuse-query$ telnet localhost 50070
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> Connection closed.
And as you might already know, the docker-proxy might be listening on the ports in advance, before the Java processes are ready; in this scenario, connecting to the service ports will yield transient connection failures.
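If that is what's happening, a readiness probe before running the tests can rule it out. A minimal sketch, assuming the namenode web UI is mapped to localhost:50070 as in the netstat output above:

```bash
# Poll the namenode web UI until it answers HTTP, instead of trusting
# that docker-proxy listening on the port means HDFS is actually up.
for i in $(seq 1 30); do
  if curl -fsS http://localhost:50070/ >/dev/null 2>&1; then
    echo "namenode is ready"; exit 0
  fi
  echo "waiting for namenode ($i/30)..."; sleep 2
done
echo "namenode did not become ready in time" >&2; exit 1
```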
thanks @dantengsky, the port of my local hdfs conflicts with the hdfs in docker :( I'll try to run databend in the hive container, since it needs the java and hadoop env, like this: docker exec -it -v $databend_dir:/databend /bin/start_databend.sh
cc @Xuanwo
We have an issue here in the CI setup of hdfs storage; would you please give us some hints?
Caused by:
process didn't exit successfully: `/workspace/target/debug/build/hdfs-sys-637586f141ce0f71/build-script-build` (exit status: 101)
--- stderr
thread 'main' panicked at 'JAVA_HOME must be set: NotPresent', /opt/rust/cargo/registry/src/github.com-1ecc6299db9ec823/hdfs-sys-0.2.0/build.rs:10:43
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
https://github.com/datafuselabs/databend/runs/6892054183?check_suite_focus=true#step:3:1109
We need to set up a Java runtime to make this work, like in https://github.com/datafuselabs/opendal/blob/main/.github/workflows/service_test_hdfs.yml#L20-L28
But I don't know whether this action works under our build tool.
Ping @everpcpc for a look.
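For context, the panic above comes from hdfs-sys's build script, which needs JAVA_HOME only at compile time. If the setup-java action turns out not to work under the build tool, a hand-rolled fallback could look like the sketch below (the JDK path is a placeholder assumption, not the actual image layout):

```bash
# Export JAVA_HOME by hand so hdfs-sys's build.rs can locate a JDK.
# The path below is a placeholder; use wherever the JDK actually lives.
export JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
# then rebuild databend-query with the hdfs storage feature enabled
```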
I added the java env, but it seems JAVA_HOME is not set: https://github.com/datafuselabs/databend/pull/5895/commits/3b3fe5f9a96ca06a7bbddb0a760fdddb7bd2c837
> We need to set up a Java runtime to make this work, like in https://github.com/datafuselabs/opendal/blob/main/.github/workflows/service_test_hdfs.yml#L20-L28
> But I don't know whether this action works under our build tool.
> Ping @everpcpc for a look.
JDK is installed here: https://github.com/datafuselabs/databend/blob/main/scripts/setup/dev_setup.sh#L491
> We need to set up a Java runtime to make this work, like in https://github.com/datafuselabs/opendal/blob/main/.github/workflows/service_test_hdfs.yml#L20-L28
> But I don't know whether this action works under our build tool.
It seems to be working:
https://github.com/datafuselabs/databend/runs/6892054183?check_suite_focus=true#step:3:118
Run actions/setup-java@v3
Trying to resolve the latest version from remote
Resolved latest version as 11.0.15+10
Trying to download...
Downloading Java 11.0.15+10 (Temurin-Hotspot) from https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.15%2B10/OpenJDK11U-jdk_x64_linux_hotspot_11.0.15_10.tar.gz ...
Extracting Java archive...
/usr/bin/tar xz --warning=no-unknown-keyword -C /runner/_work/_temp/db0bb02a-da8b-42da-ad93-155bd5fb5572 -f /runner/_work/_temp/b5fa93dd-7cb8-4c4a-ac8a-50c236cd6be5
Java 11.0.15+10 was downloaded
Setting Java 11.0.15+10 as the default
Java configuration:
Distribution: temurin
Version: 11.0.15+10
Path: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/11.0.15-10/x64
Creating settings.xml with server-id: github
Writing to /home/runner/.m2/settings.xml
Will the above action also set JAVA_HOME?
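(actions/setup-java does export JAVA_HOME for subsequent steps; a throwaway verification step right after it can confirm this in the workflow. A small sketch; the libjvm.so path assumes the usual JDK 11 layout:)

```bash
# Confirm what setup-java exported and that the JVM library is present.
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
ls -l "$JAVA_HOME/lib/server/libjvm.so"
```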
@mergify update
update
@mergify update
update
@mergify update
update
Hi @sandflee,
the docker image for building has been updated; the build_linux_hive action can be executed successfully now.
but we still have to tweak the test_stateful_hive_standalone action a little bit, to meet the runtime dependencies of the hdfs storage layer, and make test_stateful_hive_standalone work.
The basic idea is: in the stateful hive test environment, we need to set up a JDK and export the necessary env vars, so that databend-query with the storage-hdfs feature enabled may find the required runtime dependencies.
And I think we only need to install the JDK and export LD_LIBRARY_PATH properly at this stage (later, if files are accessed via the hdfs protocol, some extra jars and env vars should be installed/exported).
thus, my suggestions are:
- install the JDK in the hive integration action: move the setup-java action from build_debug_hive/action.yml to test_stateful_hive_standalone/action.yml
- set up env vars for the test action; for example, change https://github.com/datafuselabs/databend/blob/89d54b18c854f00b42cd0dc08bb7685eb6e44a4a/.github/actions/test_stateful_hive_standalone/action.yml#L59-L62 to something like this:
- name: Run Stateful Tests with Standalone mode
shell: bash
run: |
bash ./scripts/ci/ci-run-stateful-hive-tests-standalone-embed-meta.sh
env:
LD_LIBRARY_PATH: ${{ env.JAVA_HOME }}/lib/server:${{ env.LD_LIBRARY_PATH }}
A reference from opendal.
I've not tested the above idea and am not sure it works; but if it seems reasonable to you, would you please give it a try?
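A quick way to validate the idea locally before wiring it into CI; a sketch, assuming a debug build and that JAVA_HOME is already exported:

```bash
# databend-query built with storage-hdfs loads libjvm.so at startup,
# so the JVM's server directory must be on the dynamic linker path.
export LD_LIBRARY_PATH="$JAVA_HOME/lib/server:$LD_LIBRARY_PATH"
# any invocation will do; it should no longer fail with
# "libjvm.so: cannot open shared object file"
./target/debug/databend-query -h
```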
@Xuanwo in this PR, services-hdfs is just "linked" with databend-query: no actual hdfs accesses are performed during runtime yet.
Is it enough that we just install the JDK and set up LD_LIBRARY_PATH properly, to bring up databend-query?
> Is it enough that we just install the JDK and set up LD_LIBRARY_PATH properly, to bring up databend-query?
Yep. databend-query with services-hdfs just needs libjvm.so to be compiled. The current setup looks good to me.
Sorry for the late reply; I have approved this workflow run.
thanks @dantengsky @Xuanwo, databend can start up now, but it fails to query hive data because the storage is "fs", not "hdfs". We may get test_stateful_hive_standalone to pass by changing the storage.
@sandflee
Seems the build and run-time dependency issues are resolved.
The failure of test_stateful_hive_standalone is caused by attempts to access hdfs using the fs::backend, which, at least in the CI setup, is not possible:
2022-06-17T07:18:23.397842Z INFO databend_query::servers::mysql::mysql_interactive_worker: Normal query: select * from hive.default.pokes
2022-06-17T07:18:23.505068Z ERROR opendal::services::fs::backend: object /home/runner/work/databend/databend/./.databend/stateless_test_data/user/hive/warehouse/pokes/ list: Custom { kind: NotFound, error: ObjectError { op: "list", path: "/home/runner/work/databend/databend/./.databend/stateless_test_data/user/hive/warehouse/pokes/", source: No such file or directory (os error 2) } }
2022-06-17T07:18:23.512263Z ERROR databend_query::servers::mysql::writers::query_result_writer: OnQuery Error: Code: 3001, displayText = entity not found (object error: (op: list, path: /home/runner/work/databend/databend/./.databend/stateless_test_data/user/hive/warehouse/pokes/, source: No such file or directory (os error 2))).
> Maybe we could split the functionality of accessing data via hdfs-storage into another PR?

Keeping storage-hdfs in this PR is preferred:
- hdfs should be enabled as a storage type: https://github.com/datafuselabs/databend/blob/02a7b11f4555ab2fb185526a9b4c92b81b3b2577/query/src/config/outer_v0.rs#L216
- run databend-query --storage-type=hdfs --storage-hdfs-name-node= ... (databend-query -h helps :); see the sketch after this list
- set up hdfs in the test_stateful_hive_standalone action, similarly to the setup of the JDK; a reference: https://github.com/datafuselabs/opendal/blob/d8aa1265313aad3280ec09ae9babfe9b74c64d04/.github/workflows/service_test_hdfs.yml#L29-L32
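A sketch of what the second point could look like against the it-hive docker setup (the namenode address below is a placeholder assumption, not a value from this thread):

```bash
# Start databend-query with the hdfs storage backend instead of fs.
# hdfs://localhost:8020 is a placeholder namenode address.
databend-query \
  --storage-type=hdfs \
  --storage-hdfs-name-node=hdfs://localhost:8020
```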
ok, removed the storage-hdfs dependency in the latest code.
using a workaround to run the hive ci, by replacing hdfs storage with local storage
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
add support to query simple hive table
Changelog
Related Issues
4826