apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

[Notice] Welcome ideas and contributions for the next release, 1.20.0 #2233

Closed: luocooong closed this issue 2 years ago

luocooong commented 3 years ago

Is your feature request related to a problem? Please describe.
Hello, everyone. Apache Drill is a community-driven project, and we welcome contributions of any kind. Since 2020 Q4, the Drill team has invested a lot of time in supporting the community.

Describe the solution you'd like
We regularly discuss how to attract more developers to contribute, for example by adding friendlier guides to the website, marking tasks for newcomers in JIRA, and keeping the documentation up to date. There are many ways to take part in Drill:

  1. Create issues describing what you want (using GitHub Issues).
  2. Create JIRA tickets at issues.apache.org.
  3. Fork and clone the Git repository and contribute a PR. quick-start.
  4. Learn Drill:
  5. Join discussions on the mailing lists.
  6. Talk about anything in the Slack channel (extremely active!).
  7. Help with testing, give feedback, and resolve issues through any of the above channels.

Describe alternatives you've considered
Previously, Drill did not enable Issues on GitHub. However, we are happy to see the Drill community becoming more active, and it's time to use a new, simpler way to talk with our users and developers.
We recommend creating an issue on GitHub if you want to discuss something first, and then creating a JIRA ticket in the Apache issue tracker. That way we can keep the knowledge in JIRA and quickly support users and developers on GitHub.

Additional context

  1. If you want to share your use case, feel free to contact us.
  2. Community Over Code. Hope to see you soon...

cdmikechen commented 3 years ago

I'd like to see more guides. I run tests with mvn test -pl contrib/storage-hive/core, but the run appears to fail:

Hive Session ID = efd0bb19-db4a-4143-8dd8-f59f79de6e10
[INFO] Running org.apache.drill.exec.fn.hive.TestHiveUDFs
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.09 s <<< FAILURE! - in org.apache.drill.exec.fn.hive.TestHiveUDFs
[ERROR] org.apache.drill.exec.fn.hive.TestHiveUDFs  Time elapsed: 0.014 s  <<< ERROR!
org.apache.drill.exec.rpc.RpcException: CONNECTION : io.netty.channel.ConnectTimeoutException: connection timed out: xxx.local/ip:31010
Caused by: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: xxx.local/ip:31010
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: xxx.local/ip:31010

I don't know why it fails or how to fix it, and I can't find any documentation that would help me debug it.

cdmikechen commented 3 years ago

Oh, it's my fault: I had mapped my computer's hostname to another IP in /etc/hosts, so Drill could not find my IP.
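For anyone hitting the same ConnectTimeoutException, here is a minimal diagnostic sketch, assuming the test client connects to the address that the machine's own hostname resolves to (as the xxx.local/ip:31010 in the log above suggests). HostCheck, isLocalAddress and canConnect are hypothetical names, not part of Drill; 31010 is the port from the log above. It reports what the hostname resolves to, whether that address belongs to this machine, and whether a TCP connection to the port opens.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.net.UnknownHostException;

/**
 * Hypothetical diagnostic helper (not part of Drill): checks whether this
 * machine's hostname resolves to one of its own addresses, and whether a
 * TCP connection to a given port (31010 by default here) succeeds.
 */
public class HostCheck {

    /** True if addr is loopback, the wildcard address, or bound to a local network interface. */
    static boolean isLocalAddress(InetAddress addr) {
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
            return true;
        }
        try {
            // Returns null when no local interface owns this address,
            // e.g. when /etc/hosts points the hostname at another machine's IP.
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false;
        }
    }

    /** True if a TCP connection to host:port opens within timeoutMs milliseconds. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Resolve this machine's hostname through the system resolver,
        // which normally consults /etc/hosts before DNS.
        InetAddress self = InetAddress.getLocalHost();
        System.out.println("Hostname " + self.getHostName() + " resolves to " + self.getHostAddress());
        if (!isLocalAddress(self)) {
            System.out.println("WARNING: that address is not bound to any local interface; check /etc/hosts.");
        }
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 31010;
        boolean ok = canConnect(self.getHostAddress(), port, 2000);
        System.out.println("TCP connect to " + self.getHostAddress() + ":" + port + " -> " + (ok ? "ok" : "failed"));
    }
}
```

Run it while the test's Drillbit is up: if the warning appears, the hostname mapping is the likely culprit, as it was here.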

I have been using Drill since 1.16. So far I have run into some problems and have some ideas from using it:

  1. How to better deploy on k8s. We run Drill in a k8s cluster using the Helm chart at https://github.com/Agirish/drill-helm-charts. Could we merge this project directly into Drill, or create an operator to deploy Drill? Based on this chart, I also made some configuration and content changes for real-world operation (such as adding a DaemonSet mode). This raises another question: how to modify the configuration more conveniently through environment variables, and how to adjust log levels more conveniently, so that users can start Drill in a Docker container more directly.
  2. Could Drill support Apache Atlas or Apache Ranger? Users could make better use of Drill for data processing and analysis by using Ranger for unified management of data permissions, or Atlas to obtain metadata.
  3. Are there examples or documents about using Drill to query Alluxio or Ozone? So far I only use Drill to query data in HDFS, and I don't know much about other file systems.

luocooong commented 3 years ago

@cdmikechen Thanks for the questions.

  1. I can fork Agirish/drill-helm-charts and create a branch to drive Drill on K8s, if anyone is ready to contribute the feature. I recommend discussing this topic on the mailing list or the Slack channel; talk with @Agirish first.
  2. Ranger? Atlas? Great ideas. Could you please discuss the design in a GitHub issue? Regarding security and multi-tenancy, DRILL-7871 will be included in 1.20.
  3. As I recall, Apache Ozone supports the S3 API, and Drill can also query data on S3. Alluxio support needs more love.

All great ideas have a path to success. We welcome contributions of any kind including pull requests, ideas, bug reports, testing, writing documentation, tutorials and blog posts.

cdmikechen commented 3 years ago

@luocooong About the Helm chart: I noticed that the 1.19 image was recently updated on Docker Hub. I've made some adaptations based on his chart and have run it in a test environment for a long time. I think I can try to submit a PR later, or even turn it into an operator directly.

imamerkhanov commented 2 years ago

Guys, what do you think about extending information_schema.columns with a "comment" column? At the moment I have to look up column comments in the source database itself. This was discussed here: https://apache-drill.slack.com/archives/CG380K519/p1636905663050200

luocooong commented 2 years ago

@imamerkhanov Hello. Could you please create a new issue (with detailed requirements)? That way the developers will know about new good ideas.

alvaradojl commented 2 years ago

It would be great to add support for additional cloud table storage such as Azure Table. Also, I think the JDBC driver used as an example in the Drill documentation for connecting to SQL Server is too old (v6), and maybe that's why it has this issue.

jnturton commented 2 years ago

@alvaradojl Thanks for the suggestions. The issue related to LIMIT is actually a bug, which we'll tackle soon.

https://issues.apache.org/jira/browse/DRILL-8090