apache / shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Apache License 2.0

[New feature] Federated SQL query and Query optimization are going to sail out. #8284

Closed tristaZero closed 4 months ago

tristaZero commented 4 years ago

Hi community,

As you know, ShardingSphere has put a lot of effort into SQL parsing and now provides a great independent SQL parser to help users parse SQL.

Based on this substantial work, we plan to do query optimization, that is, optimize users' input SQL and produce an optimized query plan to improve query efficiency. Plus, the federated SQL query feature (e.g., join queries across different database instances) is another essential highlight for our next release. : )

We will leverage Apache Calcite, an excellent framework, to implement these two features. The three main areas of work are presented here.

Actually, there is plenty of work to do on this issue. We are in the investigation phase now and will seek contributors for this issue later. If you are interested in this one, please give it a watch. 😉


10th January 2021 Task Update

Hi, community,

Here is the progress update so far.

Functions

Unit test (Q2)

SQL Federation

Scenario

guimingyue commented 4 years ago

Maybe this issue is a good fit for me: how to transform the parsed result of ShardingSphere into the algebra of Calcite. I did some work on this a few months ago, and I am familiar with the AST that the ShardingSphere parser produces. I am also investigating the query optimization module of Calcite, but so far I think there is still a long way to go.

junwen12221 commented 4 years ago

current level of mycat2

tristaZero commented 4 years ago

@guimingyue Welcome! Yep, it looks like this issue will take a lot of time, but we can take it step by step. : )

tristaZero commented 4 years ago

Hi @junwen12221 , I gave the doc a brief look. It looks like you focus on query optimization rather than federated SQL, don't you? You have put a lot of work into it, veteran. :-) BTW, if you would like to join this community, you are welcome anytime!

junwen12221 commented 4 years ago

@tristaZero I have already implemented the federated SQL query feature (e.g., join queries across different instances). We can talk about some aspects of Calcite.

tristaZero commented 4 years ago

@junwen12221 That's great! Your link above shows me you have spent a lot of effort on your community. I am not sure whether you would like to put your mind and experience to best use here, I mean, on this wonderful feature. :-) Currently, we have another veteran, @guimingyue .

BTW, recently one issue has confused me a lot. Could I hear your thoughts? The basic process is,

SQL statement (parsed by the ShardingSphere parser) -> RelNode (translated from the SQL statement, still to be developed) -> optimized RelNode (optimized by the CBO)

My question is how to tell the Calcite adaptor to use these optimized RelNodes.

AFAIK, Driver.getConnection(CALCITE_URL) is the only way to trigger Calcite to use the custom adaptor, but that introduces another round of SQL parsing by Calcite instead of using our optimized RelNode, doesn't it?

How can we use the custom adaptor without the Calcite parser?

Your GitHub activity tells me you are skilled with Calcite; I look forward to your ideas. : ) Indeed, we welcome any comments from anyone in the community!

junwen12221 commented 4 years ago

@tristaZero @guimingyue

Look at the example below

https://github.com/zabetak/calcite/blob/c7646fa6d3054b9e6b9a800e08d18f3cc89922e4/core/src/test/java/org/apache/calcite/examples/foodmart/java/EndToEndExampleEnumerable.java

On line 127 you can replace the SQL parser; on line 156 you can replace the RelNode with a RelBuilder; lines 165-179 are the optimizer code; after line 187 the plan runs. So you need a converter that builds a RelNode from the ShardingSphere parser output, or converts the ShardingSphere AST to a RelNode, or converts the ShardingSphere AST to org.apache.calcite.sql.SqlNode.
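
To make that concrete, here is a minimal, hedged sketch of the RelBuilder route: build a RelNode directly, with no SQL parsing at all. The schema, table, and column names are only placeholders, and the in-memory ReflectiveSchema merely stands in for ShardingSphere's real metadata:

    import org.apache.calcite.adapter.java.ReflectiveSchema;
    import org.apache.calcite.plan.RelOptUtil;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.schema.SchemaPlus;
    import org.apache.calcite.sql.fun.SqlStdOperatorTable;
    import org.apache.calcite.tools.FrameworkConfig;
    import org.apache.calcite.tools.Frameworks;
    import org.apache.calcite.tools.RelBuilder;

    public final class RelBuilderSketch {

        // Hypothetical in-memory schema: ReflectiveSchema exposes public array fields as tables.
        public static class ShardingSchema {
            public final Order[] t_order = {new Order(1, 100), new Order(11, 200)};
        }

        public static class Order {
            public final int order_id;
            public final int user_id;
            public Order(final int orderId, final int userId) {
                this.order_id = orderId;
                this.user_id = userId;
            }
        }

        public static void main(final String[] args) {
            SchemaPlus root = Frameworks.createRootSchema(true);
            root.add("sharding", new ReflectiveSchema(new ShardingSchema()));
            FrameworkConfig config = Frameworks.newConfigBuilder()
                    .defaultSchema(root.getSubSchema("sharding"))
                    .build();
            RelBuilder builder = RelBuilder.create(config);
            // Equivalent of: SELECT order_id FROM t_order WHERE order_id < 10, but no SQL string is parsed.
            RelNode plan = builder.scan("t_order")
                    .filter(builder.call(SqlStdOperatorTable.LESS_THAN, builder.field("order_id"), builder.literal(10)))
                    .project(builder.field("order_id"))
                    .build();
            System.out.println(RelOptUtil.toString(plan));
        }
    }

A converter from the ShardingSphere AST would walk the SQLStatement and issue the corresponding scan/filter/project/join calls instead of hard-coding them as above.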

tristaZero commented 3 years ago

Hi @junwen12221 ,

Sorry for my late response; I have been writing a demo to implement query optimization and federated SQL these days. Your explanation makes sense to me. Many thanks for sharing!
From 104-optimizer, I can see you specialize in Calcite and query optimization. I wonder whether you would be interested in joining this community to work on this feature? My big welcome. : )

BTW, what do you think of ProjectableFilterableTable? I am considering using it instead of TranslatableTable to get data from the actual databases. Given the push-down rules, I suppose ProjectableFilterableTable is enough to filter rows. Besides, it seems TranslatableTable offers the same capability but is more complicated. In which scenario do you think we should choose TranslatableTable rather than ProjectableFilterableTable?

Looking forward to any of your innovative ideas.

Best wishes, Trista

junwen12221 commented 3 years ago

@tristaZero

I'm glad that more open source projects use Calcite to do data sharding based on the SQL language. I don't care which project or community it is. This technology already exists in closed-source software.

Calcite converts SQL to LogicalTableScan before LogicalTableScan applies the other table interfaces.

ProjectableFilterableTable (like FilterableTable and ScannableTable) has built-in rules in Calcite, so you can use it to achieve projection and filter push-down, and correspondingly it can generate something like select ... from ... where ... . But if you want to define more RelNodes (your own SQL generation rules), you can use TranslatableTable (or perform the conversion in other stages). It can use the RelNode of your defined table instead of LogicalTableScan. Correspondingly, you need to write your own rules to optimize it. In fact, a table can be a ProjectableFilterableTable, TranslatableTable, FilterableTable, ScannableTable, or a custom interface at the same time.

When the result of TranslatableTable is a LogicalTableScan, it is trivial. The result can also be another RelNode over a ProjectableFilterableTable.

A conservative approach is to distinguish multiple converters or optimizers. In the first stage, when SQL generates RelNodes, use the built-in table interfaces and rules of Calcite, and then try your own rules or do the next transformation in the second stage.

If you know all the rules, you can perform all transformations in one phase.
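
To make the comparison concrete, a minimal ProjectableFilterableTable sketch might look like the following (column names and row data are made up; a real table would read from the actual data sources, and any filter left un-handled in "filters" is applied by Calcite on top):

    import java.util.List;
    import org.apache.calcite.DataContext;
    import org.apache.calcite.linq4j.Enumerable;
    import org.apache.calcite.linq4j.Linq4j;
    import org.apache.calcite.rel.type.RelDataType;
    import org.apache.calcite.rel.type.RelDataTypeFactory;
    import org.apache.calcite.rex.RexNode;
    import org.apache.calcite.schema.ProjectableFilterableTable;
    import org.apache.calcite.schema.impl.AbstractTable;
    import org.apache.calcite.sql.type.SqlTypeName;

    // Sketch only: Calcite's built-in rules push the projection (and, if we handled
    // them, the filters) into scan(); anything left in "filters" is applied by Calcite itself.
    public final class OrderTable extends AbstractTable implements ProjectableFilterableTable {

        private final Object[][] rows = {{1, 100}, {2, 200}};

        @Override
        public RelDataType getRowType(final RelDataTypeFactory typeFactory) {
            return typeFactory.builder()
                    .add("order_id", SqlTypeName.INTEGER)
                    .add("user_id", SqlTypeName.INTEGER)
                    .build();
        }

        @Override
        public Enumerable<Object[]> scan(final DataContext root, final List<RexNode> filters, final int[] projects) {
            // This sketch ignores "filters" (so Calcite re-applies them) and only honours projection push-down.
            Object[][] result = new Object[rows.length][];
            for (int i = 0; i < rows.length; i++) {
                if (projects == null) {
                    result[i] = rows[i];
                } else {
                    Object[] projected = new Object[projects.length];
                    for (int j = 0; j < projects.length; j++) {
                        projected[j] = rows[i][projects[j]];
                    }
                    result[i] = projected;
                }
            }
            return Linq4j.asEnumerable(result);
        }
    }

Registering such a table in a schema is already enough for select ... from ... where ... style queries; switching to TranslatableTable only becomes necessary when you want the planner to see your own RelNode instead of a LogicalTableScan, as described above.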

guimingyue commented 3 years ago

Before we start the work on this issue, we need to design the schema definition on top of Calcite. Now I'm investigating Calcite's schema system.

tristaZero commented 3 years ago

@guimingyue +1. BTW, if your investigation makes progress, you are welcome to share it here. A demo is a good way for discussion and presentation.

tristaZero commented 3 years ago

Hi @junwen12221 ,

Your explanation is much appreciated. Following your last comment, I took a detailed look at TranslatableTable and ProjectableFilterableTable.

Here is my understanding: ProjectableFilterableTable provides some simple interfaces (nothing to do with rules) that allow users to focus on project and filter optimization. Meanwhile, TranslatableTable gives users the most flexibility to control all the RelNode conversions (optimizations), like project, count(), limit, etc. The function of a rule is to match and convert RelNodes.

Later, I tried to learn more about the planner rules, which are a headache for me. For example, what is the difference between ConverterRule and RelRule? What are CoreRules and EnumerableRules for?

Worse still, the Calcite documentation is too sparse to answer the questions above. :( I wonder how you mastered this complicated tool. Are there any links or docs that can teach users more?

BTW, do you mind exchanging WeChat numbers? If possible, could you send your WeChat number to panjuan@apache.org? No doubt, talking here is still a good way if you prefer. :)

Best, Trista

junwen12221 commented 3 years ago

@tristaZero

Generally, a logical RelNode (src/main/java/org/apache/calcite/rel/logical) starts from org.apache.calcite.plan.Convention#NONE, for example:

  public static LogicalTableScan create(RelOptCluster cluster,
      final RelOptTable relOptTable, List<RelHint> hints) {
    final Table table = relOptTable.unwrap(Table.class);
    final RelTraitSet traitSet =
        cluster.traitSetOf(Convention.NONE)
            .replaceIfs(RelCollationTraitDef.INSTANCE, () -> {
              if (table != null) {
                return table.getStatistic().getCollations();
              }
              return ImmutableList.of();
            });
    return new LogicalTableScan(cluster, traitSet, hints, relOptTable);
  }

Then it goes through a (rule) converter to transform into another RelNode with a new org.apache.calcite.plan.Convention. The class that specializes in this conversion is org.apache.calcite.rel.convert.ConverterRule, so it is also a kind of org.apache.calcite.plan.RelOptRule.

org.apache.calcite.plan.RelRule is a recently added class whose purpose is to let you change the behavior of a rule through configuration. You can see: CALCITE-3923

CoreRules is also a recently added class that came out of a refactoring. Simply put, it collects the rules for the classes in src/main/java/org/apache/calcite/rel/logical. Their input is a logical RelNode and their output is also a logical RelNode; both are Convention.NONE. This is unlike ConverterRule, which goes from a RelNode in one Convention to a RelNode in another Convention.

Following the same idea, we can guess the purpose of org.apache.calcite.adapter.enumerable.EnumerableRules: it implements the conversion from Convention.NONE to EnumerableConvention.INSTANCE. EnumerableConvention marks the (EnumerableRel) RelNodes of Calcite's code-generating executor. After the conversion is complete, you can use org.apache.calcite.adapter.enumerable.EnumerableRel#implement to generate the executor.

Despite all the above, they are actually all special cases of org.apache.calcite.plan.RelOptRule.
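
Put together, a hedged sketch of that last conversion step (assuming logicalPlan is a RelNode still in Convention.NONE, e.g. produced by SqlToRelConverter or a RelBuilder, and that its cluster uses a VolcanoPlanner) could look like:

    // Classes involved: org.apache.calcite.plan.{RelOptPlanner, RelTraitSet},
    // org.apache.calcite.adapter.enumerable.{EnumerableRules, EnumerableConvention}.
    RelOptPlanner planner = logicalPlan.getCluster().getPlanner();
    // Register a few physical rules so Convention.NONE nodes can be converted.
    planner.addRule(EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE);
    planner.addRule(EnumerableRules.ENUMERABLE_PROJECT_RULE);
    planner.addRule(EnumerableRules.ENUMERABLE_FILTER_RULE);
    planner.addRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
    // Ask for the same plan, but in the enumerable calling convention.
    RelTraitSet desired = logicalPlan.getTraitSet().replace(EnumerableConvention.INSTANCE);
    planner.setRoot(planner.changeTraits(logicalPlan, desired));
    RelNode physicalPlan = planner.findBestExp(); // an EnumerableRel tree, ready for code generation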

tristaZero commented 3 years ago

Hi, @junwen12221

Their input is a logical RelNode and their output is also a logical RelNode; both are Convention.NONE. This is unlike ConverterRule, which goes from a RelNode in one Convention to a RelNode in another Convention.

If CoreRules stays in the realm of Convention.NONE and has nothing to do with calling conventions, what is the role of CoreRules in RelNode optimization?

Also, here are some of my thoughts on this issue. If I missed something, corrections are welcome. :)

  1. It is possible to implement the custom adaptor for federated queries.
  2. We need to consider SQL optimization. If we use the Calcite JDBC Driver, the only way to affect the optimization process is to implement the interfaces for the different kinds of tables, like TranslatableTable, and provide RelOptRules.
  3. On the other hand, we may try to skip the Calcite JDBC Driver and call the parse, validate, optimize, and execute functions in our new custom execution driver, which requires an in-depth understanding of the Calcite source code and some coding work (see the sketch after this list).
  4. Based on 3, we can use our parser engine to parse SQL instead of the Calcite parser, providing broad SQL support, since our SQL parser can parse the SQL of different database dialects.
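
As a rough sketch of point 3 (not ShardingSphere's actual code; the schema wiring is omitted, and the planner.parse() step below is exactly the one that point 4 proposes replacing with the ShardingSphere parser), Calcite's Frameworks and RelRunners utilities can already drive the whole pipeline without the Calcite JDBC URL:

    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.RelRoot;
    import org.apache.calcite.schema.SchemaPlus;
    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.tools.FrameworkConfig;
    import org.apache.calcite.tools.Frameworks;
    import org.apache.calcite.tools.Planner;
    import org.apache.calcite.tools.RelRunners;

    public final class CustomDriverSketch {

        public static ResultSet run(final SchemaPlus rootSchema, final String sql) throws Exception {
            FrameworkConfig config = Frameworks.newConfigBuilder()
                    .defaultSchema(rootSchema) // schema registration is assumed to happen elsewhere
                    .build();
            Planner planner = Frameworks.getPlanner(config);
            SqlNode parsed = planner.parse(sql);       // the step a ShardingSphere-parser bridge would replace
            SqlNode validated = planner.validate(parsed);
            RelRoot root = planner.rel(validated);     // SqlNode -> RelNode
            RelNode logical = root.project();
            // RelRunners prepares an executable statement from the RelNode without the Calcite JDBC URL.
            PreparedStatement statement = RelRunners.run(logical);
            return statement.executeQuery();
        }
    }

The raw executor discussed below instead binds the generated code directly, but the parse/validate/optimize/execute steps are the same.
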
tristaZero commented 3 years ago

My greetings to @junwen12221 @guimingyue .

Based on the points @junwen12221 gave before, I wrote a demo for this issue, i.e., SQL federation and SQL query optimization (looking forward to your collaborative effort) ✊ .

In this demo, SQL federation using the Calcite JDBC driver works well. Nonetheless, I also wrote a raw executor (as point 3 above mentions) that parses, validates, optimizes, and executes, which you can view as a custom execution driver. Supposing this custom execution driver runs through the whole process, we can replace the Calcite parser with our SQL parser and add more planner rules before findBestExp. @guimingyue 's expectation, right?

Unfortunately, this CalciteRawExecutor succeeds in parsing, validating, and optimizing SQL but is blocked in the execution phase. 😟

I tried every method I could, but failed, so I sincerely ask for your kind help and pointers! The exception info is presented below; you can also run assertSingleExecute() in CalciteRawExecutorTest to reproduce it.

The reason, I guess, is related to using the custom schema, since it is different from this example's new ReflectiveSchema().

But I have no approach to debug it or figure out the corresponding solution. The only similar question I found is apache-calcite-querying-without-using-the-jdbc-api, FYI.

15:01:08.019 [main] DEBUG org.apache.calcite.plan.RelOptPlanner - Provenance:
rel#19:EnumerableCalc.ENUMERABLE(input=EnumerableCalc#18,expr#0..2={inputs},0=$t0)
  direct
    rel#17:EnumerableCalc.ENUMERABLE(input=RelSubset#16,expr#0..2={inputs},0=$t0)
      call#9 rule [EnumerableCalcRule(in:NONE,out:ENUMERABLE)]
        rel#14:LogicalCalc.NONE(input=RelSubset#6,expr#0..2={inputs},0=$t0)
          call#5 rule [ProjectToCalcRule]
            rel#7:LogicalProject.NONE(input=RelSubset#6,inputs=0)
              no parent
rel#18:EnumerableCalc.ENUMERABLE(input=EnumerableTableScan#11,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
  direct
    rel#15:EnumerableCalc.ENUMERABLE(input=RelSubset#12,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
      call#7 rule [EnumerableCalcRule(in:NONE,out:ENUMERABLE)]
        rel#13:LogicalCalc.NONE(input=RelSubset#4,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
          call#3 rule [FilterToCalcRule]
            rel#5:LogicalFilter.NONE(input=RelSubset#4,condition=<($0, 10))
              no parent
rel#11:EnumerableTableScan.ENUMERABLE(table=[sharding, t_order])
  call#1 rule [EnumerableTableScanRule(in:NONE,out:ENUMERABLE)]
    rel#1:LogicalTableScan.NONE(table=[sharding, t_order])
      no parent

java.lang.NullPointerException
    at Baz.bind(Unknown Source)
    at federated.sql.executor.CalciteRawExecutor.execute(CalciteRawExecutor.java:184)
    at federated.sql.executor.CalciteRawExecutor.execute(CalciteRawExecutor.java:173)
    at federated.sql.executor.CalciteRawExecutorTest.assertSingleExecute(CalciteRawExecutorTest.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
guimingyue commented 3 years ago

I will check out this demo, and debug it tomorrow.

junwen12221 commented 3 years ago

@guimingyue @tristaZero
I fixed it: https://github.com/tristaZero/federatedSQL/pull/2

guimingyue commented 3 years ago


@junwen12221 I added the VM options you mentioned in https://github.com/tristaZero/federatedSQL/pull/2, and I found the line that threw the NPE: it's root.getRootSchema().getSubSchema("sharding").getTable("t_order"). @tristaZero This is because the sharding sub-schema you defined in the constructor of CalciteRawExecutor is assigned to the CalciteRawExecutor#schema field. The DataContext object bound to executablePlan then carries that same sub-schema, so the root.getRootSchema() invocation in the generated code returns the sharding sub-schema, which does not have any sub-schema of its own.
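
In other words, the DataContext handed to executablePlan.bind() must expose the real root schema. A minimal, hypothetical sketch of such a DataContext (class and field names are made up):

    import org.apache.calcite.DataContext;
    import org.apache.calcite.adapter.java.JavaTypeFactory;
    import org.apache.calcite.linq4j.QueryProvider;
    import org.apache.calcite.schema.SchemaPlus;

    // The generated code calls root.getRootSchema().getSubSchema("sharding").getTable("t_order"),
    // so getRootSchema() must return the root schema, not the "sharding" sub-schema itself.
    final class RootDataContext implements DataContext {

        private final SchemaPlus rootSchema;
        private final JavaTypeFactory typeFactory;

        RootDataContext(final SchemaPlus rootSchema, final JavaTypeFactory typeFactory) {
            this.rootSchema = rootSchema;
            this.typeFactory = typeFactory;
        }

        @Override
        public SchemaPlus getRootSchema() {
            return rootSchema;
        }

        @Override
        public JavaTypeFactory getTypeFactory() {
            return typeFactory;
        }

        @Override
        public QueryProvider getQueryProvider() {
            return null; // not needed for this sketch
        }

        @Override
        public Object get(final String name) {
            return null; // dynamic variables (e.g. the cancel flag) omitted in this sketch
        }
    }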

tristaZero commented 3 years ago

Hi @guimingyue @junwen12221 ,

After adding the VM opts, I found the program didn't enter the core part executablePlan.bind() (the generated code), though the generated code showed up in the IDEA console. How did you proceed with debugging in Baz@5153?

junwen12221 commented 3 years ago

@tristaZero @guimingyue

read it.

janino

Debugging
The generated classes can be debugged interactively, even though they were created on-the-fly......

-Dorg.codehaus.janino.source_debugging.dir=federatedSQL\target\

target is the Maven target folder.

If it doesn't work, you can copy the code of the function to replace the generated object for debugging.

junwen12221 commented 3 years ago

@tristaZero

CoreRules: just rules that perform logical (not physical, e.g. Enumerable) transformations on relational expressions, generally used for RBO. This step mainly applies heuristic rules, which is what a rule-based optimizer (RBO) does, so it is often called the RBO stage.
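
For example, a small hedged sketch of that RBO stage (the rule choice is arbitrary, and logicalPlan is assumed to be a Convention.NONE RelNode):

    // Classes involved: org.apache.calcite.plan.hep.{HepProgram, HepProgramBuilder, HepPlanner},
    // org.apache.calcite.rel.rules.CoreRules.
    HepProgram program = new HepProgramBuilder()
            .addRuleInstance(CoreRules.FILTER_INTO_JOIN) // push filters below joins
            .addRuleInstance(CoreRules.PROJECT_MERGE)    // merge adjacent projections
            .build();
    HepPlanner hepPlanner = new HepPlanner(program);
    hepPlanner.setRoot(logicalPlan);
    RelNode rewritten = hepPlanner.findBestExp(); // still a logical plan, still Convention.NONE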

It is possible to implement the custom adaptor for federated queries. Yes.

We need to consider SQL optimization. If we use Calcite JDBC Driver, the only way to affect the optimization process is to override the interface for different kinds of tables like TranslatableTable and provide RelOptRule. Yes.

On the other hand, we may try to skip the Calcite JDBC Driver and call the parse, validate, optimize, and execute functions in our new custom execution driver, which requires an in-depth understanding of the Calcite source code and some coding work.

The Calcite JDBC Driver provides some advanced features by default, including lattices and materialized views. Or you can use

org.apache.calcite.tools.FrameworkConfig
org.apache.calcite.tools.Frameworks
org.apache.calcite.prepare.PlannerImpl

This may preserve the functionality of the Calcite JDBC driver, but it is not a JDBC interface.

(adapter)https://calcite.apache.org/docs/adapter.html (lattice)https://calcite.apache.org/docs/lattice.html (Materialized Views)https://calcite.apache.org/docs/materialized_views.html#materialized-views-maintained-by-calcite

Based on 3, we can use our parser engine to parse SQL instead of the Calcite parser, providing broad SQL support, since our SQL parser can parse the SQL of different database dialects.

On the whole, it can be achieved.

tristaZero commented 3 years ago

Hi @junwen12221 , Glad to see your explanation.

Calcite JDBC Driver provides some default advanced features including lattice,Materialized Views.Or you can use

I suppose these features are far from our needs for now, since currently we just focus on the adaptor, rule optimization, and the conversion between the parsed result and RelNode. Nevertheless, Frameworks is a handy tool that includes the parser, validator, and RelConverter.

@guimingyue @junwen12221

Speaking of the adaptor (federated SQL), I suggest we implement the ProjectableFilterableTable interface initially to run through the whole process. Next, we can introduce TranslatableTable for further optimization.

When it comes to query optimization, from my perspective there are two aspects for our consideration (giving it a notice here @guimingyue ).

Note: the conversion from the SQLStatement of the ShardingSphere parser to the SqlNode of Calcite is a prerequisite for broad SQL support.

Please feel free to share your views on the content above. : )

BTW, I updated the federatedSQL project with a non-generated-code converter to avoid janino debugging, welcome your check-out. Moreover, I am going to post a summary or blog about "How does Calcite handle a SQL?" afterward. But... please leave me some more time...

@guimingyue How is it going? If you want to try writing something about query optimization, our federatedSQL repo is the right place, I guess. :)

guimingyue commented 3 years ago

@tristaZero Sorry for the late reply. I have created two converter classes for converting SQLStatement to Calcite SqlNode and then to Calcite RelNode. You can see my progress in this repository: guimingyue/shardingsphere. Now I'm trying to add some rules to rewrite the RelNode with the hep planner.

junwen12221 commented 3 years ago

@guimingyue @tristaZero

org.apache.calcite.schema.ProjectableFilterableTable is indeed a quick-start interface.

tristaZero commented 3 years ago

Hi @guimingyue ,

I took a look at your repo, and basically it is on the right track for query optimization. : ) Here are some of my thoughts about how to leverage Calcite to do query optimization and SQL federation for ShardingSphere.

@junwen12221 @guimingyue What do you think about the process below? Any comments are welcome.


Process

Preparation

Parse (DONE)

SQLStatement is a concise object representation of a SQL string, generated by the ANTLR visitor.

Optimize (TODO) @guimingyue

Execute (TODO)


Plus, @guimingyue, it looks like you need at least a ShardingSphereCalciteSchema to do SQL optimization. I will first try to finish the preparation part next week. After that, you can use the Calcite Context for the optimization module (and be ready to raise a PR at that time). BTW, my task does not seem to be a blocker for your work. Just pretend you already have a ShardingSphereCalciteSchema and continue the SqlNode transformation work. : )

@junwen12221 I am unsure whether you have time for programming. If not, could you help review the succeeding PRs?

junwen12221 commented 3 years ago

@tristaZero @guimingyue

There is no problem with the process.

In the past few days, I have been researching optimization. In my project, I have implemented the optimization rules for a global secondary index. After completing the tests, I will share it.

Converting SQLStatement to the SqlNode of Calcite is a robust development route. This is the case with one of my query engine implementations, but its performance is not as good as using RelBuilder directly.

One of the query languages I have implemented, HBT, uses RelBuilder directly. It differs from the SQL language in syntax and lacks schema objects, alias scope analysis, and type derivation. If we implement these details, the method of directly translating a SQL statement into a RelNode will become possible. On that day, we will only use Calcite as the optimizer.

I've always been happy to review. I'm interested in the optimization part, and I may join the development of the optimization part.

guimingyue commented 3 years ago

@tristaZero I have created a Calcite Schema implementation class, ShardingSphereCalciteSchema; I think this class is enough for the optimization phase. With the last two weeks of work, I have finished some of the SQLStatement transformation work, but it's still in progress. This is the test case: SqlNodeConverterTest. At the same time, I'm also digging into HepPlanner and VolcanoPlanner to figure out how they work.

@tristaZero @junwen12221 I'm wondering whether the Calcite raw executor is a good choice for sharded tables, according to the optimizer design document of Mycat 2.0 and this paragraph.

Parameterized MycatView: the name MycatView refers to the View operator of Alibaba DRDS. The first-generation Mycat2 engine had something similar called MycatTransientSQLTableScan, a temporary TableScan constructed from a SQL string and data source information. MycatView, by contrast, is represented by a relational expression plus a data distribution. The data distribution comes in two kinds: one is concrete, already containing the information of the concrete physical tables; the other needs parameter values before the concrete information can be obtained, and without them only full-table-scan information is available. In other words, MycatView can compute the physical tables lazily. Ideally, for a parameterized relational expression, different parameter values can affect how many shard tables it scans.

junwen12221 commented 3 years ago

@guimingyue It is enough to build a runnable executor.

tristaZero commented 3 years ago

Hi @guimingyue @junwen12221 ,

The Calcite schema and Calcite context were created for SQL optimization and SQL federation in #8667 . Do you think they form a convenient and proper basis for the SQL optimization module?

Aaaaaaron commented 3 years ago

Hi @tristaZero, I work on the federated SQL query engine at Bytedance using Calcite, and I'm familiar with Calcite, so maybe I can make some contributions.

[ref1] https://aaaaaaron.github.io/2020/02/08/Calcite-Parser-%E9%83%A8%E5%88%86/ [ref2] https://aaaaaaron.github.io/2020/02/08/Calcite-Volcano-Optimizer-%E9%83%A8%E5%88%86/

tristaZero commented 3 years ago

Hi @Aaaaaaron ,

We have been waiting for you, welcome! Please give me some time to look through your references, and then we can have an in-depth discussion. BTW, I suggest you look at the PRs above first, which will give you the context of our work.

Best, Trista

junwen12221 commented 3 years ago

@tristaZero

I may need to see more of the submitted code before I can answer. I think they look like org.apache.calcite.prepare.PlannerImpl / org.apache.calcite.tools.FrameworkConfig / org.apache.calcite.plan.RelOptCluster.

tristaZero commented 3 years ago

Hi @Aaaaaaron ,

Your blog has an in-depth introduction to Calcite, great. : ) Please have my welcome. As the schedule says, I am now in the preparation stage, and @guimingyue will focus his attention on the Optimize work. Plus, we have @junwen12221 providing more great ideas about this plan. If you're interested in joining, I guess the tasks under the Execute part would be the right place to participate in the SQL federation. What do you think?

Trista

guimingyue commented 3 years ago


@tristaZero The executor depends on the optimizer result, which is the physical plan. @Aaaaaaron I think we can communicate on the design of both the optimizer and the executor.

tristaZero commented 3 years ago

@guimingyue @junwen12221 @Aaaaaaron ,

Hi guys,

Currently, the query optimization module has been plugged into ShardingSphere (from parse to execute). However, it is just like an empty fridge; that is, we need to put more valuable food (SqlNode transformation, planner, Interpretable) into it. The related classes, SqlNodeConverter, PlannerInitializer, and CalciteRawExecutor, are waiting to be implemented.

In particular, I would like to hear your opinions on how to get a ResultSet if we use the Interpretable convention of Calcite in CalciteRawExecutor (I have no idea about the resultSetMetadata). Why do we need a ResultSet? The reason is that we have to return a QueryResult from CalciteRawExecutor to dock with the executor of ShardingSphere (refer to CalciteJDBCExecutor).

@guimingyue, do you have any plan for SqlNodeConverter and PlannerInitializer? @Aaaaaaron @junwen12221 I am unsure whether you still want to join this feature (CalciteRawExecutor needs your attention).

guimingyue commented 3 years ago

@tristaZero I'm working on the optimizer in this repo forked from apache/shardingsphere, and I have added an optimizer entrance class named DefaultPlanner. As in many other optimizers, there is also a class containing a collection of rules; in this project, that class is PlannerRules.

tristaZero commented 3 years ago

@guimingyue

Sounds nice. Do you think it is possible to raise a PR with DefaultPlanner?
Honestly speaking, the first PR always troubles the assignee a lot.

guimingyue commented 3 years ago

@tristaZero

It's not the right time to raise a PR now. I will keep working on the DefaultPlanner before raising a PR.

junwen12221 commented 3 years ago

@tristaZero

    new ResultSetMetadata(physical.getRowType().getFieldList());

    fieldList.get(column).getType().getSqlTypeName().getJdbcOrdinal();
    fieldList.get(column).getName();

physical is a RelNode.
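
Expanding that into a hedged sketch (Interpreter is Calcite's row-by-row executor for a RelNode tree; the printing below only stands in for filling whatever QueryResult abstraction ShardingSphere's executor expects):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.calcite.DataContext;
    import org.apache.calcite.interpreter.Interpreter;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.type.RelDataTypeField;

    public final class InterpreterSketch {

        public static void execute(final DataContext dataContext, final RelNode physical) {
            // Column metadata comes from the physical plan's row type, as suggested above.
            List<RelDataTypeField> fields = physical.getRowType().getFieldList();
            for (RelDataTypeField field : fields) {
                String columnName = field.getName();
                int jdbcType = field.getType().getSqlTypeName().getJdbcOrdinal();
                System.out.println(columnName + " -> JDBC type " + jdbcType);
            }
            // The interpreter evaluates the RelNode tree and yields Object[] rows.
            try (Interpreter interpreter = new Interpreter(dataContext, physical)) {
                for (Object[] row : interpreter) {
                    System.out.println(Arrays.toString(row));
                }
            }
        }
    }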

tristaZero commented 3 years ago

@guimingyue OK, go ahead, if you would like to give the DefaultPlanner a relatively complete implementation. How long do you think it will take to finish the DefaultPlanner?

@junwen12221 Thanks for your comment, I will give it a look.

guimingyue commented 3 years ago

@tristaZero It will probably take about two months. The work related to the DefaultPlanner also includes an executor.

tristaZero commented 3 years ago

@guimingyue no problem, follow your schedule.

wgy8283335 commented 3 years ago

I'd like to join this issue.

tristaZero commented 3 years ago

Hi @wgy8283335 ,

Thanks for your attention and welcome! After @guimingyue 's PR is raised, that will be a fitting time to join. Please be patient until then.

wgy8283335 commented 3 years ago

Hi @tristaZero, since guimingyue 's PR has been raised, should I prepare to do something Calcite-related after the Qingming holiday?


tristaZero commented 3 years ago

@wgy8283335 We have to wait for @guimingyue 's PR to be merged. It still needs some time, I guess.

guimingyue commented 3 years ago

@wgy8283335 I think you can start to learn how the Calcite AST works, since you are familiar with ANTLR, and then you can try to fix the unsupported features of the SqlNodeConverter. The CalciteMySqlNodeVisitor of Druid could be a good reference for converting the ShardingSphere AST to the Calcite AST.

wgy8283335 commented 3 years ago

@guimingyue thank you, that's a good idea.

github-actions[bot] commented 2 years ago

Hello, this issue has not received a reply for several days. This issue is supposed to be closed.