Maybe this issue suits me well: how to transform the parsed result of ShardingSphere into the relational algebra of Calcite. I did some work on this a few months ago, and I'm familiar with the AST of ShardingSphere's parsed result. I am also investigating the query-optimization module of Calcite, but so far I think there is still a long way to go.
@guimingyue Welcome! Yep, it looks like this issue will take a lot of time, but we can take it step by step. : )
Hi @junwen12221 , I took a brief look at the doc. It looks like you focus on query optimization rather than federated SQL, don't you? You have put a lot of work into it, veteran. :-) BTW, if you would like to join this community, you are welcome anytime!
@tristaZero I have already implemented the federated SQL query feature (like a join query across different instances). We can talk about some aspects of Calcite.
@junwen12221 That's great! The link above shows me you have spent a lot of effort on your community. I wonder whether you would like to put your mind and experience to best use, I mean, on this wonderful feature. :-) Currently, we have another veteran, @guimingyue .
BTW, recently one issue confused me a lot. Could I hear your thoughts on it? The basic process is:
SQL statement (parsed by the ShardingSphere parser) -> RelNode (translated from the SQL statement; needs developing) -> optimized RelNode (optimized by CBO)
My question is how to tell the Calcite adaptor to use these optimized RelNodes.
AFAIK, Driver.getConnection(CALCITE_URL) is the only way to trigger Calcite to use the custom adaptor, which brings in another round of SQL parsing by Calcite instead of using our optimized RelNode, doesn't it?
How can we use the custom adaptor without the Calcite parser?
Your GitHub activity tells me you are skilled at Calcite; I anticipate your ideas. : ) Indeed, we welcome any comments from anyone in the community!
@tristaZero @guimingyue
Look at the example below
https://github.com/zabetak/calcite/blob/c7646fa6d3054b9e6b9a800e08d18f3cc89922e4/core/src/test/java/org/apache/calcite/examples/foodmart/java/EndToEndExampleEnumerable.java
On line 127 you can replace the SQL parser; on line 156 you can replace the RelNode with a RelBuilder; lines 165-179 are the optimizer code; after line 187, the plan runs. So you need a converter that builds a RelNode from the ShardingSphere parser, or converts the ShardingSphere AST to a RelNode, or converts the ShardingSphere AST to org.apache.calcite.sql.SqlNode.
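For illustration, here is a minimal RelBuilder sketch of that parser-free plan construction; the rootSchema variable and the t_order table are assumptions standing in for whatever schema ShardingSphere would register:

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

// Build the plan of "SELECT order_id FROM t_order WHERE order_id < 10"
// directly, without going through the Calcite SQL parser.
FrameworkConfig config = Frameworks.newConfigBuilder()
    .defaultSchema(rootSchema) // a SchemaPlus with t_order registered
    .build();
RelBuilder builder = RelBuilder.create(config);
RelNode relNode = builder
    .scan("t_order")
    .filter(builder.call(SqlStdOperatorTable.LESS_THAN,
        builder.field("order_id"), builder.literal(10)))
    .project(builder.field("order_id"))
    .build();
```

A converter from the ShardingSphere AST would emit exactly this kind of scan/filter/project sequence.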
Hi @junwen12221 ,
Sorry for my late response; I have been writing a demo to implement the query optimization and federated SQL these days.
Your explanation makes sense to me. Many thanks for sharing!
From 104-optimizer, I know you specialize in Calcite and query optimization. I wonder whether you would be interested in joining this community to participate in this feature? My big welcome. : )
BTW, what do you think of ProjectableFilterableTable? I am considering using it instead of TranslatableTable to get data from the actual databases. Given the push-down rules, I suppose ProjectableFilterableTable is enough to filter rows. Besides, it seems TranslatableTable has the same function but is more complicated. In which scenario do you think we should choose TranslatableTable rather than ProjectableFilterableTable?
Looking forward to any of your innovative ideas.
Best wishes, Trista
@tristaZero
I'm glad that more open-source projects use Calcite to do data sharding based on the SQL language. I don't care which project or community; this technology already exists in closed-source software.
Calcite converts SQL to LogicalTableScan before LogicalTableScan applies the other table interfaces. ProjectableFilterableTable (FilterableTable, ScannableTable) has built-in rules in Calcite, so you can use it to push down projection and filtering, and correspondingly you can generate something like select ... from ... where .... But if you want to define more RelNodes (your own SQL-generating rules), you can use TranslatableTable (or perform the conversion in other stages). It can use the RelNode of your defined table instead of LogicalTableScan; correspondingly, you need to write your own rules to optimize it. In fact, a table can be ProjectableFilterableTable, TranslatableTable, FilterableTable, ScannableTable, or a custom interface.
When the result of TranslatableTable is LogicalTableScan, it is trivial. The result can also be another ProjectableFilterableTable RelNode.
A conservative approach is to distinguish multiple converters or optimizers: in the first stage, when SQL generates the RelNode, use the built-in table interfaces and rules of Calcite, then try your own rules or do the next transformation in the second stage. If you know all the rules, you can perform all the transformations in one phase.
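For a quick feel of the ProjectableFilterableTable route, here is a minimal sketch; the table name, columns, and the empty result are illustrative assumptions, not the real ShardingSphere code:

```java
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.ProjectableFilterableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

import java.util.List;

// A toy t_order table that lets Calcite's built-in rules push
// projections and (best-effort) filters into the scan.
public class OrderTable extends AbstractTable implements ProjectableFilterableTable {

    @Override
    public RelDataType getRowType(final RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
            .add("order_id", SqlTypeName.BIGINT)
            .add("user_id", SqlTypeName.BIGINT)
            .add("status", SqlTypeName.VARCHAR)
            .build();
    }

    @Override
    public Enumerable<Object[]> scan(final DataContext root, final List<RexNode> filters, final int[] projects) {
        // filters: conditions we may evaluate ourselves; whatever we leave
        //          in the list is re-evaluated by Calcite afterwards.
        // projects: the column indexes actually required (null means all).
        // A real implementation would translate both into the SQL sent to
        // the actual database; this sketch just returns no rows.
        return Linq4j.emptyEnumerable();
    }
}
```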
Before we start the work on this issue, we need to design the schema definition based on Calcite. Now I'm investigating Calcite's schema system.
@guimingyue +1. BTW, if your investigation makes progress, welcome to share it here. A demo is a good way for discussion and presentation.
Hi @junwen12221 ,
I really appreciate your explanation.
Just to follow up on your last comment, I took a detailed look at TranslatableTable and ProjectableFilterableTable.
Here is my understanding: ProjectableFilterableTable provides some simple interfaces (nothing to do with rules) that allow users to focus on project and filter optimization, while TranslatableTable gives users the most possibilities to control all the RelNode conversions (optimizations), like project, count(), limit, etc. The function of a Rule is to match and convert a RelNode.
Later, I tried to learn more about the plan rules, which gave me a headache. For example, what's the difference between ConverterRule and RelRule? What are CoreRules and EnumerableRules for?
Worse still, the documentation of Calcite is too sparse to answer these questions. :( I wonder how you mastered this complicated tool? Are there any links or docs that can teach users more?
BTW, do you mind exchanging WeChat numbers? If possible, could you send your WeChat number to panjuan@apache.org? No doubt, talking here is still a good way if you prefer. :)
Best, Trista
@tristaZero
Generally, a logical RelNode (src\main\java\org\apache\calcite\rel\logical) starts from org.apache.calcite.plan.Convention#NONE, for example:
```java
public static LogicalTableScan create(RelOptCluster cluster,
    final RelOptTable relOptTable, List<RelHint> hints) {
  final Table table = relOptTable.unwrap(Table.class);
  final RelTraitSet traitSet =
      cluster.traitSetOf(Convention.NONE)
          .replaceIfs(RelCollationTraitDef.INSTANCE, () -> {
            if (table != null) {
              return table.getStatistic().getCollations();
            }
            return ImmutableList.of();
          });
  return new LogicalTableScan(cluster, traitSet, hints, relOptTable);
}
```
Then it goes through the (rule) converter to be transformed into another RelNode with a new org.apache.calcite.plan.Convention. The class that specializes in this conversion is org.apache.calcite.rel.convert.ConverterRule, so it is also a kind of org.apache.calcite.plan.RelOptRule.
org.apache.calcite.plan.RelRule is a recently added class, introduced so that a rule's behavior can be changed through configuration; see CALCITE-3923.
CoreRules is also a recently added class, produced by a refactor. Simply put, it collects the rules for the classes in src\main\java\org\apache\calcite\rel\logical. Their input is a logical RelNode and their output is also a logical RelNode; they are all Convention.NONE. This is unlike ConverterRule, which converts a RelNode of one Convention into a RelNode of another Convention.
Following the same idea, we can guess the purpose of org.apache.calcite.adapter.enumerable.EnumerableRules: it implements the conversion from Convention.NONE to EnumerableConvention.INSTANCE. EnumerableConvention marks the (EnumerableRel) RelNodes of Calcite's code-generating executor. After the conversion is complete, you can use org.apache.calcite.adapter.enumerable.EnumerableRel#implement to generate the executor.
Despite all the above, they are actually all just cases of org.apache.calcite.plan.RelOptRule.
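To make the Convention switch concrete, here is a minimal sketch of asking the planner for an Enumerable plan; logicalPlan is assumed to be a Convention.NONE RelNode attached to a VolcanoPlanner, and the rule selection is only illustrative:

```java
import org.apache.calcite.adapter.enumerable.EnumerableConvention;
import org.apache.calcite.adapter.enumerable.EnumerableRules;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;

// Register the ConverterRules that move NONE RelNodes to ENUMERABLE ones.
RelOptPlanner planner = logicalPlan.getCluster().getPlanner();
planner.addRule(EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_FILTER_RULE);
planner.addRule(EnumerableRules.ENUMERABLE_PROJECT_RULE);

// Ask for the same plan in the Enumerable calling convention.
RelTraitSet desired = logicalPlan.getTraitSet().replace(EnumerableConvention.INSTANCE);
planner.setRoot(planner.changeTraits(logicalPlan, desired));
RelNode physicalPlan = planner.findBestExp(); // a tree of EnumerableRel nodes
```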
Hi, @junwen12221
Its input is logical RelNode, and its output is also logical RelNode, they are all Convention.NONE. Unlike ConverterRule from one Convention RelNode to another Convention RelNode.
If CoreRules lives in the field of Convention.NONE and has nothing to do with calling conventions, what is the role of CoreRules in RelNode optimization?
Also, here are some of my thoughts about this issue. If I missed something, welcome your correction. :)
1. It is possible to implement the custom adaptor for federated queries.
2. We need to consider SQL optimization. If we use the Calcite JDBC Driver, the only way to affect the optimization process is to override the interfaces for the different kinds of tables, like TranslatableTable, and provide RelOptRule.
3. On the other hand, we may try to skip the Calcite JDBC Driver and call the parse, validate, optimize, and execute functions in our new custom execute Driver, which needs an in-depth understanding of the Calcite source code and coding work.
My greetings for @junwen12221 @guimingyue .
Based on the points @junwen12221 gave before, I wrote a demo for this issue, i.e., SQL federation and SQL query optimization (looking forward to your collaborative effort) ✊ .
In this demo, SQL federation using the Calcite JDBC driver works well. Nonetheless, I also wrote a raw executor (as point 3 mentioned above) with parsing, validating, optimizing, and executing, which you can view as a custom executor driver. Supposing this custom executor driver has a run-through process, we can replace the Calcite parser with our SQL parser and add more plan rules to findBestExp. @guimingyue 's expectation, right?
Unfortunately, this CalciteRawExecutor succeeded in parsing, validating, and optimizing SQLs, but is blocked in executing. 😟
I tried every method I could, but failed, so I sincerely seek your kind help and pointers! The exception info is presented below; you can also run assertSingleExecute() in CalciteRawExecutorTest to reproduce it.
The reason, I guess, is related to using the custom schema, since it differs from this example's new ReflectiveSchema(). But I have no way to debug it or figure out the corresponding solution. The only similar question I found is apache-calcite-querying-without-using-the-jdbc-api, FYI.
15:01:08.019 [main] DEBUG org.apache.calcite.plan.RelOptPlanner - Provenance:
rel#19:EnumerableCalc.ENUMERABLE(input=EnumerableCalc#18,expr#0..2={inputs},0=$t0)
direct
rel#17:EnumerableCalc.ENUMERABLE(input=RelSubset#16,expr#0..2={inputs},0=$t0)
call#9 rule [EnumerableCalcRule(in:NONE,out:ENUMERABLE)]
rel#14:LogicalCalc.NONE(input=RelSubset#6,expr#0..2={inputs},0=$t0)
call#5 rule [ProjectToCalcRule]
rel#7:LogicalProject.NONE(input=RelSubset#6,inputs=0)
no parent
rel#18:EnumerableCalc.ENUMERABLE(input=EnumerableTableScan#11,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
direct
rel#15:EnumerableCalc.ENUMERABLE(input=RelSubset#12,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
call#7 rule [EnumerableCalcRule(in:NONE,out:ENUMERABLE)]
rel#13:LogicalCalc.NONE(input=RelSubset#4,expr#0..2={inputs},expr#3=10,expr#4=<($t0, $t3),proj#0..2={exprs},$condition=$t4)
call#3 rule [FilterToCalcRule]
rel#5:LogicalFilter.NONE(input=RelSubset#4,condition=<($0, 10))
no parent
rel#11:EnumerableTableScan.ENUMERABLE(table=[sharding, t_order])
call#1 rule [EnumerableTableScanRule(in:NONE,out:ENUMERABLE)]
rel#1:LogicalTableScan.NONE(table=[sharding, t_order])
no parent
java.lang.NullPointerException
at Baz.bind(Unknown Source)
at federated.sql.executor.CalciteRawExecutor.execute(CalciteRawExecutor.java:184)
at federated.sql.executor.CalciteRawExecutor.execute(CalciteRawExecutor.java:173)
at federated.sql.executor.CalciteRawExecutorTest.assertSingleExecute(CalciteRawExecutorTest.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
I will check out this demo, and debug it tomorrow.
@guimingyue @tristaZero
I fixed it: https://github.com/tristaZero/federatedSQL/pull/2
@junwen12221 I added the VM options you mentioned in https://github.com/tristaZero/federatedSQL/pull/2, and I found the line that threw the NPE. It's root.getRootSchema().getSubSchema("sharding").getTable("t_order").
@tristaZero
This is because the sharding sub-schema you defined in the constructor of CalciteRawExecutor is assigned to the CalciteRawExecutor#schema field. Then a DataContext object with that same sub-schema is bound to the executablePlan, so the invocation root.getRootSchema() in the generated code returns the sharding sub-schema, which does not have any sub-schema.
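In other words, the DataContext bound to the plan must expose the real root schema, because the generated code navigates root.getRootSchema().getSubSchema("sharding").getTable("t_order"). A minimal sketch of that point (this class is hypothetical, not the demo's actual code):

```java
import org.apache.calcite.DataContext;
import org.apache.calcite.adapter.java.JavaTypeFactory;
import org.apache.calcite.linq4j.QueryProvider;
import org.apache.calcite.schema.SchemaPlus;

// The crucial detail: getRootSchema() must return the ROOT schema
// (the one "sharding" hangs under), not the sharding sub-schema itself.
final class RootDataContext implements DataContext {

    private final SchemaPlus rootSchema;
    private final JavaTypeFactory typeFactory;

    RootDataContext(final SchemaPlus rootSchema, final JavaTypeFactory typeFactory) {
        this.rootSchema = rootSchema;
        this.typeFactory = typeFactory;
    }

    @Override public SchemaPlus getRootSchema() { return rootSchema; }
    @Override public JavaTypeFactory getTypeFactory() { return typeFactory; }
    @Override public QueryProvider getQueryProvider() { return null; }
    @Override public Object get(final String name) { return null; }
}
```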
Hi @guimingyue @junwen12221 ,
After adding the VM opts, I found the program didn't enter the core part executablePlan.bind() (generated code), though the generated code showed up in the IDEA console. How did you proceed with debugging in Baz@5153?
@tristaZero @guimingyue
Read this:
Debugging
"The generated classes can be debugged interactively, even though they were created on-the-fly......"
-Dorg.codehaus.janino.source_debugging.dir=federatedSQL\target\
where target is the Maven target folder.
If it doesn't work, you can copy the code of the function to replace the generated object for debugging.
@tristaZero
CoreRules: just rules that perform logical (not physical/Enumerable) transformations on relational expressions. They are generally used for rule-based optimization: this step mainly applies heuristic rules, which is what a rule-based optimizer (RBO) does, so it is often called the RBO stage.
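As an illustration of that RBO stage, a minimal HepPlanner sketch (the rule choice is arbitrary, and logicalPlan is assumed to be a RelNode produced earlier):

```java
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

// Heuristic rewriting stays inside Convention.NONE:
// a logical RelNode goes in, a logical RelNode comes out.
HepProgram program = new HepProgramBuilder()
    .addRuleInstance(CoreRules.FILTER_INTO_JOIN) // push filters below joins
    .addRuleInstance(CoreRules.PROJECT_MERGE)    // merge adjacent projections
    .build();
HepPlanner hepPlanner = new HepPlanner(program);
hepPlanner.setRoot(logicalPlan);
RelNode rewritten = hepPlanner.findBestExp();
```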
It is possible to implement the custom adaptor for federated queries. Yes.
We need to consider SQL optimization. If we use Calcite JDBC Driver, the only way to affect the optimization process is to override the interface for different kinds of tables like TranslatableTable and provide RelOptRule. Yes.
On the other hand, we may try to skip the Calcite JDBC Driver and call the parse, validate, optimize, and execute functions in our new custom execute Driver, which needs an in-depth understanding of the Calcite source code and coding work.
The Calcite JDBC Driver provides some advanced features by default, including lattices and materialized views. Or you can use:
org.apache.calcite.tools.FrameworkConfig
org.apache.calcite.tools.Frameworks
org.apache.calcite.prepare.PlannerImpl
This keeps the functionality of the Calcite JDBC driver, but it is not a JDBC interface.
(adapter) https://calcite.apache.org/docs/adapter.html (lattice) https://calcite.apache.org/docs/lattice.html (Materialized Views) https://calcite.apache.org/docs/materialized_views.html#materialized-views-maintained-by-calcite
Using approach 3, we can use our parser engine to parse SQL instead of the Calcite parser, providing broad SQL support, since our SQL parser can parse the SQL of different database dialects.
On the whole, it can be achieved.
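For reference, a minimal sketch of that non-JDBC route via Frameworks (the schema and SQL are illustrative, and exception handling is omitted):

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

// The same pipeline the JDBC driver runs, but callable step by step,
// so any stage (e.g. parsing) can be swapped out.
FrameworkConfig config = Frameworks.newConfigBuilder()
    .defaultSchema(rootSchema) // a SchemaPlus registered elsewhere
    .build();
Planner planner = Frameworks.getPlanner(config); // a PlannerImpl underneath
SqlNode parsed = planner.parse("SELECT user_id FROM t_order WHERE order_id < 10");
SqlNode validated = planner.validate(parsed);
RelNode logicalPlan = planner.rel(validated).project();
```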
Hi @junwen12221 , Glad to see your explanation.
"Calcite JDBC Driver provides some default advanced features including lattice, Materialized Views. Or you can use..."
I suppose these features are far away from us, since currently we just focus on the adaptor, rule optimization, and the conversion between the parsed result and the RelNode. Nevertheless, Frameworks is a handy tool that includes a parser, a validator, and a RelConverter.
@guimingyue @junwen12221
Speaking of the Adaptor (federated SQL), I suggest we implement the ProjectableFilterableTable interface initially, to run through the whole process well. Next, we can introduce TranslatableTable for further optimization.
When it comes to query optimization, from my perspective there are two aspects for our consideration (giving it a notice here @guimingyue ):
1. Use CoreRules and EnumerableRules to perform logical transformations on relational expressions.
2. Use TranslatableTable to register custom rules (like CoreRules).
Notice: the conversion from the SQLStatement of the ShardingSphere parser to the SqlNode of Calcite is a prerequisite for broad SQL support.
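To give a feel for what that conversion involves, here is a toy sketch that hand-builds the SqlNode for a trivial SELECT; a real converter would walk the ShardingSphere SelectStatement and emit this structure piece by piece (note that the SqlSelect constructor arity varies slightly across Calcite versions):

```java
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlNodeList;
import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.parser.SqlParserPos;

import java.util.Collections;

// Hand-built SqlNode for "SELECT order_id FROM t_order".
SqlNodeList selectList = new SqlNodeList(
    Collections.singletonList(new SqlIdentifier("order_id", SqlParserPos.ZERO)),
    SqlParserPos.ZERO);
SqlNode from = new SqlIdentifier("t_order", SqlParserPos.ZERO);
SqlSelect select = new SqlSelect(SqlParserPos.ZERO,
    null,       // keyword list (e.g. DISTINCT)
    selectList, // projection
    from,       // FROM
    null,       // WHERE
    null,       // GROUP BY
    null,       // HAVING
    null,       // window declarations
    null,       // ORDER BY
    null,       // OFFSET
    null,       // FETCH
    null);      // hints
```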
Please feel free to share your views on the content above. : )
BTW, I updated the federatedSQL project with a non-generated-code converter to avoid janino debugging, welcome your check-out. Moreover, I am going to post a summary or blog about "How does Calcite handle a SQL?" afterward. But... please leave me some more time...
@guimingyue How is it going? If you want to try writing something about query optimization, our federatedSQL repo is the right place, I guess. :)
@tristaZero Sorry for the late reply. I have created two converter classes for converting a SQLStatement to a Calcite SqlNode and then to a Calcite RelNode. You can see my progress in this repository: guimingyue/shardingsphere. Now I'm trying to add some rules to rewrite the RelNode with the hep planner.
@guimingyue @tristaZero
org.apache.calcite.schema.ProjectableFilterableTable is indeed a quick-start interface.
Hi @guimingyue ,
I took a look at your repo, and basically it is on the right track for query optimization. : ) Here are some of my thoughts about how to leverage Calcite to do query optimization and SQL federation for ShardingSphere.
@junwen12221 @guimingyue What do you think about the process below? Any comments are welcome.
1. (TODO) Create the Calcite schema and Calcite context from the sharding rules and keep them in SchemaContexts for the following SQL optimization and SQL federation.
2. (TODO) Convert the SQLStatement to the SqlNode of Calcite. (SQLStatement is a concise object for a String SQL, generated by the Antlr visitor.)
3. Optimize into a RelNode (BestPlan).
4. Execute it with the InterpretableConverter of Calcite.
5. Wrap Enumerable<Object[]> and ResultSetMetaData as the QueryResult of ShardingSphere.
6. Return the QueryResult to the executeEngine of ShardingSphere.
Plus, @guimingyue, it looks like you need at least a ShardingSphereCalciteSchema to do SQL optimization. I will first try to finish the preparation part next week. After that, you can use the Calcite context for the optimization module (ready to raise a PR at that time). BTW, my task does not seem to be a blocker for your work. Just pretend you have a ShardingSphereCalciteSchema and continue the SqlNode transformation work. : )
@junwen12221 I am unsure whether you have time for programming. If not, could you help review the succeeding PRs?
@tristaZero @guimingyue
There is no problem with the process.
Over the past few days, I have been researching optimization. In my project, I have implemented the optimization rules for a global secondary index. After completing the tests, I will share it.
SQLStatement to SqlNode of Calcite is a robust development route. This is the case with one of my query-engine implementations, but its performance is not as good as using RelBuilder directly.
One query language I have implemented, HBT, uses RelBuilder directly. It differs from the SQL language in syntax, and it lacks schema objects, alias scope analysis, and type derivation. If we implement these details, the method of directly translating a SQL statement into a RelNode becomes possible. On that day, we will use Calcite only as an optimizer.
I've always been happy to review. I'm interested in the optimization part, and I may join the development of the optimization part.
@tristaZero I have created a Calcite Schema implementation class, ShardingSphereCalciteSchema; I think this class is enough for the optimization phase. Over the last two weeks of work, I have finished some of the SQLStatement transformation work, but it's still on the way. This is the test case: SqlNodeConverterTest. At the same time, I'm also digging into the HepPlanner and the VolcanoPlanner to figure out how they work.
@tristaZero @junwen12221 I'm wondering whether the Calcite raw executor is a good choice for sharding tables, according to the optimizer design document of Mycat 2.0 and the paragraph below.
Parameterized MycatView: the name MycatView follows the View operator of Alibaba DRDS. The first-generation Mycat2 engine had something similar called MycatTransientSQLTableScan, a transient TableScan constructed from a SQL string and data-source information. MycatView, instead, is expressed as a relational expression plus a data distribution. Data distributions come in two kinds: concrete ones, which already contain the information of the specific physical tables, and ones that need parameter values before the concrete information can be obtained; without the parameters, only a full-table scan is available. In other words, MycatView can compute its physical tables lazily. Ideally, for a parameterized relational expression, different parameter values affect how many shard tables it scans.
@guimingyue It is enough to complete a working executor.
Hi @guimingyue @junwen12221 ,
The Calcite schema and Calcite context have been created for SQL optimization and SQL federation in #8667 . Do you think this is a convenient and proper setup for the SQL optimization module?
Hi @tristaZero, I work on the federated SQL query engine at Bytedance using Calcite, and I'm familiar with Calcite, so maybe I can make some contributions.
[ref1] https://aaaaaaron.github.io/2020/02/08/Calcite-Parser-%E9%83%A8%E5%88%86/ [ref2] https://aaaaaaron.github.io/2020/02/08/Calcite-Volcano-Optimizer-%E9%83%A8%E5%88%86/
Hi @Aaaaaaron ,
We are waiting for you, welcome! Please give me some time to look through your references, and then we can have an in-depth discussion. BTW, I suggest you look at the PRs above first, which can give you the context of our work.
Best, Trista
@tristaZero
I may need to see more of the submitted code before I can answer. I think it looks like org.apache.calcite.prepare.PlannerImpl / org.apache.calcite.tools.FrameworkConfig / org.apache.calcite.plan.RelOptCluster.
Hi @Aaaaaaron ,
Your blog has an in-depth introduction to Calcite, great. : ) Please have my welcome.
As the schedule says, I am now in the preparation stage, and @guimingyue will pay attention to the Optimize work. Plus, we have @junwen12221 providing more great ideas about this plan. If you're interested in joining, I guess the tasks under the Execute part will be the right choice for participating in the SQL federation. What do you think?
Trista
@tristaZero The executor depends on the optimizer result, which is the physical plan. @Aaaaaaron I think we can communicate on the design of both the optimizer and the executor.
@guimingyue @junwen12221 @Aaaaaaron ,
Hi guys,
Currently, the query optimization module has been plugged into ShardingSphere (from parse to execute). However, it is just like a fridge; that is, we need to put more valuable food (SqlNode transformation, planner, Interpretable) into it. The related classes, SqlNodeConverter, PlannerInitializer, and CalciteRawExecutor, are waiting to be implemented.
In particular, I want to hear your opinions on how to get a resultSet if we use the Interpretable of Calcite in CalciteRawExecutor (I have no idea about resultSetMetadata). Why do we need to get a resultSet? The reason is that we have to return a QueryResult from CalciteRawExecutor to dock with the executor of ShardingSphere (refer to CalciteJDBCExecutor).
@guimingyue, do you have any plans for SqlNodeConverter and PlannerInitializer?
@Aaaaaaron @junwen12221 I am unsure whether you still want to join in this feature (CalciteRawExecutor needs your attention).
@tristaZero I'm working on the optimizer in this repo, forked from Apache ShardingSphere, and I have added an optimizer entrance class named DefaultPlanner. As in many other optimizers, there is also a class containing a collection of rules; in this project, that class is PlannerRules.
@guimingyue
Sounds nice.
Do you think it is possible to raise a PR with DefaultPlanner?
Honestly speaking, the first PR always troubles the assignee a lot.
@tristaZero
It's not the right time to raise a PR now. I will keep working on the DefaultPlanner before raising a PR.
@tristaZero
```java
new ResultSetMetadata(physical.getRowType().getFieldList());
fieldList.get(column).getType().getSqlTypeName().getJdbcOrdinal();
fieldList.get(column).getName();
```
physical is a RelNode.
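Expanding on that snippet, here is how the Interpretable route could fit together, as a sketch: bestPlan is assumed to be the optimized RelNode and dataContext a bound DataContext, and this is not the actual CalciteRawExecutor code:

```java
import org.apache.calcite.interpreter.Interpreter;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.rel.type.RelDataTypeField;

// Rows: the Interpreter evaluates the RelNode tree directly,
// with no janino code generation involved.
Enumerable<Object[]> rows = new Interpreter(dataContext, bestPlan);

// Metadata: column names and JDBC type ordinals come from the plan's
// row type, which is what a ShardingSphere QueryResult needs.
for (RelDataTypeField field : bestPlan.getRowType().getFieldList()) {
    String columnName = field.getName();
    int jdbcType = field.getType().getSqlTypeName().getJdbcOrdinal();
    // ... feed columnName / jdbcType into the QueryResult metadata ...
}
```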
@guimingyue
OK, go ahead, if you'd like to give the DefaultPlanner a relatively complete implementation. How long do you think it will take to finish the DefaultPlanner?
@junwen12221 Thanks for your comment, I will give it a look.
@tristaZero Maybe about two months. The work related to the DefaultPlanner also includes an executor.
@guimingyue no problem, follow your schedule.
I'd like to join this issue.
Hi @wgy8283335 ,
Welcome your attention. After @guimingyue's PR is raised, that will be a fitting time to join. Please be patient until then.
Hi @tristaZero, since guimingyue 's PR is raised, should I prepare to do something calcite-related after the Qingming holiday?
@wgy8283335 We have to wait for @guimingyue's PR to be merged. It still needs some time, I guess.
@wgy8283335 I think you can start learning how the Calcite AST works, since you are familiar with Antlr, and then you can try to fix the unsupported features of the SqlNodeConverter. The CalciteMySqlNodeVisitor of Druid could be a good reference for converting the ShardingSphere AST to the Calcite AST.
@guimingyue thank you, that's a good idea.
Hello, this issue has not received a reply for several days, so it is going to be closed.
Hi community,
As you know, ShardingSphere has put a lot of effort into its SQL parser and provides a great independent SQL parser to help users parse SQL.
Based on this substantial work, we plan to do query optimization, i.e., optimize the input SQLs from users and produce an optimized SQL query plan to improve query efficiency. Plus, the federated SQL query feature (like a join query across different instances) is another essential highlight for our next release. : )
We will leverage Apache Calcite, an excellent framework, to implement these two features. Currently, three main work focuses are presented here.
One of them is implementing the TranslatableTable API. Actually, there is plenty of work to do on this issue. We are in the investigation phase now and will seek contributors for this issue later. If you are interested in this one, please give it a watch. 😉
10th January 2021 Task Update
Hi, community,
Here is the progress update so far.
Functions
- CalciteLogicSchemaFactory, CalciteLogicSchema, and CalciteFilterableTable
- CalciteFilterableTable (CalciteExecutionSQLGenerator) (Q2)
- CalciteRawExecutor (especially ResultSetMetadata) (Q2)
- CalciteExecutor (Q2)
- Unit tests (Q2)
SQL Federation Scenario