Open MingyuGuan opened 5 years ago
How to connect each RelNode?
Just my 2 cents. RelNodes (or operators) that can be executed in one "list_execute" call in Husky may be grouped together as a task. The boundary between two tasks is then the point where the data needs to be shuffled, so a shuffle operator may be defined. During the conversion from logical plans to physical plans, this shuffle operator is inserted wherever it is needed.
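To make the grouping idea concrete, here is a minimal sketch, not the project's actual code. It assumes a linear operator chain and a hypothetical `needs_shuffle` flag marking operators whose input must be repartitioned; the `Op` class, the `Shuffle` name, and placing the shuffle at the end of the upstream task are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    # Hypothetical flag: True if this operator needs repartitioned input.
    needs_shuffle: bool = False


def insert_shuffles(ops):
    """Split a linear operator chain into tasks, closing each task with a
    Shuffle operator whenever the next operator needs repartitioned input."""
    tasks, current = [], []
    for op in ops:
        if op.needs_shuffle and current:
            current.append(Op("Shuffle"))  # task boundary
            tasks.append(current)
            current = []
        current.append(op)
    if current:
        tasks.append(current)
    return tasks


chain = [
    Op("TableScan"),
    Op("Filter"),
    Op("Aggregate", needs_shuffle=True),
    Op("Project"),
]
tasks = insert_shuffles(chain)
task_names = [[op.name for op in task] for task in tasks]
```

With this chain, the scan and filter end up in one task that ends with the shuffle, and the aggregate and project form the next task. A real plan is a tree or DAG rather than a chain, so the same idea would have to be applied per edge.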
@lmatz @zzxx-husky @ydwu4 What do you think about the optimization rules as implemented in #4?
@lisy45 @Sleepy-Neko @WuZihao1 Please read the rule part in PR #4 as well.
@lisy45 @Sleepy-Neko @WuZihao1 What are your ideas about the JSON structure for optimized plans? Also how about the interface in Husky? You are supposed to reply to #3!!!
I feel that each pull request adding new rules, new features, etc. should come with its own tests. Tests help the reviewer go through the code, and after all, this is a project with commits from multiple people. Rule tests can leverage the existing tools in Calcite's rule tests.
What are your ideas about the JSON structure for optimized plans?
Just my 2 cents. The JSON file may be structured as a list of tasks. Each task contains one or more physical operators. Each physical operator may have its own special structure.
If a task is a source that reads data from an external data source, it should record that data source in its JSON fields, such as the table name and column names. A task that reads data from other tasks should record its parents, and likewise a task that writes data to other tasks should record the tasks it writes to.
Besides, the JSON structure may have an array listing the names of all data source tasks, so that the developer of the Husky program can start visiting each task from these data source tasks.
This is just an initial idea in case some find it hard to get started. It heavily depends on the people who are responsible for the Husky-side development.
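As a rough illustration of that layout, the sketch below builds a tiny plan and walks it from the source tasks. Every key name (`sources`, `tasks`, `parents`, `children`, `operators`), the task names, and the table name `TRANS` are assumptions made up for this example, not an agreed format.

```python
import json
from collections import deque

# Hypothetical plan: a list of tasks plus an array of source task names.
plan_json = """
{
  "sources": ["scan"],
  "tasks": [
    {"name": "scan", "parents": [], "children": ["calc"],
     "operators": [{"type": "TableScan", "table": "TRANS",
                    "columns": ["TRANS_ID", "ITEM_ID"]}]},
    {"name": "calc", "parents": ["scan"], "children": [],
     "operators": [{"type": "Calc"}]}
  ]
}
"""


def visit_order(plan):
    """Return task names reachable from the source tasks, breadth-first."""
    by_name = {task["name"]: task for task in plan["tasks"]}
    queue = deque(plan["sources"])
    seen = set(plan["sources"])
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        for child in by_name[name]["children"]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


order = visit_order(json.loads(plan_json))
```

Note that a breadth-first walk does not guarantee that every parent is visited before its child when a task has several parents; a topological order over `parents` would be needed for that.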
@Alice3150 @WuZihao1 @Sleepy-Neko @lisy45 Please pay attention. You may learn how to write tests by reading Calcite's tests.
Got it. Thanks.
Normalized Logical Plan:
Apply PushProjectIntoTableScanRule and PushFilterIntoTableScanRule:
Note: In fields, FieldName=[$num], where num is the position of that field in the whole table. In condition, operand($num, constant), where num is the position of that field among the projected fields. For example, SELLER_ID has index 4 (counting from 0) in the table, while it has index 3 among the projected fields.
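The two numbering schemes can be checked with a small sketch. Only the SELLER_ID positions (4 in the table, 3 in the projection) come from the note above; the table layout and the `BUYER_ID` column are hypothetical, chosen so that the indexes work out.

```python
# Hypothetical table layout; BUYER_ID is invented for this example.
table_fields = ["TRANS_ID", "ITEM_ID", "PRICE", "BUYER_ID", "SELLER_ID"]
# Projection that drops BUYER_ID.
projected = ["TRANS_ID", "ITEM_ID", "PRICE", "SELLER_ID"]


def field_refs(table_fields, projected):
    """Index of each projected field within the whole table (used in fields)."""
    return {name: table_fields.index(name) for name in projected}


def condition_ref(projected, name):
    """Index of a field within the projected fields (used in condition)."""
    return projected.index(name)


fields = field_refs(table_fields, projected)
cond_index = condition_ref(projected, "SELLER_ID")
```

Here `fields["SELLER_ID"]` refers to the table position and `cond_index` to the position after projection, which is why the same column shows up as `$4` in one place and `$3` in the other.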
To Do List:
[ ] Add the remaining optimization rules, e.g. for HuskyLogicalJoin, HuskyLogicalSort, etc.
[ ] Think about how to convert the logical plan into a physical plan
Here is a proposed example:
{
  "name": "HuskyLogicalCalc",
  "type": "Calc",
  "project": [
    { "index": "0", "name": "TRANS_ID", "datatype": "int" },
    { "index": "1", "name": "ITEM_ID", "datatype": "int" },
    ...
  ],
  "condition": {
    "operator": "AND",
    "left": { "operator": "OR", "left": "PRICE", "right": "2.0" },
    "right": { ... }
  },
  "input": {
    "name": "HuskyLogicalTableScan",
    "type": "TableScan",
    ...
  }
}
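For the consuming side, one possible way to read such a plan is to follow the nested `input` key from the root operator down to the scan. The sketch below assumes each operator has at most one `input`, which would not hold for a join, and it fills the plan with only the fields shown in the proposal; the elided parts are left out rather than guessed.

```python
def operator_chain(node):
    """Collect operator types from the root down, following "input"."""
    chain = []
    while node is not None:
        chain.append(node["type"])
        node = node.get("input")  # None once the leaf operator is reached
    return chain


# Trimmed-down plan using only keys that appear in the proposed example.
plan = {
    "name": "HuskyLogicalCalc",
    "type": "Calc",
    "project": [
        {"index": "0", "name": "TRANS_ID", "datatype": "int"},
        {"index": "1", "name": "ITEM_ID", "datatype": "int"},
    ],
    "input": {"name": "HuskyLogicalTableScan", "type": "TableScan"},
}

chain = operator_chain(plan)
```

Supporting multi-input operators would probably mean turning `input` into an array and making the walk recursive.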