Open TatianaJin opened 5 years ago
I am not sure whether I understand it correctly: if JNI is used, isn’t the data being constantly copied from c++ to java and back to execute one function after another?
Not necessarily, by using JNI, we can pass the Relnode tree to c++ and let Husky parse the tree and do execution.
I see! I didn’t expect that the RelNode tree itself could be transferred.
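The point of the reply above can be sketched in stdlib-only Java: serialize the optimized plan once and hand a single payload across the JNI boundary, so no rows are copied per operator. `PlanNode` and `serialize()` are illustrative stand-ins for this sketch, not the actual Husky/Calcite API.

```java
import java.util.List;

public class PlanHandoff {
    // A minimal stand-in for a RelNode: an operator kind, some arguments,
    // and child nodes. Illustrative only, not the Calcite class.
    record PlanNode(String kind, List<String> args, List<PlanNode> children) {
        String serialize() {
            StringBuilder sb = new StringBuilder();
            sb.append(kind).append('(').append(String.join(",", args)).append(')');
            for (PlanNode child : children) {
                sb.append("->").append(child.serialize());
            }
            return sb.toString();
        }
    }

    // On the real system the serialized plan would be handed to a native
    // method implemented in C++, e.g.:
    //   private static native void execute(String serializedPlan);
    public static void main(String[] args) {
        PlanNode scan = new PlanNode("TableScan", List.of("hdfs:///data/t"), List.of());
        PlanNode project = new PlanNode("Project", List.of("col1", "col2"), List.of(scan));
        // One string crosses the JNI boundary; no per-row copying.
        System.out.println(project.serialize());
        // -> Project(col1,col2)->TableScan(hdfs:///data/t)
    }
}
```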
To get the project columns and filter conditions: use `HuskyLogicalCal::getProgram()` to get a `RexProgram`, then use `getCondition()`, `getProjectList()`, and `getExprList()` to get the condition, the projections, and the expression list.
Able to generate a JSON file according to the optimized plan now. Currently the output JSON file only includes the version, url, projection, and condition.
Could you specify the json structure or give an example?
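One hypothetical shape covering the four fields mentioned above might look like the following; the `op`/`lhs`/`rhs` layout of the condition is an assumption for illustration, not a settled format:

```json
{
  "version": "0.1",
  "url": "hdfs:///path/to/table",
  "projection": ["col1", "col2"],
  "condition": {"op": "=", "lhs": "col3", "rhs": 5}
}
```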
@Alice3150 @WuZihao1 @Sleepy-Neko @lisy45 Hi all, I hope you can all read the others' PRs #4 #5 #6 and comment. For this issue, it looks like Siyao's team is more on generating the json and implementing it in Husky, while Alice is more on the optimization rules. Please discuss the IMPORTANT interface, json -> husky, and let us decide the design (not the final version, but at least sufficient for this issue) by next week (if you have any concerns about the timing, tell me).
As a practice for better understanding the query optimization and execution pipeline, let us start with supporting simple queries like
SELECT * FROM table
, where the table is a file stored in HDFS and to be read by Husky. A rough guideline is as follows.
1. Provide schema and adapter
There are three major parts that you need:
`GetRowType`. Hints: you may specify the columns in the model.json through `"columns"`. To handle more dynamic cases, you may try using JNI or any method you think appropriate to dynamically load the column types from the file (e.g. stored in an ORC or Parquet file header).

2. Optimize query and translate optimized plan into json
In the HuskyQueryPlan example we show how the optimization is carried out step by step. You can get the final plan (currently it is a plan in husky logical convention), and translate it into a json file (you can use your creativity to determine how to organize the json).
Alternatively, if you are more comfortable with using JNI to execute Husky tasks, you may use JNI instead of translating a plan into json.
You may try three cases (ordered from easiest to hardest):
- `SELECT * from table`: you will get a plan with only one RelNode, `HuskyLogicalNativeTableScan`. Think about how to use the info in this RelNode to get the corresponding table (especially the url).
- `SELECT col1, col2 from table`: think about what info is needed to implement the project function in Husky given the table. You may also implement a rule to push the project into the table scan (the rule name is given in the Husky rulesets).
- `SELECT col1, col2 from table where col3=val`: you may start with an equality filter in the format of `expression=literal`. Think about what information is needed and how to get it from the corresponding `RelNode`.

For each group, whether you use JNI or json for codegen, please reply here to specify your solution, e.g. if you use json, please specify the structure of your json.
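For the third case, a minimal stdlib-only sketch of applying an `expression=literal` condition to rows; the `Condition` record and the map-per-row representation are assumptions for illustration, not the Husky interface:

```java
import java.util.List;
import java.util.Map;

public class EqualityFilter {
    // The simplest filter shape: a column name compared to a literal.
    record Condition(String column, String literal) {}

    // Keep only the rows whose value in the condition's column equals
    // the condition's literal.
    static List<Map<String, String>> filter(List<Map<String, String>> rows, Condition c) {
        return rows.stream()
                   .filter(r -> c.literal().equals(r.get(c.column())))
                   .toList();
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("col1", "a", "col3", "5"),
            Map.of("col1", "b", "col3", "7"));
        // WHERE col3 = 5: only the first row survives.
        System.out.println(filter(rows, new Condition("col3", "5")));
    }
}
```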
3. Provide execution logic in Husky
Provide your implementation for table scan, project, and filter. If json is used for the codegen part, write a main program that reads the json and executes the sql logic accordingly.
For each group, please reply here to specify your interface for the operators.
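As a sketch of step 3, here is a minimal executor over an already-parsed plan (JSON parsing is omitted): scan an in-memory table, filter by an equality condition, then project. The `Plan` record and the in-memory table are assumptions for illustration; in Husky the scan would instead read the file at the plan's url from HDFS.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PlanExecutor {
    // Assumed plan shape: source url, projected columns, and an optional
    // equality condition (column, literal).
    record Plan(String url, List<String> projection, String condCol, String condVal) {}

    static List<Map<String, String>> execute(Plan plan, List<Map<String, String>> table) {
        return table.stream()
            // filter: keep rows matching the equality condition, if one is set
            .filter(r -> plan.condCol() == null || plan.condVal().equals(r.get(plan.condCol())))
            // project: keep only the requested columns
            .map(r -> plan.projection().stream()
                          .collect(Collectors.toMap(c -> c, r::get)))
            .toList();
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(
            Map.of("col1", "a", "col2", "x", "col3", "5"),
            Map.of("col1", "b", "col2", "y", "col3", "7"));
        // SELECT col1, col2 FROM t WHERE col3 = 5
        Plan plan = new Plan("hdfs:///data/t", List.of("col1", "col2"), "col3", "5");
        System.out.println(execute(plan, table));
    }
}
```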