Open alamb opened 4 months ago
take
While I was adding this feature, I encountered lots of issues while trying to generate LogicalPlan from substrait https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans
Writing this to list all these issues
Thank you @Lordworms -- indeed it is a great job to find issues when testing
I wonder if there is some way to get the integration test partly working and checked in somewhere (maybe with the failing tests commented out). If you were able to do so, I think we would be able to get many more people to help with fixing the actual gaps
Thank you @Lordworms -- indeed it is a great job to find issues when testing
I wonder if there is some way to get the integration test partly working and checked in somewhere (maybe with the failing tests commented out). If you were able to do so, I think we would be able to get many more people to help with fixing the actual gaps
actually, I have made tpch_1 work just now, I'll draft a PR, and hope to get a review from you, Thanks a lot.
I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173
actually, I have made tpch_1 work just now, I'll draft a PR, and hope to get a review from you, Thanks a lot.
Wow -- nice work!
Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as well!
Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as well!
Hi @richtia , I just find in some confusion in tpch_1_json
in general tpch1, the filter looks like
but in this repo, the actual tpch_1 you use is like
Is there any purpose of changing the expression?
Thanks a lot
Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as well!
Hi @richtia , I just find in some confusion in tpch_1_json
in general tpch1, the filter looks like but in this repo, the actual tpch_1 you use is like
Is there any purpose of changing the expression?
Thanks a lot
Ahh...these TPCH queries were pulled from somewhere else. If there's a mistake in any, they should definitely be updated! Feel free to file an issue or make the change yourself!
Thanks for the catch.
The - interval
part does seem aligned with TPH-H 3.0.1 spec (page 29). The 120
should in theory be a random number betwen 60 and 120 but that's unrelevant to this usage.
@richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans
@richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans
The plans were generated by Isthmus (https://github.com/substrait-io/substrait-java/tree/main/isthmus) with some minor modifications. I think at the time, Isthmus had trouble generating those 3 plans. So I had just left them blank. I'll create an issue in the repo. Feel free to add them. Otherwise I'll try to get to it some time.
@richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans
The plans were generated by Isthmus (https://github.com/substrait-io/substrait-java/tree/main/isthmus) with some minor modifications. I think at the time, Isthmus had trouble generating those 3 plans. So I had just left them blank. I'll create an issue in the repo. Feel free to add them. Otherwise I'll try to get to it some time.
sure, I'll add that later.
Hi, @richtia I just wonder how you get the original substrait plan, since I use duckdb to be the producer and generate the plan, it didn't have some structure like
"local_files": {
"items": [
{
"uri_file": "file://FILENAME_PLACEHOLDER_0",
"parquet": {}
}
]
}
}
I know it is similar to use namedTable here as what duckDB did, but I still want to keep the consistency with previous plan. Thanks a lot.
Is your feature request related to a problem or challenge?
In https://github.com/apache/datafusion/pull/10653, @Blizzara added the beginnings of testing for substrait plans that came from other systems.
@richtia noted https://github.com/apache/datafusion/pull/10653#issuecomment-2136031347
Describe the solution you'd like
I would like to get DataFusion working with that substrait integration test
Describe alternatives you've considered
No response
Additional context
No response