Open ianmcook opened 1 year ago
The extension references and anchors are silly
That was partly by design to emphasize that these can be any arbitrary integers. I find it makes it much more obvious how they are related. For example, if we used 0
instead of 7
then it would not be obvious that extension_uri_reference
is a reference to extension_uri_anchor
.
include <arrow/engine/api.h> needs to be changed to #include <arrow/engine/substrait/api.h>
That seems a little surprising to me. What error did you get using <arrow/engine/api.h>
?
The extension references and anchors are silly
That was partly by design to emphasize that these can be any arbitrary integers. I find it makes it much more obvious how they are related. For example, if we used
0
instead of7
then it would not be obvious thatextension_uri_reference
is a reference toextension_uri_anchor
.
Ah, got it, fair enough.
include <arrow/engine/api.h> needs to be changed to #include <arrow/engine/substrait/api.h>
That seems a little surprising to me. What error did you get using
<arrow/engine/api.h>
?
That file does not exist when I build and install the Arrow library (using a recent dev commit) :
% tree /usr/local/include/arrow/engine
/usr/local/include/arrow/engine
└── substrait
├── api.h
├── extension_set.h
├── extension_types.h
├── options.h
├── pch.h
├── relation.h
├── serde.h
├── test_plan_builder.h
├── type_fwd.h
├── util.h
└── visibility.h
The user needs to find/create a Parquet file to run it
To solve this, maybe we could just add a comment saying something like this:
To create a Parquet file example.parquet
to use with this example, run this Python code:
import pyarrow as pa
import pyarrow.parquet as pq
pq.write_table(pa.Table.from_pydict({'i': [1,2,3,4], 'b': [True,False,True,False]}), 'example.parquet')
take
That file does not exist when I build and install the Arrow library (using a recent dev commit) :
Looking into the example after many iterations since the initial PR, I think the most intuitive and helpful example could be to use a data source with a NamedTableProvider
. We already have many test cases wrapped around that.
This means existing code could completely change. What do you think?
cc @westonpace @ianmcook
@vibhatha sounds OK to me, but I defer to Weston
Agreed. Not only is it simpler (don't need to rely on user to create a parquet file) but it demonstrates even more features :)
I will modify accordingly. Thanks for the feedback.
Is there any update on this? What should be the final example showing how we can consume substrait plans with Acero in C++?
Describe the enhancement requested
The example at cpp/examples/arrow/engine_substrait_consumption.cc has some shortcomings that make it less useful than it could be for demonstrating how Acero can consume and execute Substrait plans:
add
function is included in the extensions but not used#include <arrow/engine/api.h>
needs to be changed to#include <arrow/engine/substrait/api.h>
Component(s)
C++