apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Docs] Improve Acero Documentation #20284

Open asfimport opened 2 years ago

asfimport commented 2 years ago

From @amol- :

If we want to start promoting Acero to the world, I think we should first work on improving the documentation a bit. Having a blog post that redirects people to docs they find hard to read or apply might actually be counterproductive, as it could give Acero a reputation for being badly documented. At the moment the only mention of it is https://arrow.apache.org/docs/cpp/streaming_execution.html and it's not very easy to follow (not much explanation, just blocks of code). In comparison, the compute chapter in Python ( https://arrow.apache.org/docs/dev/python/compute.html ) is much more talkative and explains things as it goes.

Reporter: Will Jones / @wjones127

Note: This issue was originally created as ARROW-16802. Please see the migration documentation for further details.

asfimport commented 2 years ago

Will Jones / @wjones127: I agree with Alessandro we should improve our docs before the blog post. Ideally we do both soon.

asfimport commented 2 years ago

Kexin Su: Hi, I am doing an (academic) project about Acero. However, I find it very hard to get started, because this is literally the only page that talks about Acero: https://arrow.apache.org/docs/cpp/streaming_execution.html

To understand things I have to look into the code, and I easily get lost.

So I am really looking forward to more detailed documentation or a design doc, or perhaps an Acero-specific communication channel, if possible.

To be honest, I am even having trouble getting started with debugging Acero, because there are no CMake presets or instructions for doing so.

Any reply will be highly appreciated! Thanks in advance!

asfimport commented 2 years ago

Will Jones / @wjones127: Hi Kexin,

Most communication about Acero usage happens on the general Arrow user mailing list. You can sign up at https://arrow.apache.org/community/ . We don't have any Acero-specific channels, but maybe that will change one day.

asfimport commented 2 years ago

Vibhatha Lakmal Abeykoon / @vibhatha: @wjones127 thanks for raising this issue. I will work on improving the documentation.

asfimport commented 2 years ago

Ian Cook / @ianmcook: One important consideration: In the future, we intend for Substrait to be the primary "API language" for Acero. We will discourage direct use of the ExecPlan API and encourage developers to use Substrait plans to tell Acero what operations to execute. So we should probably not invest too much energy in documenting the ExecPlan API.

asfimport commented 2 years ago

Ian Cook / @ianmcook: One easy way to help make Acero docs more visible and accessible is by adding an Acero link in the Subprojects dropdown menu on the Arrow website.

asfimport commented 2 years ago

Weston Pace / @westonpace: [~kexin] in addition to the mailing list there is a Zulip instance at https://ursalabs.zulipchat.com . It has ursalabs in the name (I think we are trying to fix this) but it is open to all.

asfimport commented 2 years ago

Kexin Su: Thanks for all your comments.

Would it be possible for you to provide a CMake preset for debugging Acero?

I want to step through an Acero example line by line, but I have not managed to do so.

Thanks in advance!

asfimport commented 2 years ago

Apache Arrow JIRA Bot: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.