apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

GH-34698: [C++][Acero] Add node that emits explicit ordering after asserting order #44738

Open EnricoMi opened 1 week ago

EnricoMi commented 1 week ago

Rationale for this change

An Acero node that turns an implicit ordering into an explicit ordering (rows sorted by some columns) is useful for reusing an order that already exists in the data.

What changes are included in this PR?

This PR adds the AssertOrderNode that implements this logic. The Scanner employs that node to turn the implicit ordering of the ScanNode into an explicit order as defined by user code via ScanBuilder.Ordering.
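The idea of asserting an order and then relying on it can be illustrated with a minimal Python sketch. This is not the Acero implementation or its API: the function name `assert_order`, the representation of batches as lists of row dicts, and the ascending-only, null-free comparison are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Row = Dict[str, Any]
Batch = List[Row]


def assert_order(batches: Iterable[Batch], keys: Sequence[str]) -> Iterator[Batch]:
    """Yield batches unchanged while checking that rows ascend by `keys`.

    Hypothetical helper: it assumes ascending order and no null values.
    Rows are compared across batch boundaries, so the whole stream must be
    ordered, not just each batch on its own.
    """
    previous: Optional[Tuple[Any, ...]] = None
    for batch_index, batch in enumerate(batches):
        for row_index, row in enumerate(batch):
            current = tuple(row[key] for key in keys)
            if previous is not None and current < previous:
                raise ValueError(
                    f"order violated at batch {batch_index}, row {row_index}: "
                    f"{current} follows {previous}"
                )
            previous = current
        # Every row seen so far is in order, so a consumer may treat the
        # stream as explicitly ordered by `keys` up to this point.
        yield batch
```

A consumer would wrap an already-sorted stream, for example `for batch in assert_order(stream, ["k"]): ...`, and could then skip its own sort, because a violation raises `ValueError` before the offending batch is passed on.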

Are these changes tested?

There are unit tests for the AssertOrderNode as well as for the ScanNode and ScanBuilder.

Are there any user-facing changes?

The following options have been added:

github-actions[bot] commented 1 week ago

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
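The three title formats above can be checked with a short Python sketch. This is not the project's actual validation: the pattern and the name `is_valid_title` are hypothetical, and it assumes that issue IDs are numeric and that one or more bracketed components are allowed, as in this PR's own title.

```python
import re

# Prefix: "GH-<digits>", "MINOR", or "PARQUET-<digits>", then ": ",
# one or more "[Component]" groups, a space, and a non-empty summary.
TITLE_PATTERN = re.compile(
    r"^(?:GH-\d+|MINOR|PARQUET-\d+): (?:\[[^\[\]]+\])+ \S.*$"
)


def is_valid_title(title: str) -> bool:
    """Return True if `title` matches one of the three formats (hypothetical check)."""
    return TITLE_PATTERN.match(title) is not None
```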

See also:

github-actions[bot] commented 6 days ago

:warning: GitHub issue #34698 has been automatically assigned in GitHub to PR creator.

EnricoMi commented 2 days ago

@raulcd thanks, assertions adjusted