apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

GH-44802: [C++][CI] Migrate to `arrow::Result`-based `parquet::arrow::OpenFile()` API in example tutorials #44807

Closed malinjawi closed 3 days ago

malinjawi commented 3 days ago

Rationale for this change

This PR addresses GH-44802 and updates the example-cpp-tutorial in Crossbow to resolve build failures caused by deprecated APIs, as seen in the linked CI job. It migrates the examples to non-deprecated APIs so that they build against the latest Arrow C++ version.

Updating these APIs is necessary to:

- Fix build failures and prevent future issues.
- Align with the current Arrow C++ API.

What changes are included in this PR?

This PR updates the example-cpp-tutorial to replace deprecated Arrow C++ APIs with their currently supported equivalents, in particular the `arrow::Result`-based `parquet::arrow::OpenFile()`, resolving build failures in the Crossbow nightly build.
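For readers unfamiliar with this kind of migration, here is a minimal sketch of the general pattern: moving from a function that returns a status and fills an out-parameter to one that returns a result object carrying either the value or the error. Everything in it (`Status`, `Result`, `Reader`, `OpenFileOld`, `OpenFileNew`, `PathViaOld`, `PathViaNew`) is a hypothetical toy stand-in written for illustration, not Arrow's real types or the actual diff in this PR. The real `parquet::arrow::OpenFile()` signatures should be checked against the Arrow C++ headers.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; these are NOT Arrow's real classes.
struct Status {
  std::string message;  // an empty message means success
  bool ok() const { return message.empty(); }
};

template <typename T>
class Result {
 public:
  Result(T value) : value_(std::move(value)) {}         // success case
  Result(Status status) : status_(std::move(status)) {} // error case
  bool ok() const { return status_.ok(); }
  const Status& status() const { return status_; }
  T MoveValue() { return std::move(value_); }

 private:
  T value_{};
  Status status_{};
};

// Hypothetical reader type that just remembers which path it was opened on.
struct Reader {
  std::string path;
};

// Old style: the reader comes back through an out-parameter and the
// return value only reports success or failure.
Status OpenFileOld(const std::string& path, std::unique_ptr<Reader>* out) {
  if (path.empty()) return Status{"empty path"};
  *out = std::make_unique<Reader>(Reader{path});
  return Status{};
}

// New style: the return value carries either the reader or the error.
Result<std::unique_ptr<Reader>> OpenFileNew(const std::string& path) {
  if (path.empty()) return Status{"empty path"};
  return std::make_unique<Reader>(Reader{path});
}

// Caller using the old style: declare the output first, then check the status.
std::string PathViaOld(const std::string& path) {
  std::unique_ptr<Reader> reader;
  Status st = OpenFileOld(path, &reader);
  if (!st.ok()) return "error: " + st.message;
  return reader->path;
}

// Caller using the new style: check the result, then take ownership of the value.
std::string PathViaNew(const std::string& path) {
  auto result = OpenFileNew(path);
  if (!result.ok()) return "error: " + result.status().message;
  std::unique_ptr<Reader> reader = result.MoveValue();
  return reader->path;
}
```

In real Arrow code the check-and-unwrap step in the new style is usually written with the `ARROW_ASSIGN_OR_RAISE` macro rather than by hand. The sketch spells it out only to make the control flow visible.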

Are these changes tested?

By running:

$ cd arrow/cpp/examples/tutorial_examples
$ docker compose run --rm tutorial

output:

== Running example project
==

Day:   [
    1,
    12,
    17,
    23,
    28
  ]
Month:   [
    1,
    3,
    5,
    7,
    1
  ]
Year:   [
    1990,
    2000,
    1995,
    2000,
    1995
  ]
Day: int8
Month: int8
Year: int16
----
Day:
  [
    [
      1,
      12,
      17,
      23,
      28
    ],
    [
      6,
      12,
      3,
      30,
      22
    ]
  ]
Month:
  [
    [
      1,
      3,
      5,
      7,
      1
    ],
    [
      5,
      4,
      11,
      3,
      2
    ]
  ]
Year:
  [
    [
      1990,
      2000,
      1995,
      2000,
      1995
    ],
    [
      1980,
      2001,
      1915,
      2020,
      1996
    ]
  ]
Datum kind: Scalar(12891) content type: int64
12891
Datum kind: ChunkedArray([
  [
    75376,
    647,
    2287,
    5671,
    5092
  ]
]) content type: int32
[
  [
    75376,
    647,
    2287,
    5671,
    5092
  ]
]
Datum kind: Scalar(2) content type: int64
2
Found fragment: parquet_dataset/data1.parquet
Partition expression: true
Found fragment: parquet_dataset/data2.parquet
Partition expression: true
a: int64
b: int64
c: int64
----
a:
  [
    [
      0,
      1,
      2,
      3,
      4
    ],
    [
      5,
      6,
      7,
      8,
      9
    ]
  ]
b:
  [
    [
      9,
      8,
      7,
      6,
      5
    ],
    [
      4,
      3,
      2,
      1,
      0
    ]
  ]
c:
  [
    [
      1,
      2,
      1,
      2,
      1
    ],
    [
      2,
      1,
      2,
      1,
      2
    ]
  ]

Are there any user-facing changes?

Yes, the tutorial has been updated to use non-deprecated APIs, which may affect the example code provided to users.

github-actions[bot] commented 3 days ago

⚠️ GitHub issue #44802 has been automatically assigned in GitHub to PR creator.

kou commented 3 days ago

@github-actions crossbow submit example-cpp-tutorial

github-actions[bot] commented 3 days ago

Revision: 8acc696edd6d51bec4fbad4849e761b17ebbb2de

Submitted crossbow builds: ursacomputing/crossbow @ actions-08ec77adbe

| Task | Status |
| --- | --- |
| example-cpp-tutorial | GitHub Actions |

malinjawi commented 3 days ago

@kou Thanks for your feedback and review. I have committed your suggested changes. Please let me know if there are any other suggestions.

conbench-apache-arrow[bot] commented 2 days ago

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit ae497bf11a4078f48b02f53b8dc843e3c0579d76.

There were 132 benchmark results with an error.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.