apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
14.35k stars 3.49k forks source link

[R] R tests fail with "NotImplemented: construction from scalar of type numeric(0)" #37091

Closed felipecrv closed 1 year ago

felipecrv commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

R tests failing with

Start test: ExtensionType scalar behaviour
  'test-scalar.R:49:3' [success]
  'test-scalar.R:50:3' [success]
  'test-scalar.R:51:3' [success]
/arrow/cpp/src/arrow/result.cc:28: ValueOrDie called on an error: NotImplemented: construction from scalar of type numeric(0)
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util7CerrLog14PrintBackTraceEv+0x39)[0x7fa51a3ea467]
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util7CerrLogD1Ev+0x5f)[0x7fa51a3ea3d7]
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util7CerrLogD0Ev+0x1c)[0x7fa51a3ea3fc]
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util8ArrowLogD1Ev+0x4b)[0x7fa51a3ea20f]
/usr/local/lib/libarrow.so.1300(_ZN5arrow8internal14DieWithMessageERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x5c)[0x7fa51a1390f0]
/usr/local/lib/libarrow.so.1300(_ZN5arrow8internal17InvalidValueOrDieERKNS_6StatusE+0x88)[0x7fa51a1391b4]
/usr/local/lib/libparquet.so.1300(_ZNO5arrow6ResultISt10shared_ptrINS_5ArrayEEE10ValueOrDieEv+0x4b)[0x7fa51dfb5b95]
/usr/local/lib/libarrow.so.1300(_ZNO5arrow6ResultISt10shared_ptrINS_5ArrayEEEdeEv+0x41)[0x7fa51a1a2309]
/usr/local/lib/libarrow.so.1300(_ZNK5arrow6Scalar8ToStringB5cxx11Ev+0x296)[0x7fa51a1446f0]
/arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so(_Z16Scalar__ToStringB5cxx11RKSt10shared_ptrIN5arrow6ScalarEE+0x25)[0x7fa51ff6efa5]
/arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so(_arrow_Scalar__ToString+0x72)[0x7fa51fe4cc12]
/lib/libR.so(+0xfabce)[0x7fa529797bce]
/lib/libR.so(+0x13dde9)[0x7fa5297dade9]

Example: https://github.com/apache/arrow/actions/runs/5803060349/job/15730480958?pr=35345

Component(s)

R

felipecrv commented 1 year ago

@kou @pitrou this could be related to my changes, but seems unrelated.

thisisnic commented 1 year ago

For more context, the body of the failing test (with extraneous tests truncated out) in R is:

ext_array <- vctrs_extension_array(4)
ext_scalar <- scalar(ext_array)
expect_output(print(ext_scalar), "Scalar\n4")

Which converts an extension array to a scalar and then converts that to a string.

thisisnic commented 1 year ago

I'll have a look, but also pinging @paleolimbot who knows more about our extension type implementation than I do

paleolimbot commented 1 year ago

This particular test was added because before #15277 a segfault occurred when attempting to print an ExtensionScalar from R. I forget the details of why the original segfault occurred, but the fix we went with is:

https://github.com/apache/arrow/blob/d3ccc833a61b70a988090cd8065d3e38d7c29a89/r/src/scalar.cpp#L60-L73

@felipecrv are there some PRs I could take a look at to look for changes that may have affected this bit?

thisisnic commented 1 year ago

@paleolimbot #35344

thisisnic commented 1 year ago

And this is where the error is raised:

https://github.com/apache/arrow/blob/1a00fecf7dc1d75bfb5d31b6c5fae4b3d646bf1b/cpp/src/arrow/array/util.cc#L802-L804

paleolimbot commented 1 year ago

Hmm...when I build Arrow off of the PR branch I get

✖ | 5     389 | data-type                                                                           
────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test-data-type.R:292:3): duration types work as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 34
`expected`: 33
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$DURATION) at test-data-type.R:292:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:303:3): duration types work as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 34
`expected`: 33
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$DURATION) at test-data-type.R:303:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:380:3): map type works as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 31
`expected`: 30
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$MAP) at test-data-type.R:380:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:430:3): struct type works as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 27
`expected`: 26
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$STRUCT) at test-data-type.R:430:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:458:3): DictionaryType works as expected (ARROW-3355)
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 30
`expected`: 29
Backtrace:
    ▆
 1. └─arrow:::expect_equal(d$id, Type$DICTIONARY) at test-data-type.R:458:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

...so I wonder if the type enum IDs changed and need to be synced with

https://github.com/apache/arrow/blob/7c8f398f32ba5d2685093a010002f729a01a4242/r/R/enums.R#L44-L82

I'm not exactly sure how the wrong type ID would end up there, but if the wrong type ID was at

https://github.com/apache/arrow/blob/d3ccc833a61b70a988090cd8065d3e38d7c29a89/r/src/scalar.cpp#L60-L63

...it would crash because casting an extension array to utf-8 is not implemented (this was the error that https://github.com/apache/arrow/pull/15277 fixed).

thisisnic commented 1 year ago

If I add LIST_VIEW = 26L to the list of Types in enums.R and bump the other numbers up we no longer get the failures, but I don't know why as I can't see the link between those IDs and the C++ code.

thisisnic commented 1 year ago

@felipecrv - I have a solution for you in the above comment, which should fix things in your PR even if I'm not totally sure why it does.

thisisnic commented 1 year ago

As this isn't a bug, I'm going to close this now

felipecrv commented 1 year ago

@felipecrv - I have a solution for you in the above comment, which should fix things in your PR even if I'm not totally sure why it does.

Oh, I suspected that this could be due to R assuming a specific numbering of the Type enum. If this is the case, we should be explicit about the enum entry values in the C++ enum.

@bkietz, do you know if the automatic numbering of enum entries is fully specified in C++ or is the compiler free to choose anything?

felipecrv commented 1 year ago

@thisisnic @paleolimbot thanks for looking into this and sorry for the false alarm. I opened a PR that adds explicit values on the C++ side as well so no one gets confused about this again.

thisisnic commented 1 year ago

@felipecrv No worries, and thanks for figuring it out for us all, as I was really confused by the reasoning there! :)