apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.57k stars 3.54k forks source link

[R] Package Arrow 11.0.0 for R/CRAN #33702

Closed paleolimbot closed 1 year ago

paleolimbot commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

Packaging checklist for CRAN release

For a high-level overview of the release process see the Apache Arrow Release Management Guide.

Before the release candidate is cut:

Wait for the release candidate to be cut:

Make pull requests into the autobrew and rtools-packages repositories used by the configure script on MacOS and Windows. These pull requests will use the release candidate as the source.

Prepare and check the .tar.gz that will be released to CRAN.

Wait for the official release...

Create new autobrew and r-windows PRs such that they use the release instead of the release candidate; ensure linux binary packages are available:

Check binary Arrow C++ distributions specific to the R package:

Submit!

Wait for CRAN...

Component(s)

R

paleolimbot commented 1 year ago

@thisisnic @nealrichardson

paleolimbot commented 1 year ago

Two nightly builds are failing: test-r-linux-valgrind (#33094) and test-r-rhub-debian-gcc-devel-lto-latest (#33701). The valgrind failures won't be solved for this release; however, I've run them locally with archery to make sure they are the same as they were before the last release (and they appear to be). I haven't been able to replicate those errors outside of that docker image and we haven't had any valgrind issues reported by CRAN. I'm working to see if the LTO error is transient or cache-related.

paleolimbot commented 1 year ago

The current CRAN check status indicates a few issues. I checked to make sure the package would compile under gcc-13 (unreleased, requires building gcc from source) and I wasn't able to get any build errors (#33635). The warning for suppressed diagnostics may be a problem; however, we can't solve it without actively attempting to circumvent a CRAN policy which would be bad (I think): #33637.

paleolimbot commented 1 year ago

README/urlchecker fix is (#33705 / #33706)

paleolimbot commented 1 year ago

The link-time optimization nightly failure is probably related to our CI config...I'll continue to investigate and if there's a fix that's needed for CRAN we can pick it into the packaging branch.

nealrichardson commented 1 year ago

gcc13 no longer shows up on https://cran.r-project.org/web/checks/check_results_arrow.html so we're good there. Just have to deal with clang 16 now :/

paleolimbot commented 1 year ago

Is there an Arrow issue for clang16 yet? I didn't get to it today but I'm sure I can get a docker image together on Monday with clang16 and R.

nealrichardson commented 1 year ago

Is there an Arrow issue for clang16 yet? I didn't get to it today but I'm sure I can get a docker image together on Monday with clang16 and R.

I haven't made one yet, just raised it on Zulip and the ML. I think @jonkeane tried to reproduce yesterday so he may have a dockerfile started.

paleolimbot commented 1 year ago

I made #33819 to track. LLVM provides a Dockerfile for a source build on Debian 10 so we should be able to reproduce.

nealrichardson commented 1 year ago

Re: maintainer change, CRAN policy says "Explain any change in the maintainer’s email address and if possible send confirmation from the previous address (by a separate email to CRAN-submissions@R-project.org) or explain why it is not possible."

I will do this at the same time as the submission.

paleolimbot commented 1 year ago

Potential list of things we need to pick into the CRAN-specific release branch:

Neal mentioned removing the badges from the readme (but that's already on the checklist πŸ™‚ )

thisisnic commented 1 year ago

@paleolimbot As discussed separately, mind just clarifying what we need to do to proceed here?

paleolimbot commented 1 year ago

The "just update boost" fix turned out to be more complicated than I expected and is a poor candidate for this point in the release cycle. The macOS install timeout thing is only an issue if when you run crossbow submit --group r you see that job fail (and re-running doesn't fix); the Boost fix is only an issue if our CRAN submission gets rejected.

That leaves the LTO fix. I think the best way to proceed would be to pick the LTO fix, remove the badges, and carry on until we have a reason to pick anything else.

thisisnic commented 1 year ago

@paleolimbot Would you mind running the revdepchecks? I remember last time I spent ages and didn't even get it done, due to some weird quirk in the difference between our respective setups, and we did discuss moving it to a Crossbow job at some point but haven't yet.

thisisnic commented 1 year ago

Builth the package locally and I got the output below, so nothing unexpected there :

── R CMD check results ──────────────────────────────────────────────────── arrow 11.0.0 ────
Duration: 7m 20.2s

❯ checking top-level files ... WARNING
  A complete check needs the 'checkbashisms' script.
  See section β€˜Configure and cleanup’ in the β€˜Writing R Extensions’
  manual.

❯ checking compilation flags used ... WARNING
  Compilation used the following non-portable flag(s):
    β€˜-Wno-noexcept-type’ β€˜-msse4.2’
  including flag(s) suppressing warnings

❯ checking installed package size ... NOTE
    installed size is 84.3Mb
    sub-directories of 1Mb or more:
      R      4.2Mb
      libs  79.5Mb
paleolimbot commented 1 year ago

The first one you can probably solve by doing apt-get install checkbashisms (or brew install checkbashisms, or maybe something else if you're not on ubuntu or MacOS).

The second one is showing up because you have ARROW_R_DEV=true...you can maybe unset it to get a more realistic check of what CRAN will see.

The third one always shows up for us...we just have a really big shared library.

thisisnic commented 1 year ago

Yep, re-ran after doing exactly that and the warnings disappeared.

nealrichardson commented 1 year ago

The "just update boost" fix turned out to be more complicated than I expected

How so? I don't see problems mentioned on https://github.com/apache/arrow/pull/33890

Up to y'all (ultimately Nic, since they'll be the one receiving the CRAN emails) if you want to submit without the boost upgrade. std::unary_function, what BDR suggested was the problem for some, only appears in one place I can find in the boost we use, and it is conditional on a check for its existence: https://github.com/boostorg/container_hash/blob/171c012d4723c5e93cc7cffe42919afdf8b27dfa/include/boost/container_hash/hash.hpp#L131

It's reasonable to submit without the upgrade, and if it still fails, respond that we ran on latest clang16 before submission and it passed so we believed the issue to be resolved. Downside is that I doubt we would get any extra time to fix, so I think it will still need to be resolved by Feb 16.

paleolimbot commented 1 year ago

See https://github.com/apache/arrow/pull/33890#issuecomment-1408793021 ...something is missing in the bundle I created via the trim-boost.sh script. If you have a workflow for debugging what it might be I'd love to know before I have to invent one!

thisisnic commented 1 year ago

That fix needs completing and merging at some point anyway, and if it causes us to fail CRAN checks we'd need to submit it again with the change made, so it seems like the only real consequence of waiting for it to be done ahead of the CRAN release (given we're not 100% sure it'll cause a failure) is delaying the initial submission and adding more to @paleolimbot's to-do list in the short-term? If that's the case, I'm happy to submit to CRAN without it, if only so that we know sooner whether it matters or not. If any of my assumptions above don't hold though, let's reconsider.

thisisnic commented 1 year ago

On a different note, we still have the issue of the URL check failing as there is a URL in our docs which currently exists on the dev docs but not the live docs yet. Is there anything to do about this other than wait for the arrow-site update before submitting to CRAN?

nealrichardson commented 1 year ago

Which link? Is it something that should be a relative URL? Otherwise sure, we can wait for the 11.0.0 docs to be published.

nealrichardson commented 1 year ago

FTR the docs update PR is https://github.com/apache/arrow-site/pull/308

thisisnic commented 1 year ago

Thanks for linking to that, I didn't realise that that had just been done.

Broken link is one of the new vignettes after the big docs refactor.

> urlchecker::url_check()
βœ– Error: README.md:83:101 404: Not Found
need to install nightly builds, but if you do please see the article on [installing nightly builds](https://arrow.apache.org/docs/r/articles/install_nightly.html) for more information.
thisisnic commented 1 year ago

Oh, I see, a relative URL would fix that.

thisisnic commented 1 year ago

Actually...would it? It's in the README

assignUser commented 1 year ago

I think that is only 404'ing because the website is not updated for 11.0.0 yet: https://arrow.apache.org/docs/dev/r/articles/install_nightly.html

nealrichardson commented 1 year ago

pkgdown would handle the relative URL, but it would be a broken link in the GitHub web version, so yeah, let's merge the arrow-site PR first.

nealrichardson commented 1 year ago

Looks like the docs PR has merged.

thisisnic commented 1 year ago

Great; I'm just waiting for the crossbow builds to run now + Dewey is doing revdepchecks, so after that it'll just be the Linux binary to upload + winbuilder/macbuilder checks and then off to CRAN.

nealrichardson commented 1 year ago

πŸ‘ sounds good, I'll email about the maintainer change now.

thisisnic commented 1 year ago

The macOS install timeout thing is only an issue if when you run crossbow submit --group r you see that job fail (and re-running doesn't fix)

It did fail; re-running now :crossed_fingers:

paleolimbot commented 1 year ago

Reverse dependencies are clear, both via archery and via my hacked-together local thing. Output from the hacked together local thing below...we have 4 more revdeps since the last version!

# Changes # Check Summary ## CRAN ``` # A tibble: 32 x 5 label pkg notes warnings errors 1 CRAN analogsea 2 CRAN arkdb 3 CRAN CDMConnector 4 CRAN ClickHouseHTTP 5 CRAN dataversionr 6 CRAN diffdfs 7 CRAN disk.frame 8 CRAN duckdb 9 CRAN fastverse 10 CRAN gbifdb 11 CRAN MolgenisArmadillo 12 CRAN mrgsim.parallel 13 CRAN nflreadr 14 CRAN noctua 15 CRAN opencpu 16 CRAN parqr 17 CRAN parquetize 18 CRAN pins 19 CRAN plumber 20 CRAN pointblank 21 CRAN RAthena 22 CRAN raveio 23 CRAN receptiviti 24 CRAN rio 25 CRAN sfarrow 26 CRAN sparklyr 27 CRAN starvz 28 CRAN strand 29 CRAN targets 30 CRAN tidyquery 31 CRAN tradestatistics 32 CRAN vetiver ``` ## Local ``` # A tibble: 32 x 5 label pkg notes warnings errors 1 local analogsea 2 local arkdb 3 local CDMConnector 4 local ClickHouseHTTP 5 local dataversionr 6 local diffdfs 7 local disk.frame 8 local duckdb 9 local fastverse 10 local gbifdb 11 local MolgenisArmadillo 12 local mrgsim.parallel 13 local nflreadr 14 local noctua 15 local opencpu 16 local parqr 17 local parquetize 18 local pins 19 local plumber 20 local pointblank 21 local RAthena 22 local raveio 23 local receptiviti 24 local rio 25 local sfarrow 26 local sparklyr 27 local starvz 28 local strand 29 local targets 30 local tidyquery 31 local tradestatistics 32 local vetiver ``` # All notes and errors ## disk.frame ``` -- R CMD check results ----------------------------------- disk.frame 0.7.2 ---- Duration: 51.9s > checking examples ... ERROR Running examples in β€˜disk.frame-Ex.R’ failed The error most likely occurred in: > ### Name: anti_join.disk.frame > ### Title: Performs join/merge for disk.frames > ### Aliases: anti_join.disk.frame full_join.disk.frame > ### inner_join.disk.frame left_join.disk.frame semi_join.disk.frame > > ### ** Examples > > df.df = as.disk.frame(data.frame(x = 1:3, y = 4:6), overwrite = TRUE) > df2.df = as.disk.frame(data.frame(x = 1:2, z = 10:11), overwrite = TRUE) > > anti_joined.df = anti_join(df.df, df2.df) Warning in anti_join.disk.frame(df.df, df2.df) : merge_by_chunk_id = FALSE. This will take significantly longer and the preparations needed are performed eagerly which may lead to poor performance. Consider making y a data.frame or set merge_by_chunk_id = TRUE for better performance. Appending disk.frames: Appending disk.frames: Error in anti_join(.x, .y, by = by, copy = copy, ..., overwrite = overwrite) : `...` must be empty. βœ– Problematic arguments: β€’ ..1 = xch β€’ ..2 = ych β€’ overwrite = overwrite β„Ή Did you forget to name an argument? Calls: anti_join ... resolve.list -> signalConditionsASAP -> signalConditions Execution halted 1 error x | 0 warnings v | 0 notes v ``` ## receptiviti ``` -- R CMD check results ---------------------------------- receptiviti 0.1.3 ---- Duration: 24.4s > checking tests ... See below... -- Test failures ------------------------------------------------- testthat ---- > library(testthat) > library(receptiviti) > > test_check("receptiviti") [ FAIL 1 | WARN 0 | SKIP 2 | PASS 12 ] ══ Skipped tests ═══════════════════════════════════════════════════════════════ β€’ no API key (2) ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error ('test-receptiviti_status.R:4'): failures works ─────────────────────── Error in `curl_fetch_memory(url, handler)`: Could not resolve host: example.com Backtrace: β–† 1. β”œβ”€testthat::expect_identical(...) at test-receptiviti_status.R:4:2 2. β”‚ └─testthat::quasi_label(enquo(object), label, arg = "object") 3. β”‚ └─rlang::eval_bare(expr, quo_get_env(quo)) 4. β”œβ”€utils::capture.output(...) 5. β”‚ └─base::withVisible(...elt(i)) 6. └─receptiviti::receptiviti_status("example.com", key = 123, secret = 123) 7. └─curl::curl_fetch_memory(url, handler) [ FAIL 1 | WARN 0 | SKIP 2 | PASS 12 ] Error: Test failures Execution halted 1 error x | 0 warnings v | 0 notes v ``` ## duckdb ``` -- R CMD check results --------------------------------------- duckdb 0.6.2 ---- Duration: 2m 37.1s > checking installed package size ... NOTE installed size is 32.5Mb sub-directories of 1Mb or more: libs 31.9Mb 0 errors v | 0 warnings v | 1 note x ``` ## fastverse ``` -- R CMD check results ------------------------------------ fastverse 0.3.1 ---- Duration: 31.1s > checking dependencies in R code ... NOTE Namespaces in Imports field not imported from: β€˜collapse’ β€˜data.table’ β€˜kit’ β€˜magrittr’ All declared Imports should be used. 0 errors v | 0 warnings v | 1 note x ``` ## parquetize ``` -- R CMD check results ----------------------------------- parquetize 0.5.1 ---- Duration: 1m 0.5s > checking dependencies in R code ... NOTE Namespace in Imports field not imported from: β€˜dplyr’ All declared Imports should be used. 0 errors v | 0 warnings v | 1 note x ``` ## pointblank ``` -- R CMD check results ---------------------------------- pointblank 0.11.2 ---- Duration: 1m 10.6s > checking data for non-ASCII characters ... NOTE Note: found 1 marked UTF-8 string 0 errors v | 0 warnings v | 1 note x ``` ## sparklyr ``` -- R CMD check results ------------------------------------- sparklyr 1.7.9 ---- Duration: 1m 20.5s > checking installed package size ... NOTE installed size is 6.8Mb sub-directories of 1Mb or more: R 2.1Mb java 3.4Mb 0 errors v | 0 warnings v | 1 note x ``` ## strand ``` -- R CMD check results --------------------------------------- strand 0.2.0 ---- Duration: 2m 57.9s > checking data for non-ASCII characters ... NOTE Note: found 1 marked UTF-8 string 0 errors v | 0 warnings v | 1 note x ```
thisisnic commented 1 year ago

Thanks @paleolimbot, will plough on through the list!

thisisnic commented 1 year ago

Latest:

Once the crossbow checks are done and we're happy with the results, I'll submit the latest version to CRAN.

paleolimbot commented 1 year ago

I ran R CMD check with the clang-16 image and that was all clear, too βœ…

thisisnic commented 1 year ago

CRAN's clang16 build had an outdated version of Boost, which the build ended up using and failing. New branch 11.0.0.2 contains fixes for that. Checks are being done here: https://github.com/apache/arrow/pull/34103

Winbuilder passed. MacBuilder: passed

thisisnic commented 1 year ago

I got a failure with devtools::check_built() but I've been messing around with my setup to try to get a working Arrow C++ build, and this may have interfered with this. Have asked @paleolimbot also to check; if it works on his machine, then I'm happy.

CMake Error at cmake_modules/ThirdpartyToolchain.cmake:2312 (add_library):
  add_library cannot create imported target "xsimd" because another target
  with the same name already exists.
Call Stack (most recent call first):
  CMakeLists.txt:498 (include)
thisisnic commented 1 year ago

Follow-up tasks now completed so closing this.