Closed mstrimas closed 3 years ago
I'm seeing now in GitHub actions that arrow
v6.0.0 breaks our tests, will investigate.
@mstrimas I saw the issues on arrow too, I have a proposed patch on https://github.com/cboettig/birddb/tree/arrow-devel branch I should open as a PR. It simplifies the parsing logic but would like your feedback before merging
ebird_data_dir()
to avoid that problem. We already unlink the temp file so no need for using temp storage. Nice to catch this edge case now. Great, thanks! I'll take a look at your PR tomorrow, then revisit this PR once yours is merged.
I'll test these changes on the full dataset this morning.
I can't make sense of the macOS error in the GitHub actions check, I'm hoping maybe I'll go away on it's own next time I push 🤞
Re-imported the full August 2021 EBD, both checklists and observations, and everything looks good. Timings for a simple extraction:
library(birddb)
library(dplyr)
obs <- observations()
#> 60 seconds
obs_woothr <- obs %>%
filter(state_code == "US-NY", common_name == "Wood Thrush") %>%
collect()
#> 34 minutes
I think everything looks good here, apart from the macOS failure in the GitHub actions checks, which remains mysterious to me, especially since I'm testing locally on macOS. Any idea what's going on there?
nice!
yeah, definitely a weird error on the Mac test -- you're using the same mechanism to install the nightly
version of arrow? Might try installing the nightly using arrow::install_arrow(TRUE, FALSE, minimal = FALSE)
, to force a source install instead of a binary install? Otherwise yeah sounds like a not-our-problem kinda bug that should go away.
Alternately, is it necessary to test on the nightly version of arrow? Now that 6.0 is released, maybe we should stick with the CRAN test. (really it might make sense to cover both arrow-devel and arrow-release as options in the config matrix...)
Anyway, those CI issues I see as orthogonal, so I'm happy to merge this as is unless you want to take a crack at a quick fix like binary arrow install or CRAN arrow install?
Benjamin Van Doren, one of the Lab of O postdocs, used the package and offered some feedback, which I've tried to incorporate here. Changes are:
README.md
needed updating to reflect the new function naming.import_ebird()
because the temp untarred file needed 100GB+ on his local drive. To address this, I've allow the temp directory to be specified as a parameter inimport_ebird()
. Another approach would be to useBIRDDB_HOME
as the temp directory since that directory has presumably been chosen to have lots of space.