Open hamgamb opened 9 months ago
Hey, thanks for the report.
I can't reproduce this with the same R and arrow versions. I am on Linux, though, so it might be an issue with tzdata on Windows. Do other conversions work correctly?
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:arrow':
#>
#> duration
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
arrow_table(date = "12JAN2004") %>%
  mutate(date = dmy(date)) %>%
  collect()
#> # A tibble: 1 × 1
#> date
#> <date>
#> 1 2004-01-12
arrow_table(date = "12JAN2004") %>%
  collect() %>%
  mutate(date = dmy(date))
#> # A tibble: 1 × 1
#> date
#> <date>
#> 1 2004-01-12
From others I've spoken to, this isn't reproducible on Mac either. When you say other conversions, do you mean the same string format with different methods (as below)?
library(arrow)
#> Warning: package 'arrow' was built under R version 4.3.2
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:arrow':
#>
#> duration
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
arrow_table(date = "12JAN2004") |>
  mutate(date = dmy(date)) |>
  collect()
#> # A tibble: 1 × 1
#> date
#> <date>
#> 1 NA
arrow_table(date = "12JAN2004") |>
  mutate(date = as.Date(date, format = "%d%B%Y")) |>
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Failed to parse string: '12JAN2004' as a scalar of type timestamp[s]
#> Backtrace:
#> ▆
#> 1. ├─dplyr::collect(...)
#> 2. └─arrow:::collect.arrow_dplyr_query(...)
#> 3. └─arrow:::compute.arrow_dplyr_query(x)
#> 4. └─base::tryCatch(...)
#> 5. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 6. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 7. └─value[[3L]](cond)
#> 8. └─arrow:::augment_io_error_msg(e, call, schema = schema())
#> 9. └─rlang::abort(msg, call = call)
arrow_table(date = "12JAN2004") |>
  mutate(date = as_date(date, format = "%d%B%Y")) |>
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Failed to parse string: '12JAN2004' as a scalar of type timestamp[s]
#> Backtrace:
#> ▆
#> 1. ├─dplyr::collect(...)
#> 2. └─arrow:::collect.arrow_dplyr_query(...)
#> 3. └─arrow:::compute.arrow_dplyr_query(x)
#> 4. └─base::tryCatch(...)
#> 5. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 6. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 7. └─value[[3L]](cond)
#> 8. └─arrow:::augment_io_error_msg(e, call, schema = schema())
#> 9. └─rlang::abort(msg, call = call)
Created on 2024-01-23 with reprex v2.0.2
What's happening here is that "12JAN2004" is in the format which lubridate refers to as `dBY` or `dbY` (see `?lubridate::parse_date_time` for the full spec). The `dmy()` binding in the arrow package is a wrapper around the `parse_date_time()` binding.
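To make the format letters concrete, here is a minimal sketch in plain lubridate (no arrow involved) that parses the same string with the explicit `dbY` order. It assumes an English locale for month names and only illustrates how lubridate itself reads the string; it says nothing about what the arrow binding does on any platform.

```r
library(lubridate)

x <- "12JAN2004"

# "d" = day of month, "b" = abbreviated month name, "Y" = four-digit year.
# parse_date_time() returns a POSIXct value (UTC by default).
parsed <- parse_date_time(x, orders = "dbY")

# dmy() accepts the same string and returns a Date,
# so both routes should agree on the calendar day.
as_date(parsed) == dmy(x)
```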
Our docs for `parse_date_time()` note: "parse_date_time(): quiet = FALSE is not supported. Available formats are H, I, j, M, S, U, w, W, y, Y, R, T. On Linux and OS X additionally a, A, b, B, Om, p, r are available."
Therefore on Windows, we wouldn't expect that code to work as b and B are not supported. I assume this is due to tzdata on Windows (which will be doing the parsing in the background) as noted by @assignUser.
So this isn't a bug; it's expected behaviour. We should update our docs to make this information easier to find, though, as it's hard to track down without knowing that `dmy()` calls `parse_date_time()`.
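Until the docs are clearer, one possible workaround is to do the parsing in R rather than in Arrow when running on Windows. The sketch below is only an illustration: `parse_dmy_column()` is a hypothetical helper (not part of arrow), it assumes the column is called `date`, and it assumes that lubridate's own `dmy()` handles the string once the data has been collected into R.

```r
library(arrow)
library(dplyr)
library(lubridate)

# Hypothetical helper, not part of arrow: choose where the parsing runs.
parse_dmy_column <- function(tbl) {
  if (.Platform$OS.type == "windows") {
    # b/B are not available in the Arrow binding on Windows,
    # so pull the data into R first and let lubridate parse it.
    tbl |>
      collect() |>
      mutate(date = dmy(date))
  } else {
    # On Linux and macOS the Arrow binding can parse month names.
    tbl |>
      mutate(date = dmy(date)) |>
      collect()
  }
}

res <- parse_dmy_column(arrow_table(date = "12JAN2004"))
res
```

Collecting first gives up Arrow's out-of-memory evaluation for that step, so on large datasets it may be worth filtering or selecting columns before calling `collect()`.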
Thanks for reporting this @hamgamb!
Created on 2024-01-23 with reprex v2.0.2
Session info
``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Australia.utf8
#>  ctype    English_Australia.utf8
#>  tz       Australia/Adelaide
#>  date     2024-01-23
#>  pandoc   3.1.1 @ C:/Users/gamb0043/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  arrow       * 14.0.0.2 2023-12-02 [1] CRAN (R 4.3.2)
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.3.1)
#>  bit           4.0.5    2022-11-15 [1] CRAN (R 4.3.1)
#>  bit64         4.0.5    2020-08-30 [1] CRAN (R 4.3.1)
#>  cli           3.6.2    2023-12-11 [1] CRAN (R 4.3.2)
#>  digest        0.6.34   2024-01-11 [1] CRAN (R 4.3.2)
#>  dplyr       * 1.1.2    2023-04-20 [1] CRAN (R 4.3.1)
#>  evaluate      0.23     2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi         1.0.6    2023-12-08 [1] CRAN (R 4.3.2)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.1)
#>  fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.1)
#>  generics      0.1.3    2022-07-05 [1] CRAN (R 4.0.2)
#>  glue          1.7.0    2024-01-09 [1] CRAN (R 4.3.2)
#>  htmltools     0.5.7    2023-11-03 [1] CRAN (R 4.3.2)
#>  knitr         1.45     2023-10-30 [1] CRAN (R 4.3.2)
#>  lifecycle     1.0.4    2023-11-07 [1] CRAN (R 4.3.2)
#>  lubridate   * 1.9.2    2023-02-10 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.0.5)
#>  pillar        1.9.0    2023-03-22 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.3.1)
#>  purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.1)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.2)
#>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.3.1)
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.1)
#>  R.utils       2.12.2   2022-11-11 [1] CRAN (R 4.3.1)
#>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.3.1)
#>  reprex        2.0.2    2022-08-17 [1] CRAN (R 4.3.1)
#>  rlang         1.1.3    2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.25     2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0   2023-07-07 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.1)
#>  styler        1.10.2   2023-08-29 [1] CRAN (R 4.3.2)
#>  tibble        3.2.1    2023-03-20 [1] CRAN (R 4.3.1)
#>  tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.0.2)
#>  timechange    0.2.0    2023-01-11 [1] CRAN (R 4.3.1)
#>  tzdb          0.4.0    2023-05-12 [1] CRAN (R 4.3.1)
#>  utf8          1.2.4    2023-10-22 [1] CRAN (R 4.3.2)
#>  vctrs         0.6.5    2023-12-01 [1] CRAN (R 4.3.2)
#>  withr         2.5.2    2023-10-30 [1] CRAN (R 4.3.2)
#>  xfun          0.41     2023-11-01 [1] CRAN (R 4.3.2)
#>  yaml          2.3.8    2023-12-11 [1] CRAN (R 4.3.2)
#>
#>  [1] C:/Users/gamb0043/R
#>  [2] C:/Program Files/R/R-4.3.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
Component(s)
R