gaborcsardi / parsedate

R package to parse dates given in arbitrary formats

Windows issue: bad parse makes following parse bad too #12

Closed trinker closed 8 years ago

trinker commented 8 years ago

I am using R 3.2.4 and parsedate version 1.1.1. As far as I know this does not replicate on Mac: the lead data scientist I work with, @data-steve, has the same versions of R and parsedate and cannot reproduce it. In a clean session I get the following:

x <- c("2013-02-08T09:30:26", "20131-02-08T09:30:26", "2013-02-08T09:30:26")
parsedate::parse_iso_8601(x)

[1] "2013-02-08 09:30:26 UTC" NA                       
[3] NA                       
Warning messages:
1: In as.difftime(as.numeric(x), units = "hours") :
  NAs introduced by coercion
2: In as.difftime(as.numeric(x), units = "mins") :
  NAs introduced by coercion
3: In as.difftime(as.numeric(x), units = "secs") :
  NAs introduced by coercion

I expect element 2 to be NA, as I made the year invalid (20131), but this invalid element seems to affect the following element (number 3), which is identical to element 1:

identical(x[1], x[3])
## [1] TRUE

Why is the 3rd element NA as well on a Windows machine?

I also tested on R 3.3.0 and get the same result (sessionInfo included below):

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 10586)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.0     parsedate_1.1.1

gaborcsardi commented 8 years ago

Actually, I can replicate this on OSX:

❯ parsedate::parse_iso_8601(x)
[1] "2013-02-08 09:30:26 UTC" NA
[3] NA
Warning messages:
1: In as.difftime(as.numeric(x), units = "hours") :
  NAs introduced by coercion
2: In as.difftime(as.numeric(x), units = "mins") :
  NAs introduced by coercion
3: In as.difftime(as.numeric(x), units = "secs") :
  NAs introduced by coercion

✔ 44.6 MiB master*
❯ devtools::session_info()
Session info -------------------------------------------------------------------
 setting  value
 version  R version 3.2.4 (2016-03-10)
 system   x86_64, darwin13.4.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       Europe/London
 date     2016-05-09

Packages -----------------------------------------------------------------------
 package    * version    date       source
 clisymbols   1.0.0      2015-06-08 CRAN (R 3.2.0)
 crayon       1.3.2      2016-05-06 local
 devtools     1.11.1     2016-04-21 CRAN (R 3.2.5)
 digest       0.6.9      2016-01-08 CRAN (R 3.2.3)
 memoise      1.0.0      2016-01-29 CRAN (R 3.2.3)
 memuse       2.5        2015-07-02 CRAN (R 3.2.0)
 parr         3.3.0      2016-04-16 Github (gaborcsardi/parr@3a2564e)
 parsedate    1.1.1      2014-09-24 CRAN (R 3.2.0)
 prompt       1.0.0      2016-04-16 local (gaborcsardi/prompt@53e0550)
 rstudioapi   0.5        2016-01-24 CRAN (R 3.2.3)
 withr        1.0.1.9000 2016-04-28 Github (jimhester/withr@bd42181)

data-steve commented 8 years ago

Here are a couple of examples from my machine:

x <- c("2013-02-08T09:30:26", "20131-02-08T09:30:26", "2013-02-08T09:30:26")
parsedate::parse_iso_8601(x)
[1] "2013-02-08 09:30:26 UTC" NA                        "2013-02-08 09:30:26 UTC"

parsedate::parse_iso_8601(c("2016-04-14T09", "20164-04-14T09", "2016-04-13T09", "2016-04-13T09")) 
[1] "2016-04-14 09:00:00 UTC" NA                        "2016-04-13 09:00:00 UTC" "2016-04-13 09:00:00 UTC"

And the session info, in case it helps.

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringi_1.0-1 tidyr_0.4.1   readxl_0.1.0  dplyr_0.4.3  

loaded via a namespace (and not attached):
 [1] lazyeval_0.1.10     magrittr_1.5        R6_2.1.2            assertthat_0.1      parallel_3.2.4      DBI_0.3.1           tools_3.2.4        
 [8] cl_0.0.1            Rcpp_0.12.4         parsedate_1.1.1     medium2jekyll_0.0.1 pacman_0.4.1

gaborcsardi commented 8 years ago

Thanks, this turned out to be a huge bug. I have no idea how it was not triggered on some systems.

trinker commented 8 years ago

Thanks for the fix. Much appreciated.