davidcarslaw / openair

Tools for air quality data analysis
https://davidcarslaw.github.io/openair/
GNU General Public License v2.0

Daylight Saving Time check in checkPrep.R #26

Closed jobonaf closed 8 years ago

jobonaf commented 8 years ago

In checkPrep.R, lines 131-142, you check whether the data's time zone has Daylight Saving Time (DST) and, if so, convert the data to UTC/GMT. I'm not sure the check always works: it converts to UTC in at least some cases where the time zone has no DST, such as tz="Africa/Algiers".

To reproduce the error:

days <- as.POSIXct(seq.Date(as.Date("2012-08-01"),
                            as.Date("2012-08-04"),
                            by = "1 days"),
                   tz="Africa/Algiers")
mydata <- data.frame(date=days, value=1:4)
selectByDate(mydata, start = "2012-08-02", end = "2012-08-03")

I would suggest changing line 136 as follows:

if (!zz[3] %in% c("WILDABBR", "   ") & ## strings meaning that no DST occurs
    zz[2] != zz[3]                   & ## check that winter and summer time really differ
    !(zz[2]=="CET" & zz[3]=="WEST"))   ## special case: CET and WEST have the same offset; this occurs for tz="Africa/Algiers". Are there any other similar cases?

Otherwise, data for Central Europe are difficult to manage (I prefer "Africa/Algiers" for Italy-without-DST since "Etc/GMT-1" is quite counterintuitive). I don't know if there is a more elegant way to detect DST.
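One possible alternative, sketched here as an assumption rather than as what openair does, is to ask R whether DST is actually in effect for any timestamp in the data, instead of comparing time zone abbreviations. The helper name `observes_dst` is hypothetical; it relies only on the `isdst` field of base R's `POSIXlt` class.

```r
# Hypothetical helper (not part of openair): TRUE if any timestamp in `x`
# falls in a period when DST is in effect in the time zone attached to `x`.
observes_dst <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  # as.POSIXlt() exposes isdst: positive = DST in effect, 0 = not, negative = unknown
  any(as.POSIXlt(x)$isdst > 0, na.rm = TRUE)
}

summer <- "2012-08-01 12:00"
rome    <- observes_dst(as.POSIXct(summer, tz = "Europe/Rome"))     # summer time applies
algiers <- observes_dst(as.POSIXct(summer, tz = "Africa/Algiers"))  # no DST in 2012
fixed   <- observes_dst(as.POSIXct(summer, tz = "Etc/GMT-1"))       # fixed offset
```

A limitation of this sketch is that it only looks at the timestamps supplied, so a series that covers winter months alone would not be flagged even if its time zone observes DST in summer.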

davidcarslaw commented 8 years ago

Time zones can cause all sorts of trouble, even without daylight saving time. That is why in openair I strongly recommend that users stick with GMT/UTC or a fixed offset from GMT/UTC. So, as you suggest (even if it is counterintuitive), Central European Time can be represented as the time zone "Etc/GMT-1". I would have constructed your example differently, and it should then work OK. You must set the time zone when you first create the date/time, so try:

days <- seq(as.POSIXct("2012-08-01", tz="Etc/GMT-1"),
            as.POSIXct("2012-08-04", tz="Etc/GMT-1"),
            by = "1 days")

mydata <- data.frame(date=days, value=1:4)
selectByDate(mydata, start = "2012-08-02", end = "2012-08-03")

       date value
3 2012-08-03     3
4 2012-08-04     4

Most of the time, the time zone should not matter in openair and the warnings are harmless. Where it does matter is when you combine data sets that use different time zones, for example air quality data in CET with HYSPLIT back trajectories that are in GMT.
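As a minimal sketch of why this matters (the timestamps and variable names are made up for illustration), the same instant reads differently in "Etc/GMT-1" and UTC, so both data sets should be expressed in one zone before they are compared or joined by clock time. It also shows the counterintuitive sign: "Etc/GMT-1" is one hour ahead of UTC.

```r
# Illustrative values only; not taken from any real data set.
aq_time   <- as.POSIXct("2012-08-02 12:00", tz = "Etc/GMT-1")  # UTC+1 despite the "-1"
traj_time <- as.POSIXct("2012-08-02 11:00", tz = "UTC")

# POSIXct stores seconds since the epoch, so the two values are the same instant ...
same_instant <- as.numeric(aq_time) == as.numeric(traj_time)

# ... but the clock times they print differ by one hour.
aq_clock   <- format(aq_time,   "%Y-%m-%d %H:%M")
traj_clock <- format(traj_time, "%Y-%m-%d %H:%M")

# Changing the tzone attribute re-labels the instant without shifting it.
aq_utc <- aq_time
attr(aq_utc, "tzone") <- "UTC"
aq_utc_clock <- format(aq_utc, "%Y-%m-%d %H:%M")
```

After the last step, `aq_utc_clock` and `traj_clock` carry the same clock time, which is what a join on formatted dates or on hour of day would need.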

HTH

ps. I will look at your code and incorporate it if it looks OK - thanks

jobonaf commented 8 years ago

I tried your code, but I still get the warning, and data are converted to UTC/GMT.

days <- seq(as.POSIXct("2012-08-01", tz="Etc/GMT-1"), 
            as.POSIXct("2012-08-04", tz="Etc/GMT-1"),
            by = "1 days")
mydata <- data.frame(date=days, value=1:4)
selectByDate(mydata, start = "2012-08-02", end = "2012-08-03")
                 date value
3 2012-08-02 23:00:00     3
4 2012-08-03 23:00:00     4
Warning message:
In checkPrep(mydata, vars, "default", remove.calm = FALSE, strip.white = FALSE) :
  Detected data with Daylight Saving Time, converting to UTC/GMT

Maybe this issue is machine-dependent and you cannot reproduce it because you live in the UK. This is what I get from sessionInfo():

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] openair_1.6     maps_2.3-10     dplyr_0.4.2     lazyeval_0.1.10

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6         cluster_2.0.3       magrittr_1.5       
 [4] lattice_0.20-33     R6_2.1.0            mapdata_2.2-4      
 [7] stringr_1.0.0       plyr_1.8.3          tools_3.2.2        
[10] parallel_3.2.2      grid_3.2.2          nlme_3.1-122       
[13] mgcv_1.8-7          png_0.1-7           latticeExtra_0.6-26
[16] DBI_0.3.1           assertthat_0.1      RJSONIO_1.3-0      
[19] Matrix_1.2-2        RColorBrewer_1.1-2  mapproj_1.2-3      
[22] reshape2_1.4.1      stringi_0.5-5       RgoogleMaps_1.2.0.7
[25] hexbin_1.27.0      

davidcarslaw commented 8 years ago

Interesting. Must be the different platform/tzone - works fine on Mac. What does the following return for zz?

z <- as.POSIXlt(days[1])
zz <- attr(z, "tzone")

Thanks!

jobonaf commented 8 years ago

days <- seq(as.POSIXct("2012-08-01", tz="Etc/GMT-1"), 
            as.POSIXct("2012-08-04", tz="Etc/GMT-1"),
            by = "1 days")
z <- as.POSIXlt(days[1])
zz <- attr(z, "tzone")
zz
[1] "Etc/GMT-1" "GMT-1"     "GMT-1"  

That's why I propose adding zz[2] != zz[3] to the check.
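To make the proposed condition easier to try against different platforms' output, it can be wrapped in a small function that takes the "tzone" vector directly. This is only a sketch of the condition suggested in this thread, not necessarily the code that ended up in openair; the function name `tzone_implies_dst` and the length guard are assumptions added for illustration.

```r
# Hypothetical wrapper around the condition proposed in this issue.
# `zz` is the "tzone" attribute of a POSIXlt object: the zone name, then the
# standard-time abbreviation and the DST abbreviation when the platform
# reports them.
tzone_implies_dst <- function(zz) {
  # Assumption: a vector without a third element carries no DST abbreviation.
  if (length(zz) < 3L) return(FALSE)
  !(zz[3] %in% c("WILDABBR", "   ")) &&     # placeholders meaning "no DST"
    zz[2] != zz[3] &&                       # abbreviations must really differ
    !(zz[2] == "CET" && zz[3] == "WEST")    # Africa/Algiers special case
}

linux_fixed <- tzone_implies_dst(c("Etc/GMT-1", "GMT-1", "GMT-1"))
rome        <- tzone_implies_dst(c("Europe/Rome", "CET", "CEST"))
algiers     <- tzone_implies_dst(c("Africa/Algiers", "CET", "WEST"))
```

Passing the character vector rather than the dates keeps the check independent of the local time zone database, so the values reported above can be pasted in as-is.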

davidcarslaw commented 8 years ago

Ah, right. It is a never-ending surprise how differently this is handled on different systems! I'll fix this now. Are you OK to install directly from GitHub, or do you need a built package to install?

jobonaf commented 8 years ago

Great! devtools::install_github("davidcarslaw/openair") is OK. Thank you.

davidcarslaw commented 8 years ago

It should be fixed now if you want to try it. I will take another look at this, because some of these R functions seem to behave differently from how they used to. Just let me know if you have a problem, and thanks for the good information; it really helps.

jobonaf commented 8 years ago

It works fine. I did this test:

days <- as.POSIXct(strptime(seq.Date(as.Date("2012-08-01"),
                                     as.Date("2012-08-04"),
                                     by = "1 days"),
                            format = "%Y-%m-%d"),
                   tz="Africa/Algiers")
mydata <- data.frame(date=days, value=1:4)
selectByDate(mydata, start = "2012-08-02", end = "2012-08-03")
        date value
3 2012-08-03     3
4 2012-08-04     4

(No warning message)

Thank you!