CfSOtago / GREENGridData

Code to process, document and analyse data from the Renewable Energy and the Smart Grid (GREEN Grid) project.
https://cfsotago.github.io/GREENGridData/
GNU General Public License v3.0

Grid Spy: possible dateTime errors and DST related duplications #18

Closed by dataknut 5 years ago

dataknut commented 6 years ago

@jkmair notes:

There are duplicated (or missing) observations if you use NZT in the DST break hours. e.g.:

-> For some reason the duplicate entries have a TZ_orig of "date NZ", while all of the non-duplicate entries are "date UTC".

How is (or should) DST be handled?

dataknut commented 6 years ago

Yes, this is badly documented. The relevant code and explanations are scattered across R/gridSpy.R and some of the examples. I'll upgrade the documentation.

The brief answer is that some of the data were originally downloaded in NZ time (hence the TZ_orig = date NZ rows) and some in UTC. The data processing code does as good a job as it can of converting the former to match the latter (r_dateTime is always UTC), but there may well be glitches. In particular, we think there may have been attempts to fix the DST switch hours in some way before we downloaded the data, and either we do not understand what was done, or it made things worse, or both.
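To illustrate why the DST switch hours are awkward, here is a minimal base R sketch (not code from this repo). It takes two UTC instants one hour apart, either side of the end of NZ daylight saving on 5 April 2015, and shows that they share the same NZ wall-clock time once the zone abbreviation is dropped. The timestamps are made up for illustration, and the sketch assumes the system time zone database includes Pacific/Auckland.

```r
# Two UTC instants one hour apart, either side of the end of NZ daylight
# saving on 5 April 2015 (clocks went back from 03:00 NZDT to 02:00 NZST).
utc <- as.POSIXct(c("2015-04-04 13:30:00", "2015-04-04 14:30:00"), tz = "UTC")

# With the zone abbreviation appended, the two local times stay distinguishable.
nzWithZone <- format(utc, format = "%Y-%m-%d %H:%M:%S",
                     tz = "Pacific/Auckland", usetz = TRUE)

# Without it, both instants collapse to the same wall-clock string.
nzWallClock <- format(utc, format = "%Y-%m-%d %H:%M:%S",
                      tz = "Pacific/Auckland")

print(nzWithZone)
print(nzWallClock)
print(nzWallClock[1] == nzWallClock[2])
```

Any time stored as NZ wall-clock text without an offset is therefore ambiguous during that hour, which is one reason to treat UTC as the reference.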

The data processing duplicate check is done on the combination of linkID, r_dateTime (i.e. UTC), circuit and powerW, so it would not pick up repeated NZT values that fall within the DST hour. Those repeats may be a side effect of whatever fix was attempted.
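The difference between the two kinds of repeat can be sketched in base R as follows. This is not the actual processing code, only an illustration: the column names follow those used in this thread, while the rows, the linkID value and the extra nzWallClock column are hypothetical.

```r
# Made-up rows using the column names from this thread; values are illustrative.
obs <- data.frame(
  linkID = c("hh_01", "hh_01", "hh_01"),
  r_dateTime = as.POSIXct(
    c("2015-04-04 13:30:00", "2015-04-04 14:30:00", "2015-04-04 14:30:00"),
    tz = "UTC"
  ),
  circuit = c("Heat Pump", "Heat Pump", "Heat Pump"),
  powerW = c(120.5, 98.0, 98.0),
  stringsAsFactors = FALSE
)

# Check 1: exact repeats on the UTC-based key. Only row 3 repeats an earlier row.
dupUtcKey <- duplicated(obs[, c("linkID", "r_dateTime", "circuit", "powerW")])

# Check 2: repeats of the NZ wall-clock time per linkID and circuit.
# Row 2 is flagged here even though it is a different UTC instant from row 1.
obs$nzWallClock <- format(obs$r_dateTime, format = "%Y-%m-%d %H:%M:%S",
                          tz = "Pacific/Auckland")
dupWallClock <- duplicated(obs[, c("linkID", "nzWallClock", "circuit")])

print(data.frame(obs, dupUtcKey, dupWallClock))
```

Rows flagged only by the second check are the repeated local times in the DST hour rather than true duplicate observations.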

I use readr::read_csv to load the .csv files as it auto-parses r_dateTime into NZ time (by auto-setting the tz from the current location). You can then use lubridate to force or check the tz if you wish. You can also use readr::read_csv to load it as char and then do your own conversions. Always force dateTime_orig to load as char, as it is a completely unreliable (and sometimes unparseable) date time.
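A minimal sketch of the "load as char, then convert yourself" route, assuming readr and lubridate are installed. The file contents, the linkID values and the ISO 8601 layout of r_dateTime are assumptions made for illustration (check them against the real files); the new column names r_dateTimeUTC and r_dateTimeNZ are hypothetical.

```r
library(readr)
library(lubridate)

# Write a tiny stand-in file; the real files may be laid out differently.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "linkID,dateTime_orig,TZ_orig,r_dateTime,circuit,powerW",
  "hh_01,2015-04-05 02:30:00,date NZ,2015-04-04T13:30:00Z,Heat Pump,120.5",
  "hh_01,2015-04-04 14:30:00,date UTC,2015-04-04T14:30:00Z,Heat Pump,98.0"
), tmp)

# Load every column as character except powerW, so nothing is guessed for
# the date-time columns and dateTime_orig stays untouched.
obs <- read_csv(tmp, col_types = cols(
  .default = col_character(),
  powerW = col_double()
))

# Parse r_dateTime explicitly as UTC, then derive an NZ-time view of the
# same instants. with_tz() changes only how the time is displayed.
obs$r_dateTimeUTC <- ymd_hms(obs$r_dateTime, tz = "UTC")
obs$r_dateTimeNZ <- with_tz(obs$r_dateTimeUTC, tzone = "Pacific/Auckland")

print(obs[, c("dateTime_orig", "r_dateTimeUTC", "r_dateTimeNZ")])
```

Setting the time zone explicitly at each step avoids depending on the locale or location of the machine doing the loading.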

My current advice to all users is to avoid the DST hours/days entirely…!
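One way to act on this is to flag the NZ calendar days that are not 24 hours long and drop them before analysis. The base R sketch below is only a suggestion under stated assumptions: isDstChangeDay is a hypothetical helper (not part of this repo), the timestamps are made up, and the system time zone database is assumed to include Pacific/Auckland.

```r
# Hypothetical helper: TRUE for local calendar days that are not 24 hours
# long in the given time zone, i.e. the days on which the clocks change.
isDstChangeDay <- function(dates, tz = "Pacific/Auckland") {
  dayStart <- as.POSIXct(format(dates), tz = tz)     # local midnight
  dayEnd <- as.POSIXct(format(dates + 1), tz = tz)   # next local midnight
  hoursInDay <- as.numeric(difftime(dayEnd, dayStart, units = "hours"))
  round(hoursInDay) != 24
}

# Made-up UTC timestamps: an ordinary day, then one on each NZ change day
# in 2015 (5 April, clocks back; 27 September, clocks forward).
r_dateTime <- as.POSIXct(
  c("2015-04-03 22:00:00", "2015-04-04 13:30:00", "2015-09-26 14:30:00"),
  tz = "UTC"
)

# Local (NZ) calendar date of each observation.
nzDate <- as.Date(format(r_dateTime, format = "%Y-%m-%d",
                         tz = "Pacific/Auckland"))

# Keep only observations that do not fall on a clock-change day.
keep <- !isDstChangeDay(nzDate)
print(data.frame(r_dateTime, nzDate, keep))
```

Dropping whole days is blunter than dropping only the affected hour, but it also removes the 23-hour and 25-hour days that would otherwise skew daily totals.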

I hate timezones but I hate DST even more...

dataknut commented 6 years ago

These two seem relevant :-)

https://speakerdeck.com/jennybc/how-to-name-files?slide=21

https://speakerdeck.com/jennybc/how-to-name-files?slide=22

dataknut commented 6 years ago

Not really fixed but I have updated the documentation to explain what is going on and what users might want to do about it. See:

https://cfsotago.github.io/GREENGridData/gridSpy1mProcessingReport_v1.0.html#getItRightFirstTime

and (especially)

https://cfsotago.github.io/GREENGridData/gridSpy1mProcessingReport_v1.0.html#dateTimeChecks