alan-turing-institute / CROP

CROP is a Research Observation Platform
MIT License
24 stars 4 forks source link

Stark energy data offsets to accommodate Daylight Savings Time #117

Open myyong opened 3 years ago

myyong commented 3 years ago

I'm looking at the energy consumption data, I can see that:

  1. For sensor_id=7, energy data records start from midnight 27th April 2020 and last till today.
  2. From 27th April 2020 00:00:30 to 25th Oct 2020 00:00:30, readings are offset forward by one hour.
  3. From 25th Oct 2020 01:00:00 to 28th March 2021 00:00:00, readings are not offset, and match UTC or UTC +00:00 No Daylight Savings
  4. From 28th March 2021 00:00:30 till today, readings are offset again.

As @meljsingh says, the offset is because of the Daylight Savings. I can also see that Stark has changed their approach in these two years.

In 2020, they had 2 readings, at 01:00:00 to accommodate DST. Our web scraping took in the first set, which do not matched in the UTC readings. In 2021, there is no data for 28th March 2021 01:00:00 and 28th March 2021 01:30:00. The readings resume at 28th March 2021 02:00:00.

So there are 4 items that needs doing.

myyong commented 3 years ago

Update: This is no longer relevant as we are re-populating entire table

The first set of readings (when I first saw them) are as follows:

id timestamp electricity_consumption time_created time_updated sensor_id
17714 2020-10-25 01:00:00 29.34 2020-10-25 09:00:55.089717 17
17763 2020-10-25 01:30:00 30.09 2020-10-26 06:01:23.337154 17

I have changed them to take the second set, which matches the UTC readings. The scripts I used were

UPDATE energy_data
SET electricity_consumption = 29.77
where id=17714

UPDATE energy_data
SET electricity_consumption = 29.56
where id=17763

As a result, running

select * from energy_data
WHERE timestamp>='2020-10-25 00:00:00'                                                                                                                                                                              AND sensor_id=17                                                                                                                                                                                                    
ORDER BY timestamp ASC                                                                                                                                                                                              
LIMIT 8;
now returns id timestamp electricity_consumption time_created time_updated sensor_id
17714 2020-10-25 01:00:00 29.77 2020-10-25 09:00:55.089717 17
17763 2020-10-25 01:30:00 29.56 2020-10-26 06:01:23.337154 17
myyong commented 3 years ago

Update: This is no longer relevant as we are re-populating entire table

In 2021, there is no data for 28th March 2021 01:00:00 and 28th March 2021 01:30:00. The readings resume at 28th March 2021 02:00:00.

id timestamp electricity_consumption time_created time_updated sensor_id
17712 2020-10-25 00:00:00 29.31 2020-10-25 09:00:55.089717 17
17713 2020-10-25 00:30:00 29.49 2020-10-25 09:00:55.089717 17
17764 2020-10-25 02:00:00 29.6 2020-10-26 06:01:23.337154 17
myyong commented 3 years ago

Instead of incrementing timestamps, let's create a backup table, and reload the data from the 27/04/2020 00:30 till today.

create table utc_energy_data as (select * from energy_data) with no data;
\d utc_energy_data;
alter table utc_energy_data add column id2 serial primary key;
alter table utc_energy_data drop column id;
alter table utc_energy_data rename column id2 to id;

returns:

Table "public.utc_energy_data" Column Type Collation Nullable Default
id integer
timestamp timestamp without time zone
electricity_consumption double precision
time_created timestamp without time zone
time_updated timestamp without time zone
sensor_id integer

Insert data using \COPY utc_energy_data (timestamp, electricity_consumption) FROM '/Users/myong/Downloads/energy.csv' WITH CSV HEADER;

The format of the timestamps need to be in YYYY-MM-DD hh-mm-ss format. I switched the formats using Excel.

Check data using:

select * from utc_energy_data                                                                                
WHERE timestamp>='2021-05-05 01:00:00'                                                                                
AND timestamp<='2021-05-05 05:00:00'                                                                                  
AND sensor_id=17                                                                                                      
ORDER BY timestamp ASC                                                                                                
LIMIT 4;

shows that this data matches UTC or UTC +00:00:00 NO DST:

timestamp electricity_consumption time_created time_updated sensor_id id
2021-05-05 01:00:00 27.32 2021-05-06 21:00:00 17 17906
2021-05-05 01:30:00 27.25 2021-05-06 21:00:00 17 17907
2021-05-05 02:00:00 27.32 2021-05-06 21:00:00 17 17908
2021-05-05 02:30:00 27.16 2021-05-06 21:00:00 17 17909

Example of matching data from Stark interface here

myyong commented 3 years ago
meljsingh commented 3 years ago
meljsingh commented 3 years ago

Hi @myyong

I don't seem to be able to access data from either iweather or utc_energy_data.

I cannot access it directly for pgAdmin

error message: ERROR: permission denied for table utc_energy_data SQL state: 42501

Or from a direct query using psycopg2

InsufficientPrivilege: permission denied for table iweather

Accessing all other data is fine (i.e. 30Mhz, energy_data, tinytag_data)

myyong commented 3 years ago

Hi @meljsingh

I don't seem to be able to access data from either iweather or utc_energy_data.

I've changed access now, would you try again?

grant select on utc_energy_data to cropreader;
grant select on utc_energy_data to cropreader;

\z utc_energy_data returns

Schema Name Type Access privileges Column privileges Policies
public utc_energy_data table cropdbadmin=arwdDxt/cropdbadmin+ cropreader=r/cropdbadmin

Lessons learnt: \du; returns list of roles:

Role name    |                         Attributes                         |    Member of     

-----------------|------------------------------------------------------------|------------------ azure_pg_admin | Cannot login, Replication | {} azure_superuser | Superuser, Create role, Create DB, Replication, Bypass RLS | {} cropdbadmin | Create role, Create DB, Replication | {azure_pg_admin} cropreader | | {}

meljsingh commented 3 years ago

Working now!

meljsingh commented 3 years ago

Checking that the data from the new table energy_data_utc is now correctly in UTC.

Two figures show that the new data grabbing algorithm currently is correct.

For clockchange in October 2020: image

For clockchange in March 2021 image

myyong commented 3 years ago

Noting down here from Mel:

In general yes please put the sensor_id =16 in, but just to let you know that the names on Stark are actually the wrong way around. So data from sensor_id=16, named Carpenter’s Place in Stark is actually Clapham Common, and vice versa for sensor_id =17.

and

Furthermore, I’ve noticed that even with using the data from the energy_utc table, we’re getting an hour delay between temperature and energy readings, see picture attached. The blue line is the sensor temperature, while the red line is energy readings. The green line aligns with the red line, which makes sense as the green line represents the lights schedule calculated from energy readings. This hour delay affects the accuracy of the temperature forecasts which depend on the lighting schedule.