antonroman / smart_meter_data_analysis

This repository contains all the code developed to analyze the smart meter data with HTM and LSTM

Verify control bit in S02 files #7

Open antonroman opened 3 years ago

antonroman commented 3 years ago

The field "Bc" is the control byte. It must be lower than 0x80 for the data to be considered valid, so we must check it. If we find values equal to or higher than 0x80, they could be a valuable input for the next experiment.
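A minimal sketch of the check described above, in plain Python. It assumes each S02 record is a dict with a "Bc" key holding either an integer or a hex string; the real file layout may differ, and the function names are only illustrative.

```python
# Assumption: Bc arrives as an int or as a hex string such as "80" or "0x80".
VALID_BC_LIMIT = 0x80  # control bytes at or above this value mark invalid data


def parse_bc(raw):
    """Return the control byte as an int (hypothetical input formats)."""
    if isinstance(raw, int):
        return raw
    return int(raw, 16)


def is_valid_bc(raw):
    """A record is valid only when its control byte is lower than 0x80."""
    return parse_bc(raw) < VALID_BC_LIMIT


def invalid_rows(rows, bc_field="Bc"):
    """Collect the records whose control byte is 0x80 or higher."""
    return [row for row in rows if not is_valid_bc(row[bc_field])]
```

For example, `invalid_rows([{"Bc": "00"}, {"Bc": "80"}])` keeps only the second record.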

gbarreiro commented 3 years ago

I've created a script to check the control byte, called detect_because_invalid.py, and I've uploaded [the output](https://github.com/antonroman/smart_meter_data_analysis/blob/master/data_processing/sample_csv_files/invalid_data.csv). There are a lot of rows whose control byte is greater than or equal to 0x80:

Screen Shot 2021-06-07 at 19 08 59

I will show you this in more detail in our next meeting.

antonroman commented 3 years ago

Ok, then we need to check with Gabriel what to do in this case:

  1. either drop the sample and fill it with the value from the same time on the previous day
  2. or accept the value as valid.

antonroman commented 3 years ago

To understand the magnitude of the problem, would it be possible to plot a histogram showing the number of files that have the following percentages of errors: 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, >10%?

Thanks a lot!
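One possible way to bucket the files for such a histogram, as a rough sketch. It assumes we already have, per file, the number of invalid rows and the total number of rows, and it reads each percentage as the upper bound of a bin, which is only one interpretation of the list above. The names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the bins, in percent; anything above the last one is ">10%".
THRESHOLDS = [0.1, 0.5, 1, 2, 3, 4, 5, 10]
LABELS = ["<=0.1%", "<=0.5%", "<=1%", "<=2%", "<=3%",
          "<=4%", "<=5%", "<=10%", ">10%"]


def error_percentage(invalid_count, total_count):
    """Percentage of invalid rows in one file (0 for an empty file)."""
    if total_count == 0:
        return 0.0
    return 100.0 * invalid_count / total_count


def bucket_label(pct):
    """Map a percentage to the first bin whose upper bound is >= pct."""
    return LABELS[bisect_left(THRESHOLDS, pct)]


def error_histogram(counts_per_file):
    """counts_per_file: {filename: (invalid_rows, total_rows)} (assumed shape)."""
    histogram = Counter()
    for invalid_count, total_count in counts_per_file.values():
        histogram[bucket_label(error_percentage(invalid_count, total_count))] += 1
    return histogram
```

The resulting counter can be passed to any plotting library as bar heights, with `LABELS` giving the bar order.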

gbarreiro commented 3 years ago

Done:

Screen Shot 2021-06-09 at 14 13 29 Screen Shot 2021-06-09 at 14 13 57

antonroman commented 3 years ago

Great, so we can safely discard all the series with more than 0.1% of errors. In the files we keep, we should check whether the incorrect samples have realistic values or need to be fixed. For S02 we could fill these values with the average of the previous and following samples. Shall I create another issue for this? Do you know if we also have errors in S05 samples? Thanks!
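The fill approach for S02 could look roughly like this. It is only a sketch: it assumes the samples of one series are in time order in a list, with a parallel list of validity flags derived from Bc, and it leaves a sample untouched when either neighbour is missing or also invalid, which is a choice we would still need to agree on.

```python
def fill_with_neighbour_mean(values, valid):
    """Replace each isolated invalid sample with the mean of its two neighbours.

    values: samples of one series in time order (assumed layout)
    valid:  parallel list of booleans, True when Bc < 0x80
    """
    filled = list(values)
    for i, ok in enumerate(valid):
        if ok:
            continue
        has_prev = i > 0 and valid[i - 1]
        has_next = i + 1 < len(values) and valid[i + 1]
        # Only fill when both neighbours exist and are themselves valid.
        if has_prev and has_next:
            filled[i] = (values[i - 1] + values[i + 1]) / 2
    return filled
```

Runs of consecutive invalid samples and invalid samples at either end of the series are returned unchanged, so they would still need a separate rule.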

antonroman commented 3 years ago

We should check whether the S05 value provided for these records is "0". That would make sense, since this is typically caused by an error in the data transmission.

gbarreiro commented 3 years ago

The quality byte (Bc) is only available in the S02 records (hourly), not in the S05 records (daily). I have counted the unique R1 and R4 values in the invalid_data.csv file and saved them in the invalid_data_R1.csv and invalid_data_R4.csv files. As you can see in the screenshot below, the most common value of R1 (and likewise of R4) for the records with a Bc greater than or equal to 0x80 is 0, but there are many other invalid records with R1 values higher than zero:

Screen Shot 2021-07-08 at 16 56 17
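For reference, the counting of unique values can be sketched as follows. This is not the script used for the files above; it assumes the invalid records are dicts with "R1" and "R4" keys holding the values as read from the CSV (strings), and the function name is illustrative.

```python
from collections import Counter


def count_values(rows, field):
    """Count how often each value of `field` appears among the given records."""
    return Counter(row[field] for row in rows)


# Tiny made-up sample, only to show the shape of the result.
sample_rows = [{"R1": "0", "R4": "0"},
               {"R1": "0", "R4": "3"},
               {"R1": "12", "R4": "0"}]
r1_counts = count_values(sample_rows, "R1")
print(r1_counts.most_common())  # values sorted from most to least frequent
```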