antonroman opened 3 years ago
I've created a script to check the control byte, called detect_because_invalid.py, and I've uploaded [the output](https://github.com/antonroman/smart_meter_data_analysis/blob/master/data_processing/sample_csv_files/invalid_data.csv). There are a lot of rows with a control byte greater than or equal to 0x80:
I will show you this in more detail in our next meeting.
Ok, then we need to check with Gabriel what to do in this case:
In order to understand the magnitude of the problem, would it be possible to plot a histogram showing the number of files with the following error percentages: 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, >10%?
Thanks a lot!
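For reference, the bucketing asked for above can be sketched roughly like this (the bin edges come from the percentages listed; the file error rates shown are purely illustrative, not real data):

```python
import bisect

# Upper bin edges in percent; the last bucket catches everything above 10%.
EDGES = [0.1, 0.5, 1, 2, 3, 4, 5, 10]
LABELS = ["<=0.1%", "<=0.5%", "<=1%", "<=2%", "<=3%", "<=4%", "<=5%", "<=10%", ">10%"]

def histogram(error_pcts):
    """Count how many files fall into each error-percentage bucket."""
    counts = [0] * len(LABELS)
    for pct in error_pcts:
        counts[bisect.bisect_left(EDGES, pct)] += 1
    return dict(zip(LABELS, counts))

# Illustrative per-file error percentages:
print(histogram([0.05, 0.3, 0.3, 2.5, 12.0]))
```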
Done:
Great,
so we can safely discard all the series with more than 0.1% of errors. For the remaining files we should check whether the incorrect samples have realistic values or need to be fixed. For S02 we could follow the approach of filling these values with the average of the previous and following samples. Shall I create another issue for this? Do you know if we also have errors in the S05 samples? Thanks!
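The gap-filling idea for S02 could look something like the sketch below: replace each invalid sample with the mean of its nearest valid neighbours, falling back to the single available neighbour at the series edges. The use of `None` to mark invalid samples is an assumption for illustration:

```python
def fill_invalid(values):
    """Replace None entries with the average of the previous and next valid values."""
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None:
            # Nearest valid neighbour before i (already-filled values allowed).
            prev = next((filled[j] for j in range(i - 1, -1, -1)
                         if filled[j] is not None), None)
            # Nearest valid neighbour after i (original values only).
            nxt = next((values[j] for j in range(i + 1, len(values))
                        if values[j] is not None), None)
            if prev is not None and nxt is not None:
                filled[i] = (prev + nxt) / 2
            else:
                filled[i] = prev if prev is not None else nxt
    return filled

print(fill_invalid([10.0, None, 14.0]))  # → [10.0, 12.0, 14.0]
```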
We should check whether the S05 value provided for these records is "0". That would make sense, since this is typically caused by an error in the data transmission.
The quality byte (Bc) is only available in the S02 records (hourly), not in the S05 records (daily). I have counted the unique R1 and R4 values of the invalid_data.csv file and saved them in the invalid_data_R1.csv and invalid_data_R4.csv files. As you can see in the screenshot below, the most common value of R1 (and likewise of R4) for records with a Bc greater than or equal to 0x80 is 0, but there are many other invalid records with R1 values higher than zero:
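The tally of unique R1 values among invalid records can be sketched with the standard `csv` module and a `Counter`. The column names and the inline sample rows are assumptions for illustration; the real data lives in invalid_data.csv in the repository:

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for invalid_data.csv (records with Bc >= 0x80).
sample = """Bc,R1,R4
0x80,0,0
0x83,0,5
0x80,12,0
"""

r1_counts = Counter()
for row in csv.DictReader(io.StringIO(sample)):
    r1_counts[row["R1"]] += 1

# Most frequent R1 value among the invalid records:
print(r1_counts.most_common(1))  # → [('0', 2)]
```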
The "Bc" field refers to the Control Byte; it must be lower than 0x80 for the data to be considered valid. We must check this to make sure the data is valid. If we find values greater than or equal to 0x80, that can be a valuable input for the next experiment.
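A minimal sketch of this validity check, assuming Bc may arrive either as a hex string or as an integer (the string format is an assumption):

```python
def is_valid(bc):
    """Return True when the control byte marks the record as valid (Bc < 0x80)."""
    # Accept both "0x7F"-style hex strings and plain integers.
    value = int(bc, 16) if isinstance(bc, str) else bc
    return value < 0x80

print(is_valid("0x7F"), is_valid("0x80"))  # → True False
```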