Closed DavorJ closed 5 years ago
I also just checked BAOL086X_179793:
The annotation reads INV=896, but these records are not shown, because their timestamps are missing. Plotting the records sequentially (by record order instead of by timestamp) shows this:
Here also:
All these red dots have no timestamp.
But this does not always hold: in the case of BAOL086X_179793, some black points have no timestamp either!
To be discussed: INV is used with multiple meanings, both for records without a timestamp and for suspicious values, and in both cases inconsistently.
OK, so the status code as it stands seems of little value for validating the outlier procedure. We have two options:
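One way to put numbers on this overlap is a simple cross-tabulation of status code against timestamp presence. The sketch below is hypothetical: it assumes each record is a dict with a `status` key (the DRME_DMST_CDE value) and a `timestamp` key that is `None` when missing. Those field names, the `OTHER` placeholder status, and the example records are illustrative and not taken from the database.

```python
from collections import Counter

def crosstab_status_vs_timestamp(records):
    """Count records per (status, has_timestamp) combination."""
    counts = Counter()
    for rec in records:
        has_ts = rec["timestamp"] is not None
        counts[(rec["status"], has_ts)] += 1
    return counts

# Made-up example records, for illustration only.
# "OTHER" is a placeholder for any non-INV status.
records = [
    {"status": "INV", "timestamp": None},
    {"status": "INV", "timestamp": "2015-03-01 12:00"},
    {"status": "OTHER", "timestamp": None},
    {"status": "OTHER", "timestamp": "2015-03-01 12:15"},
]

table = crosstab_status_vs_timestamp(records)
for (status, has_ts), n in sorted(table.items()):
    label = "with timestamp" if has_ts else "no timestamp"
    print(status, label, n)
```

If INV meant only "no timestamp", the cells (INV, with timestamp) and (non-INV, no timestamp) would both be empty. The plots suggest neither is.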
This issue demonstrates the need for a controlled vocabulary.
I think users also keep separate lists of 'suspicious data': (parts of) timeseries that should not be used for compensation, without being explicitly marked as invalid in the database. Is that correct @mathiaswackenier ?
We do keep these lists, but they are not written down. The WATINA application forces the user to visually check the timeseries, and by doing so we can easily detect suspicious data. There is also a second way we detect suspicious data, namely during compensation and calibration. Outliers or suspicious data in the timeseries of the barometric sensors cause errors in the timeseries that are easy to detect visually.
In short, the lists we have don't exist in hard copy, but we are aware of which timeseries are unreliable.
I think the conclusion is that DRME_DMST_CDE is not useful for validating the algorithms.
We wanted to evaluate the DRME_DMST_CDE field taken from the database.
Here are the results.
The DEL and VLD categories seem to refer to duplicates in the data:
The "DUPES=..." annotation is calculated based on duplicate timestamps. They match perfectly.
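The comparison can be sketched as follows. This is a minimal illustration, not the code that produced the plots: it assumes records are `(timestamp, status)` pairs, and it counts every record whose timestamp occurs more than once as a duplicate. Whether the original DUPES count includes the first occurrence is an assumption here, as are the function names and the example data.

```python
from collections import Counter

def duplicate_flags(timestamps):
    """Return True for every record whose timestamp occurs more than once.

    Missing timestamps (None) are never counted as duplicates.
    """
    counts = Counter(ts for ts in timestamps if ts is not None)
    return [ts is not None and counts[ts] > 1 for ts in timestamps]

def status_matches_duplicates(records, dup_statuses=("DEL", "VLD")):
    """Check whether DEL/VLD statuses coincide exactly with duplicate timestamps."""
    flags = duplicate_flags([ts for ts, _ in records])
    return all(
        (status in dup_statuses) == is_dup
        for (_, status), is_dup in zip(records, flags)
    )

# Made-up example records, for illustration only.
records = [
    ("2015-03-01 12:00", "VLD"),
    ("2015-03-01 12:00", "DEL"),
    ("2015-03-01 12:15", "INV"),
    ("2015-03-01 12:30", None),
]

n_dupes = sum(duplicate_flags([ts for ts, _ in records]))
print("DUPES =", n_dupes)
print("statuses match duplicates:", status_matches_duplicates(records))
```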
The INV category seems to refer to wrong data, but it is applied inconsistently. Here are some examples. Why is the black point between the red points not flagged? And the first two red points do not seem to be outliers, given the others.
Clearly outliers, but not flagged.
What is wrong with these?
Inconsistent.
The second spike = OK, but the first?
Inconsistent.
Inconsistent.
Not flagged...
To be discussed: the value of this field for validation.
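To make that discussion concrete, the agreement between the database flag and an outlier procedure could be tabulated per record. The sketch below is an assumption-laden illustration: `db_flagged` stands for "record has status INV" and `algo_flagged` for "record flagged by the outlier procedure"; both lists and the function name are hypothetical, and the example values are made up.

```python
def agreement_counts(db_flagged, algo_flagged):
    """Tabulate how often two boolean flag lists agree, record by record."""
    if len(db_flagged) != len(algo_flagged):
        raise ValueError("flag lists must have the same length")
    counts = {"both": 0, "db_only": 0, "algo_only": 0, "neither": 0}
    for db, algo in zip(db_flagged, algo_flagged):
        if db and algo:
            counts["both"] += 1
        elif db:
            counts["db_only"] += 1
        elif algo:
            counts["algo_only"] += 1
        else:
            counts["neither"] += 1
    return counts

# Made-up flags, for illustration only.
db_flagged = [True, True, False, False, False]
algo_flagged = [True, False, True, False, False]

print(agreement_counts(db_flagged, algo_flagged))
```

Large `db_only` or `algo_only` counts would not by themselves say which side is wrong. Given the inconsistencies shown above, they may reflect the flagging practice as much as the algorithm.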