Open MatteoManaRWE opened 1 year ago
Hi @MatteoManaRWE,
Thanks for posting.
To clarify, are you talking about time series data that are not reliable during a maintenance period? If so, this would be dealt with by an analyst when cleaning the data. They would remove that period from the time series and flag it as invalid or something. The data model is concerned with how the met mast is configured. So if there are no configuration changes then there is nothing captured in the data model. At present there is nowhere in the data model that can capture a met mast visit or maintenance. These are more like a log of events or something similar.
I don't fully understand your proposed solutions but I am not sure if I understand the initial problem in the first place. If you could expand, that would be great?
Stephen
@MatteoManaRWE @stephenholleran
I assume these are related to logger files? For example, there are always a few logger files from when the techs were testing or troubleshooting an issue. In such cases, the metadata and data in these files are spurious, e.g. they may briefly set the wind speed height to 0 m. If you process these files, you will get the false impression that there was a wind speed measurement at 0 m (prior to QC).
Is this the use case you are referring to?
Hi @abohara @stephenholleran, you are both right: during maintenance or before QC the settings could change many times within a day or a few hours, producing spurious data. I'm not sure we need to log all these changes. In some cases there could be many configuration changes, and logging all of them would be useless because the analyst would just delete the period or disregard it.
Thinking about these periods, I'm not sure whether the date_to of a configuration needs to coincide with the date_from of the next configuration. If not, the spurious periods could simply be periods without any configuration at all. Otherwise, we could have a field that marks a configuration period to be disregarded because it is a test period.
@MatteoManaRWE, I see two possible ways:
Date_from | Date_to | Event |
---|---|---|
2020-02-01 | 2020-02-02 | "Maintenance outage" |
@stephenholleran @kersting
Hi @abohara,
I think @MatteoManaRWE's suggestion covers this:

> Thinking about these periods, I'm not sure whether the date_to of a configuration needs to coincide with the date_from of the next configuration. If not, the spurious periods could simply be periods without any configuration at all.

Simply don't have a configuration for this period. This could be done using either the `logger_main_config` `date_from` and `date_to` or those of the individual `logger_measurement_config`. A previous `date_to` doesn't have to match exactly the next `date_from`.
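
For anyone following along, here is a rough sketch of how an analyst's script might find such holes between configurations. It is only an illustration: the list below is a simplified, hypothetical stand-in for `logger_main_config` entries (only `date_from` and `date_to` come from this discussion, the dates are made up), `find_config_gaps` is a hypothetical helper, and it assumes that a `date_to` of `None` means the configuration is still active.

```python
from datetime import datetime

# Hypothetical, simplified logger_main_config entries.
# Assumption: date_to of None means the configuration is still active.
logger_main_config = [
    {"date_from": "2020-01-01T00:00:00", "date_to": "2020-02-01T00:00:00"},
    {"date_from": "2020-02-02T00:00:00", "date_to": None},
]


def find_config_gaps(configs):
    """Return (gap_start, gap_end) pairs where no configuration applies."""
    # ISO 8601 strings in the same format sort chronologically.
    ordered = sorted(configs, key=lambda c: c["date_from"])
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if previous["date_to"] is None:
            continue  # open-ended configuration, nothing to compare
        prev_end = datetime.fromisoformat(previous["date_to"])
        next_start = datetime.fromisoformat(current["date_from"])
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


gaps = find_config_gaps(logger_main_config)
for gap_start, gap_end in gaps:
    print(f"No configuration from {gap_start} to {gap_end}")
```

Data falling inside a reported gap would then be treated as having no valid configuration. Note that the gap itself says nothing about why the period is missing.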
A tutorial outlining this particular use case would be very helpful @MatteoManaRWE? ;)
cc @kersting
@stephenholleran @MatteoManaRWE
I do see issues with using the `logger_main_config` date_from/date_to dates as an implicit proxy for describing which measurement periods should be included or excluded. In my experience, all logger files are provided, regardless of whether the logger was in "maintenance" or "test" mode. If there was no difference in the logger settings before and after maintenance, there would be two entries in `logger_main_config` with identical records except for the dates. Unless the user is an expert in these nuances of the data model and what they mean, it would require more digging to figure out why certain periods were excluded when the logger files are clearly there (though with spurious data). Having an explicit table of the maintenance or other interventions that were done can avoid this detour by making the reasons explicit.
This table does not have to be implemented now, but I do see it as possibly beneficial to add to a future roadmap.
Hi @abohara! I understand your concern. Using date_from/date_to to account for periods of missing configuration could be less visible and not clearly understandable. This can become a problem, creating complexity when the .json is shared. Probably the best approach is to share the concern and agree on what is reasonable for the majority. All the best
cc @stephenholleran @kersting
@kersting @stephenholleran My suggestion for tracking this explicitly:
Event table
measurement_location_id | action | date_from | date_to | description |
---|---|---|---|---|
<the_meas_loc_uuid> | exclude_logger_data | 2020-01-01 | 2020-01-03 | Logger files generated and provided that were created during the testing phase. Suggest the logger files generated during this time frame to be excluded |
This is just the most basic structure, and I welcome your feedback on taking this further.
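
To make the idea a bit more concrete, here is a minimal sketch of how a processing script could use such an event table. None of this exists in the data model today: the `events` list simply mirrors the columns proposed above, `is_excluded` is a hypothetical helper, the readings are made-up values, and treating `date_to` as an exclusive end of the period is an assumption.

```python
from datetime import datetime

# Hypothetical event records mirroring the proposed table columns.
events = [
    {
        "measurement_location_id": "<the_meas_loc_uuid>",
        "action": "exclude_logger_data",
        "date_from": "2020-01-01",
        "date_to": "2020-01-03",
        "description": "Logger files created during the testing phase.",
    },
]


def is_excluded(timestamp, events):
    """Return True if timestamp falls inside any exclude_logger_data event."""
    for event in events:
        if event["action"] != "exclude_logger_data":
            continue
        start = datetime.fromisoformat(event["date_from"])
        end = datetime.fromisoformat(event["date_to"])
        # Assumption: date_from is inclusive, date_to is exclusive.
        if start <= timestamp < end:
            return True
    return False


# Made-up (timestamp, wind speed) readings for illustration only.
readings = [
    (datetime(2019, 12, 31, 23, 50), 7.2),
    (datetime(2020, 1, 2, 12, 0), 0.0),
    (datetime(2020, 1, 3, 0, 10), 6.8),
]

kept = [(t, v) for t, v in readings if not is_excluded(t, events)]
print(kept)
```

The `description` column stays attached to each event, so anyone receiving the shared file can see why a period was dropped rather than having to infer it from the configuration dates.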
`logger_main_config` could be used to capture these using the date froms and tos and the notes. However, it isn't as explicit as having an "Event Table" which should be able to inform the wind analyst about why there are missing data.
When a period of data is disregarded because the data are not reliable, for example data produced during tests performed during maintenance, how is that reported?
What about adding a field in "logger_main_config"?
Would a period of missing configuration be acceptable? Should the date_to of one configuration and the date_from of the next correspond to the same date-time, or could periods to disregard simply be holes in the timeline with no configuration available?