How should a disregarded period of data be set?

MatteoManaRWE commented 1 year ago

When a period of data is disregarded because the data are not reliable (for example, data produced during tests done while performing maintenance), how is that reported?

What about adding a field to "logger_main_config"?
Would a period of missing configuration be acceptable? Should the date_to of one configuration and the date_from of the next correspond to the same date-time, or could a period to disregard just be a hole in the timeline with no configuration available?

stephenholleran commented 1 year ago

Hi @MatteoManaRWE,

Thanks for posting.

To clarify, are you talking about time series data that are not reliable during a maintenance period? If so, this would be dealt with by an analyst when cleaning the data. They would remove that period from the time series and flag it as invalid or something. The data model is concerned with how the met mast is configured. So if there are no configuration changes then there is nothing captured in the data model. At present there is nowhere in the data model that can capture a met mast visit or maintenance. These are more like a log of events or something similar.
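As a minimal sketch of that cleaning step (which happens outside the data model), assume the time series is simply a list of timestamps. Rather than silently dropping records, an analyst might flag those inside a maintenance window as invalid. The helper name `flag_period` and the half-open `[date_from, date_to)` window are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def flag_period(timestamps, date_from, date_to):
    """Return one validity flag per timestamp.

    Hypothetical helper: records inside the half-open window
    [date_from, date_to) are flagged False (invalid), all others True.
    """
    return [not (date_from <= ts < date_to) for ts in timestamps]

# Six ten-minute records around a hypothetical maintenance visit.
start = datetime(2020, 2, 1, 10, 0)
timestamps = [start + timedelta(minutes=10 * i) for i in range(6)]
valid = flag_period(
    timestamps,
    date_from=datetime(2020, 2, 1, 10, 20),
    date_to=datetime(2020, 2, 1, 10, 40),
)
print(valid)  # [True, True, False, False, True, True]
```

Keeping a flag instead of deleting rows preserves the original record count, so the excluded period stays visible to whoever reviews the cleaned data.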

I don't fully understand your proposed solutions, but I'm not sure I understand the initial problem in the first place. If you could expand on it, that would be great.

Stephen

abohara commented 1 year ago

@MatteoManaRWE @stephenholleran

I assume these are related to logger files? For example, there are always a few logger files from when the technicians were testing or troubleshooting an issue. In such cases, the metadata and data in these files are spurious; e.g. they may briefly set the wind speed height to 0 m. If you process these files, you will get the false impression that there was a wind speed measurement at 0 m (prior to QC).

Is this the use case you are referring to ?

MatteoManaRWE commented 1 year ago

Hi @abohara @stephenholleran, you are both right. During maintenance or before QC, the settings could change many times within a day or a few hours, producing spurious data. I'm not sure we need to log all these changes: in some cases there could be many configuration changes, and logging all of them could be useless because the analyst would just delete or disregard the period.

Thinking about these periods, I'm not sure whether the date_to of a configuration needs to coincide with the date_from of the next configuration. If not, the spurious periods could simply be periods without any configuration; otherwise, we could have a field that marks a configuration period to be disregarded because it is a test period.

abohara commented 1 year ago

@MatteoManaRWE, I see two possible ways:

1. Exclude the logger files pertaining to the maintenance period completely. If it's not a valid measurement period, then there is no reason to send them. Your suggestion of following the logger start/end period may sort of fall into this approach.
2. Start a new table that reports maintenance events, such as the following. I do think this would report all the data and allow the final user (i.e. the analyst) to make the final decision on what is to be left in/out.

| Date_from | Date_to | Event |
|---|---|---|
| 2020-02-01 | 2020-02-02 | "Maintenance outage" |

@stephenholleran @kersting

stephenholleran commented 1 year ago

Hi @abohara,

I think @MatteoManaRWE's suggestion is the right approach:

> Thinking about these periods, I'm not sure whether the date_to of a configuration needs to coincide with the date_from of the next configuration. If not, the spurious periods could simply be periods without any configuration.

Simply don't have a configuration for this period. This could be done using either the logger_main_config date_from and date_to, or the individual logger_measurement_config. A previous date_to doesn't have to exactly match the next date_from.
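Leaving a period without any configuration can be detected mechanically. A minimal sketch, assuming each logger_main_config entry is a plain dict with ISO 8601 `date_from`/`date_to` strings, that entries do not overlap, and that only the latest entry may have `date_to = None`; the function name and the example values are hypothetical.

```python
from datetime import datetime

def find_unconfigured_periods(configs):
    """Return (gap_start, gap_end) pairs during which no configuration is in force.

    Assumes non-overlapping configs; only the latest may be open-ended (date_to None).
    """
    ordered = sorted(configs, key=lambda c: datetime.fromisoformat(c["date_from"]))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev["date_to"] is None:
            continue  # an open-ended config followed by another one is an overlap, not a gap
        prev_end = datetime.fromisoformat(prev["date_to"])
        next_start = datetime.fromisoformat(nxt["date_from"])
        if prev_end < next_start:
            gaps.append((prev_end, next_start))
    return gaps

# Illustrative entries: the maintenance day 2020-02-01 is left without a configuration.
logger_main_config = [
    {"date_from": "2020-01-01T00:00:00", "date_to": "2020-02-01T00:00:00", "notes": "Before maintenance"},
    {"date_from": "2020-02-02T00:00:00", "date_to": None, "notes": "After maintenance"},
]

for gap_start, gap_end in find_unconfigured_periods(logger_main_config):
    print(f"No configuration from {gap_start} to {gap_end}")
```

A strict `<` comparison means contiguous entries (previous date_to equal to next date_from) are not reported as gaps.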

A tutorial outlining this particular use case would be very helpful, @MatteoManaRWE ;)

cc @kersting

abohara commented 1 year ago

@stephenholleran @MatteoManaRWE

I do see issues with using the logger_main_config date_from/date_to as an implicit proxy for which measurement periods should be included or excluded. In my experience, all logger files are provided, regardless of whether the logger was in "maintenance" or "test" mode. If there was no difference in the logger settings before and after maintenance, there would be two entries in the logger_main_config with identical records except for the dates. Unless the user is an expert in these nuanced discussions of the data model and what they mean, it would take more digging to figure out why certain periods were excluded when the logger files are clearly there (albeit with spurious data). Having an explicit table of the maintenance or other interventions that were done can avoid this detour by making the reasons explicit.
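The ambiguity described here can be made concrete. A minimal sketch, assuming config entries are plain dicts with ISO 8601 `date_from`/`date_to` strings; the other field names and values are an illustrative subset, not a statement of the full schema. It finds gaps where the records on either side are identical apart from their dates, i.e. gaps for which the configuration itself offers no explanation.

```python
from datetime import datetime

DATE_KEYS = {"date_from", "date_to"}

def without_dates(config):
    """Copy of a config record with its validity dates removed."""
    return {k: v for k, v in config.items() if k not in DATE_KEYS}

def unexplained_gaps(configs):
    """Return (date_to, next date_from) for gaps where nothing but the dates changed."""
    ordered = sorted(configs, key=lambda c: datetime.fromisoformat(c["date_from"]))
    hits = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev["date_to"] is None:
            continue  # open-ended entry: no gap to report
        has_gap = datetime.fromisoformat(prev["date_to"]) < datetime.fromisoformat(nxt["date_from"])
        if has_gap and without_dates(prev) == without_dates(nxt):
            hits.append((prev["date_to"], nxt["date_from"]))
    return hits

# Illustrative records: same logger settings before and after a maintenance day.
configs = [
    {"logger_serial_number": "1234", "logger_oem_id": "ExampleOEM",
     "date_from": "2020-01-01T00:00:00", "date_to": "2020-02-01T00:00:00"},
    {"logger_serial_number": "1234", "logger_oem_id": "ExampleOEM",
     "date_from": "2020-02-02T00:00:00", "date_to": None},
]
print(unexplained_gaps(configs))
```

A reader of the JSON would see this gap but no reason for it, which is the case for recording the reason explicitly.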

This table does not have to be implemented now, but I do see it as possibly beneficial to add to a future roadmap.

MatteoManaRWE commented 1 year ago

Hi @abohara! I understand your concern. Using the date_from/date_to to account for periods of missing configuration could be less visible and not clearly understandable. This can become a problem, creating complexity when the .json is shared. Probably the best approach is to share the concern and agree on what is reasonable for the majority. All the best

cc @stephenholleran @kersting

abohara commented 1 year ago

@kersting @stephenholleran My suggestion for tracking this explicitly:

Event table

| measurement_location_id | action | date_from | date_to | description |
|---|---|---|---|---|
| `<the_meas_loc_uuid>` | `exclude_logger_data` | `2020-01-01` | `2020-01-03` | `Logger files were generated and provided during the testing phase. Suggest that the logger files generated during this time frame be excluded.` |

This is just the most basic structure; I welcome your feedback on tracking this further.
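To show how such a table might be consumed, here is a minimal sketch of one event row as a Python dataclass, serialised to JSON and queried for whether a timestamp falls in an `exclude_logger_data` period. The column names follow the proposed table; the class, the helper `excluded` and the half-open `[date_from, date_to)` interpretation are assumptions, since the proposal does not say whether date_to is inclusive.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class Event:
    """One row of the proposed (not yet standardised) event table."""
    measurement_location_id: str
    action: str
    date_from: str  # ISO 8601 date or date-time
    date_to: str
    description: str

    def covers(self, ts: datetime) -> bool:
        # Assumed half-open period: [date_from, date_to).
        return datetime.fromisoformat(self.date_from) <= ts < datetime.fromisoformat(self.date_to)

def excluded(events, ts):
    """True if any exclude_logger_data event covers the timestamp."""
    return any(e.action == "exclude_logger_data" and e.covers(ts) for e in events)

events = [
    Event(
        measurement_location_id="the_meas_loc_uuid",
        action="exclude_logger_data",
        date_from="2020-01-01",
        date_to="2020-01-03",
        description="Logger files generated during the testing phase.",
    )
]

print(json.dumps([asdict(e) for e in events], indent=2))
print(excluded(events, datetime(2020, 1, 2, 12, 0)))  # True
print(excluded(events, datetime(2020, 1, 3, 0, 0)))   # False
```

This keeps every logger file in the dataset and leaves the in/out decision to the analyst, in line with the second option proposed earlier in the thread.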

stephenholleran commented 1 year ago

18-May-2023, call with @abohara @kersting

Problem: many logger files with config changes at the start, before commissioning, and in the middle of the measurement campaign.
The logger_main_config could be used to capture these using the date_from, date_to and notes. However, it isn't as explicit as an "Event Table", which would inform the wind analyst about why data are missing.
If we create an "Event Table", it should cover more than just this scenario: it should also cover other maintenance events or sensor failures.
Is this becoming a digital model for maintenance reports??? We should have a clear scope for this.
Use cases:
1. How to inform the wind analyst that a maintenance occurred.
2. How to inform a wind analyst why a sensor was replaced/failed.
3. Inform of a decommission.
4. ??
Purpose of "Event Table": To explain to the wind analyst why there are missing files/data or why there is a config change?
1. @abohara: When the field visits happened along with an 'action' e.g. service, decommission, remote change.
Possible scopes:
1. Do nothing and use the logger_main_config date_from, date_to and notes.
2. Create a table to capture the field visits only.
3. Create a table(s) that captures the field visits along with other issues that might occur, such as notes regarding a sensor failure.
4. Create a whole data model to digitize maintenance reports. A full digital representation of maintenance documentation.
@abohara will have a first cut for an MVP.
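Of the possible scopes, option 2 (a table of field visits only) is the smallest. A minimal sketch of what one record might look like, assuming the three actions named in the notes (service, decommission, remote change) form a closed vocabulary. The class and function names and the `notes` field are hypothetical and not part of the standard.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class VisitAction(Enum):
    """Hypothetical controlled vocabulary, taken from the examples in the call notes."""
    SERVICE = "service"
    DECOMMISSION = "decommission"
    REMOTE_CHANGE = "remote change"

@dataclass
class FieldVisit:
    measurement_location_id: str
    date_from: date
    date_to: date
    action: VisitAction
    notes: str = ""

    def __post_init__(self):
        if self.date_to < self.date_from:
            raise ValueError("date_to must not be before date_from")

def visit_from_dict(raw):
    """Build a FieldVisit from a JSON-like dict; raises ValueError on unknown actions or bad dates."""
    return FieldVisit(
        measurement_location_id=raw["measurement_location_id"],
        date_from=date.fromisoformat(raw["date_from"]),
        date_to=date.fromisoformat(raw["date_to"]),
        action=VisitAction(raw["action"]),
        notes=raw.get("notes", ""),
    )

visit = visit_from_dict({
    "measurement_location_id": "the_meas_loc_uuid",
    "date_from": "2020-02-01",
    "date_to": "2020-02-02",
    "action": "service",
})
print(visit.action.value, visit.date_from, visit.date_to)
```

Restricting `action` to an enum keeps the table machine-readable; widening it toward scope 3 or 4 would mean extending the vocabulary and the record fields.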

IEA-Task-43 / digital_wra_data_standard

How should a disregarded period of data be set? #213
