Closed jonssonchristian closed 5 months ago
It is considered bad practice in taxonomy/ontology development to introduce completely new terms that are not already widely used in the domain. I think the same applies here and would say `input_characteristics` is a bad choice of field name, since no one currently uses it regularly and no one will understand what it means without reading a definition.
I had a further think about this and would suggest that `dataset_statistics` is a better section name to cover statistics in relation to the input datasets, such as data availability.
I propose we limit it to the (raw) data availability for now, but that section could easily be extended later to cover different measures of data coverage and quality.
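
As a rough illustration of the proposal, here is a minimal Python sketch of a `dataset_statistics` section holding only raw data availability. Everything below the section name is an assumption made for illustration: the `measurement_station` key, the `raw_data_availability` helper, and the definition of availability as the fraction of non-missing records are hypothetical and are not taken from the EYA DEF schema or from the pull request.

```python
from typing import Optional, Sequence


def raw_data_availability(records: Sequence[Optional[float]]) -> float:
    """Return the fraction of records that are present (not None).

    Assumed definition, for illustration only: availability is the share
    of non-missing records in the raw series, before any filtering.
    """
    if not records:
        return 0.0
    valid = sum(1 for value in records if value is not None)
    return valid / len(records)


# Hypothetical raw series; None marks a missing record.
raw_wind_speed = [7.2, None, 6.8, 7.5]

# Hypothetical layout: only the section name `dataset_statistics` and the
# field name `data_availability` come from the discussion above.
wind_resource_assessment = {
    "dataset_statistics": {
        "data_availability": {
            "measurement_station": raw_data_availability(raw_wind_speed),
        }
    }
}
```

Further coverage or quality measures could later be added as sibling keys of `data_availability` without renaming the section.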
Pull request #57 includes a proposed change to address this issue.
I introduced `input_characteristics.data_availability` under the wind resource assessment object to cover the raw data availability for wind measurement data, reference meteorological data and reference operational wind farm data. Previously this was covered only for operational wind farm datasets.
We need to review and discuss the structure and naming conventions for this element. We may want to refine it to make it clearer and more useful.
The term `input_characteristics` aims to capture details that characterise the inputs to the assessment and that are not already captured in the various metadata elements (measurement station metadata, reference meteorological dataset metadata and reference operational wind farm metadata). The idea is to group different input characteristics together in this object, just like we group results for different quantities into the `results` object. I did not find it easy to come up with a clear and concise term, and `input_characteristics` was a compromise for an initial draft. If anyone can think of a better term, please make a suggestion. The idea is that this object should capture details about the raw input datasets, before the author of the EYA has undertaken any data filtering or other processing. One idea could perhaps be just `inputs`, but that might tend to be interpreted as the inputs themselves rather than metadata about the inputs.
At the moment the `input_characteristics` group only has a single child item, `data_availability`. I cannot immediately think of other statistics we might add there, but thought it makes sense to have a group that can be extended if we later decide to enrich it with more detail. Even if it remains a single item, I think it makes more sense to have a group that adds context than to have data availability datasets directly under the wind resource assessment, since we already have a group for the results.
The field name for raw data availability is currently just `data_availability` rather than `raw_data_availability`. The reason is that the group `input_characteristics` is defined to cover only characteristics of the raw input datasets, and so it should be clear from the context that it can be nothing other than raw data availability. I am generally in favour of nesting fields into groups, where parent groups add context to clarify the interpretation of different fields, rather than having long field names that carry all the context. For example, we have `results.wind_speed` and `results.turbulence_intensity` rather than `wind_speed_results` and `turbulence_intensity_results`. Let me know if you have a different view on this.
We currently do not capture processed data availability. Such data are of course also relevant to EYA reporting. The question is whether we can come up with clear enough definitions and data structures to make processed data availability data in the EYA DEF useful. My current view is that it seems better to leave this for a later version, when we also expand the EYA DEF to cover the details of the wind resource assessment process. However, I have no strong opinion there and would welcome proposals for how we can incorporate it at this stage in a simple enough form.
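
To make the nesting convention described above concrete, the following Python sketch contrasts grouped fields, where the parent supplies the context, with flat field names that carry the context themselves. The `get_path` helper and the placeholder values are hypothetical and only for illustration; the dotted names are the ones used in this issue, and none of this is meant as the actual EYA DEF structure.

```python
from functools import reduce
from typing import Any, Mapping


def get_path(document: Mapping[str, Any], dotted_path: str) -> Any:
    """Walk a nested mapping along a dotted path such as 'results.wind_speed'."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), document)


# Nested layout: the parent group tells the reader how to interpret the child.
nested = {
    "results": {
        "wind_speed": "<wind speed results>",
        "turbulence_intensity": "<turbulence intensity results>",
    },
    "input_characteristics": {
        # The group covers raw inputs only, so no "raw_" prefix is needed.
        "data_availability": "<raw data availability>",
    },
}

# Flat alternative: every field name has to repeat its own context.
flat = {
    "wind_speed_results": "<wind speed results>",
    "turbulence_intensity_results": "<turbulence intensity results>",
    "raw_data_availability": "<raw data availability>",
}

availability = get_path(nested, "input_characteristics.data_availability")
```

Both layouts hold the same information; the nested form keeps field names short and leaves room to add siblings under each group later.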