Closed jonssonchristian closed 1 year ago
Upon looking at it a bit more, I see that the current draft of IEC 61400-15-2 uses “analysis” and “assessment” interchangeably (wind analysis/assessment and energy analysis/assessment). My suggestion would be that we adopt wind assessment and energy assessment for the purpose of the EYA DEF, as I feel that is more accurate and more widely used terminology.
I think we can use “wind assessment” in preference to “wind resource assessment” as the former is broader and more concise. Wind assessment seems more inclusive of aspects that affect turbine performance but might not be thought of as part of the wind resource, such as wind turbulence and vertical shear conditions.
Do you agree?
Yes, that sounds good to me
Hi,
I am good with using 'assessment' over 'analysis'.
With regard to 'wind assessment' versus 'wind resource assessment', I am not sure of your reasoning. You want to use 'wind assessment' because it can cover 'site conditions'? If that is the case, could an option be to have a 'site conditions' section and a 'wind resource assessment' section? The IEC themselves have split site conditions into 61400-15-1 for site suitability and 61400-15-2 for WRA, EYA and uncertainties. See below screenshot from the SharePoint home page.
Hi @stephenholleran, I did not intend to cover any of the site wind conditions that are purely relevant to a loads assessment. It is a good point that is covered by IEC 61400-15-1 and that we should not duplicate anything from there. What I had in mind is that the wind resource is often taken to mean the distribution of wind speed. I thought 'wind assessment' can take a broader meaning, also including other wind characteristics that affect the energy yield assessment, such as turbulence and vertical wind shear. These may be inputs into the spatial, wake or plant performance models. But I think we could equally use the term 'wind resource assessment' in this broader meaning.
Ah, I understand better now. I would still lean towards sticking with 'wind resource assessment', as to me turbulence and shear are covered as part of it. For me, 'wind resource assessment' is the commonly used term, along with the abbreviation WRA.
Just to note I still have to read your full opening message.
That makes sense. I agree it is probably better to stick to the widely used term "wind resource assessment" (with the abbreviation WRA) and just clarify this includes all site wind conditions that affect the energy yield assessment.
Hi @christianjnaturalpower,
I think it makes sense to have an energy assessment report (header) at the top level, being the equivalent of a report document (e.g. PDF). This is a natural container for everything else. An alternative would be to have a list of scenarios at the top level, with each scenario referencing a set of report details, but that is less analogous to a report as we are used to seeing it.
Agree.
For the measurement campaign information, I think we are in agreement to simply reference the iea43_wra_data_model. This schema supports including several measurement campaigns in one json document, but I think we should support providing a list of json documents according to this schema. The EYA report organisation is not the originator of this data, and it may be that the input data comes in the form of several schema-compliant documents, in which case it makes sense to keep it that way rather than merging them into one. I would suggest that the inclusion of an iea43_wra_data_model json document is optional (informative rather than normative in IEC terms, if I got that right), because there may be instances where an iea43_wra_data_model document is not available. We may want to consider an alternative minimal normative schema of measurements metadata, including critical information like the measurement locations.
Agree to use the WRA Data Model. I don't think we would need to create an alternative minimal schema. We can use the WRA Data Model as that has minimal requirements to be valid. For the measurement location it requires Lat, Long and Measurement Type (mast, lidar, sodar, etc.). It would then need at least 1 measurement point so that could be the top wind speed measurement which I would say is useful to have to tell the user what the highest measurement was. That would be it, so already quite minimal.
By a wind analysis, I consider all elements from processing the raw measurements to predicting the wind conditions at the turbine locations, i.e. everything before translating the wind conditions into energy production. This is consistent with the IEC 61400-15-2 draft as far as I can see. The estimates of the wind resource at the measurement locations will be a subset of the wind analysis.
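I think this is a fundamental thing to define as mentioned on the last call. Thinking about it a bit further, an option could be to split it up into
- wind resource at the measurement locations and
- then flow modelling where the data model captures the results. This would be the wind resource at each turbine location. It can be independent of the type of flow modelling (WAsP, CFD, ...). Needs thought.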
I think we are already aligned on breaking out the turbine model specifications (power curves) at the top level, as this comes from a different originator and may be repeated many times across the scenarios. We will want to adapt that part of the schema to whatever becomes the new industry standard, so it makes sense to keep it flexible for now.
Agree.
I propose we keep the turbine location map (wind farm layouts) at the top level rather than under each scenario, both because they have a different originator (typically the project developer) and because different scenarios will often share the same layout (and so we can avoid repetition).
What would this turbine location map contain? My impression is that it would be an `id`, `lat` and `long`, and that is it? The wind farms area or scenarios would then reference each turbine id. I know this avoids turbine lat, long duplication, but I think it might add more complication, e.g. if someone wants to plot all the turbine layouts in GIS. There is a lot of processing to do to get that data into a format for GIS.
I suggest we use the concept of "turbine location map" rather than "wind farm layout". The map links each unique turbine ID to a unique location (easting and northing). A wind farm layout would contain turbine IDs together with location data for a specific scenario. What I intend with the map is to include the location data for all turbines in all layouts, including neighbouring wind farms. The scenario data then specifies what turbines are included for each wind farm and the turbine location map allows the locations to be looked up (in a relational database model we have turbine location ID as a unique ID in the turbine locations table and a foreign key in the turbine list under each scenario). I see the following advantages with this approach.
- We can avoid repeating turbine locations many times (which would often happen if we kept them in the tree under each of the different scenarios).
- It makes it easier to enforce unique turbine location IDs, since we only have a single map with all IDs and locations. Let me illustrate with an example. Take a project with three turbines, which has two different layout versions. There may be an inclination to give the turbine location IDs WTG1, WTG2 and WTG3 in both versions, i.e. having different locations for the same turbine ID. A single map would enforce these to be distinct (e.g. WTG1-r01, WTG2-r01 and WTG3-r01; and WTG1-r02, WTG2-r02 and WTG3-r02). The IDs could of course also be UUIDs with label information in different fields.
Expanding on this example further (if I understand correctly), say if a scenario is only using 2 turbines, the first and last. Their labels would be WTG1 and WTG3 which doesn't make sense for a standalone scenario. Users would be wondering where is the 2nd turbine gone. The labels (which are different to an id) should be WTG1 and WTG2 or whatever the user wants to use.
- The map could more easily support some layout versioning information, which could be particularly useful for project developers who adopt this data model, needing to manage a lot of iterations of locations during the course of the development process. In addition to a unique turbine location ID, we could include fields for turbine label and layout version.
- It makes it easier to get an overview of all of the different unique scenarios considered in an EYA report (in different data models this could of course also be achieved automatically when parsing and processing the full tree, but I think it may be helpful to have this readily available in the raw json document).
Do you by any chance have an entity relational diagram to explain all this? I think I am not understanding you fully.
In my understanding we have already agreed to keep a list of scenarios at the next level, to be equivalent of the different scenarios that may be presented in a single report. The scenario object contains the EYA results and all the unique information to separate it from other scenarios.
Agree.
Hi @christianjnaturalpower, going back to your proposed top-level data model (sorry for the slow reply), I have the following comments:
Hi @thomasvandelft,
Thanks for your input.
Regarding the possibility of splitting the wind resource; my thought would be that the long term adjustment and vertical extrapolation (if not done in a flow model) could be part of the wind resource assessment part. These are usually common across all scenarios as they don't take into account turbine layouts. The horizontal extrapolation (and vertical extrapolation if done in the flow modelling) could be independent (and yes part of the scenarios) as the horizontal extrapolation is dependent on the turbine layouts.
Hi @stephenholleran and @thomasvandelft, thank you very much for your input.
Considering the question of where we locate the turbine location information, I see two possibilities, as described in the following.
Fundamentally I think both of these will work. The question is what makes the most usable and comprehendible data model. The first has the advantage of a more concise representation of the data but the disadvantage of more abstraction. The second has the advantage of simplicity at the cost of potentially repeating more data. I think the best choice will depend on some of the other decisions regarding the data model structure (see below).
It would be relatively easy to convert between these two models. So if we implement the second one in the EYA DEF data model (schema) it will not prevent anyone from using the first in their local system.
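As a rough illustration of that conversion, here is a minimal Python sketch. It assumes the first option is a single shared turbine location map with ID references under each scenario, and the second keeps the full layout under each scenario. All field names (`turbine_locations`, `turbine_ids`, `turbines`, `easting`, `northing`) and the coordinate values are hypothetical and not taken from any agreed schema.

```python
# Hypothetical "first model": one shared location map plus ID references per scenario.
referenced = {
    "turbine_locations": {
        "WTG1-r01": {"easting": 400100.0, "northing": 6200100.0},
        "WTG2-r01": {"easting": 400600.0, "northing": 6200150.0},
        "WTG1-r02": {"easting": 400150.0, "northing": 6200300.0},
    },
    "scenarios": [
        {"id": "A", "turbine_ids": ["WTG1-r01", "WTG2-r01"]},
        {"id": "B", "turbine_ids": ["WTG1-r02", "WTG2-r01"]},
    ],
}


def embed_layouts(doc):
    """First model -> second model: copy each referenced location into its scenario."""
    locations = doc["turbine_locations"]
    return {
        "scenarios": [
            {
                "id": s["id"],
                "turbines": [{"id": t, **locations[t]} for t in s["turbine_ids"]],
            }
            for s in doc["scenarios"]
        ]
    }


def extract_location_map(doc):
    """Second model -> first model: collect unique locations, rejecting conflicting IDs."""
    locations = {}
    scenarios = []
    for s in doc["scenarios"]:
        ids = []
        for t in s["turbines"]:
            loc = {k: v for k, v in t.items() if k != "id"}
            # setdefault stores the location on first sight and returns the stored one after.
            if locations.setdefault(t["id"], loc) != loc:
                raise ValueError(f"turbine ID {t['id']!r} has two different locations")
            ids.append(t["id"])
        scenarios.append({"id": s["id"], "turbine_ids": ids})
    return {"turbine_locations": locations, "scenarios": scenarios}


embedded = embed_layouts(referenced)
round_trip = extract_location_map(embedded)
```

The round trip only reproduces the original map when every location in it is used by at least one scenario. Going from embedded layouts to a shared map is also where reused IDs with different locations would surface as an error.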
Considering the second (and related) question of how we structure the wind resource assessment, I see the whole of it as one logical unit and am not sure it makes sense to separate. We will definitely want to break it down into components, but I am hesitant to put it in different parts of the overall data model.
I think we all agree that the details of the wind measurement campaign should be separate from the wind resource assessment. It is both a clear logical distinction and a distinction in terms of originator (generally the EYA author has not undertaken the wind measurement campaign, though there will of course be exceptions to this). From the principle of originator, I would separate any data processing and filtering from the measurement campaign data object, and instead put that within the wind resource assessment.
I think of a wind resource assessment as the process and results of converting the wind measurement campaign (data and metadata) into a prediction of the long-term wind resource across the site that affects the turbine energy production (i.e. everything before converting wind resource into energy production). For the large majority of wind resource assessment reports, there would only be one wind resource assessment. Different results may be used in the energy assessment depending on the turbine locations and hub heights, but they generally come from the same wind resource assessment.
The wind resource assessment could be thought of as a chain of extrapolation processes following data processing and quality control, in simplified terms something along the lines of the following:
The pattern is generally that, for each step, you add an extrapolation model that either extends or adds dimensions of the results.
When we speak of results for different hub heights, it is only really different points in the vertical space dimension. Similarly the results at different turbine locations are different points in the space dimensions. As long as the same input data and the same models have been applied for different scenarios, I would consider it the same wind resource assessment, just with the results varying with the dimensions of height above ground and location. This would be consistent with what I have suggested for the results data model in a separate issue (please consider that for reference in this context).
If we decide to keep the turbine layout (location) information under the scenarios, we will need to limit the wind resource assessment to the prediction of the long-term wind climate at the measurement locations and put the prediction at the turbine locations under the scenarios. I think that could be a good alternative. It will mean repeating results if scenarios have the same layout and hub heights, but should not be too onerous. I would propose we keep the spatial model details outside of the scenario object since it would in the large majority of cases be the same for all scenarios, and it can easily be put outside and referenced without adding complexity. The wind resource assessment would then be everything until and including the vertical extrapolation and the predicted long-term wind resource at all relevant heights (hub heights) at the measurement locations. That part I see no good reason to put into the scenarios, since it would generally be the same for all scenarios, and could be quite a lot of data to repeat on every scenario.
Another consideration is how we should treat the situation where reference turbine data is used for the energy assessment instead of a wind resource measurement campaign and assessment. My idea was that we define a reference turbine assessment schema as an alternative to the wind resource assessment schema, and you could reference one of them (or a combination) as the basis of an energy assessment. If we put the wind resource assessment at the turbines inside the scenario object, we need to think about how we allow for these two different possibilities. I am sure it is doable, but likely requires careful consideration. I will keep this in mind when I draft a new model based on what we agree here.
Hi @stephenholleran, thanks for your detailed comments. Further to my general comments above, I have responded to your specific points below.
Hi @christianjnaturalpower,
I think it makes sense to have an energy assessment report (header) at the top level, being the equivalent of a report document (e.g. PDF). This is a natural container for everything else. An alternative would be to have a list of scenarios at the top level, with each scenario referencing a set of report details, but that is less analogous to a report as we are used to seeing it.
Agree.
Great. It seems we all agree on that. It matches the initial data model draft and that might just need some minor modifications and additions.
For the measurement campaign information, I think we are in agreement to simply reference the iea43_wra_data_model. This schema supports including several measurement campaigns in one json document, but I think we should support providing a list of json documents according to this schema. The EYA report organisation is not the originator of this data, and it may be that the input data comes in the form of several schema-compliant documents, in which case it makes sense to keep it that way rather than merging them into one. I would suggest that the inclusion of an iea43_wra_data_model json document is optional (informative rather than normative in IEC terms, if I got that right), because there may be instances where an iea43_wra_data_model document is not available. We may want to consider an alternative minimal normative schema of measurements metadata, including critical information like the measurement locations.
Agree to use the WRA Data Model. I don't think we would need to create an alternative minimal schema. We can use the WRA Data Model as that has minimal requirements to be valid. For the measurement location it requires Lat, Long and Measurement Type (mast, lidar, sodar, etc.). It would then need at least 1 measurement point so that could be the top wind speed measurement which I would say is useful to have to tell the user what the highest measurement was. That would be it, so already quite minimal.
That is a good point and I agree with that.
By a wind analysis, I consider all elements from processing the raw measurements to predicting the wind conditions at the turbine locations, i.e. everything before translating the wind conditions into energy production. This is consistent with the IEC 61400-15-2 draft as far as I can see. The estimates of the wind resource at the measurement locations will be a subset of the wind analysis.
I think this is a fundamental thing to define as mentioned on the last call. Thinking about it a bit further, an option could be to split it up into
- wind resource at the measurement locations and
- then flow modelling where the data model captures the results. This would be the wind resource at each turbine location. It can be independent of the type of flow modelling (WAsP, CFD, ...). Needs thought.
I agree this requires thought and careful consideration. Your proposal for splitting it between measurement locations and turbine locations is one good alternative. That is basically equivalent to one of the alternatives I described in my recent comment above.
I think we are already aligned on breaking out the turbine model specifications (power curves) at the top level, as this comes from a different originator and may be repeated many times across the scenarios. We will want to adapt that part of the schema to whatever becomes the new industry standard, so it makes sense to keep it flexible for now.
Agree.
Great. I will incorporate that when I update the data model draft.
I propose we keep the turbine location map (wind farm layouts) at the top level rather than under each scenario, both because they have a different originator (typically the project developer) and because different scenarios will often share the same layout (and so we can avoid repetition).
What would this turbine location map contain? My impression is that it would be an `id`, `lat` and `long`, and that is it? The wind farms area or scenarios would then reference each turbine id. I know this avoids turbine lat, long duplication, but I think it might add more complication, e.g. if someone wants to plot all the turbine layouts in GIS. There is a lot of processing to do to get that data into a format for GIS.
My idea was that it includes a unique ID, location data and optionally some other attributes like labels, version tags, data source, etc. For the location data, I had in mind to use easting and northing together with an EPSG code for the coordinate system used. Using latitude and longitude is more convenient as it does not require any additional coordinate system data, but most calculations are undertaken on an easting/northing grid in a local coordinate system, and I imagine many users of the data model will not want to have to convert everything to/from latitude/longitude coordinates. Perhaps we should support both. We discussed last time that some duplication is acceptable to avoid having to run calculations to derive results. This is maybe an instance of that.
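To make the "support both" idea concrete, a location record could look something like the sketch below. This is only an illustration under assumptions: the class name, all field names and the example coordinates are hypothetical, and the rule that at least one complete coordinate set must be present is one possible validation choice rather than anything agreed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TurbineLocation:
    """Hypothetical location record; field names are illustrative only."""

    turbine_id: str
    easting_m: Optional[float] = None
    northing_m: Optional[float] = None
    epsg_code: Optional[int] = None  # coordinate system of easting/northing
    latitude_deg: Optional[float] = None
    longitude_deg: Optional[float] = None
    label: Optional[str] = None
    layout_version: Optional[str] = None

    def __post_init__(self):
        projected = (self.easting_m, self.northing_m, self.epsg_code)
        geographic = (self.latitude_deg, self.longitude_deg)
        has_projected = all(v is not None for v in projected)
        has_geographic = all(v is not None for v in geographic)
        # A partially filled coordinate set is ambiguous, so reject it.
        if any(v is not None for v in projected) and not has_projected:
            raise ValueError("easting, northing and EPSG code must be given together")
        if any(v is not None for v in geographic) and not has_geographic:
            raise ValueError("latitude and longitude must be given together")
        if not (has_projected or has_geographic):
            raise ValueError("at least one complete coordinate set is required")


# EPSG:25832 (ETRS89 / UTM zone 32N) is used purely as an example code.
wtg1 = TurbineLocation(
    "WTG1-r01", easting_m=400100.0, northing_m=6200100.0, epsg_code=25832
)
```

The sketch does not convert between the two coordinate forms. If both are supplied, checking that they agree would need a projection library, which is left out here.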
You are right that moving the layout out of the scenarios and instead using references to a "turbine location map" adds some complexity. I included some considerations around this in my other recent comment above. I do not think the processing to get individual turbine layouts would be so complicated, but it would add an additional step. The abstraction might make the data model a little more difficult to comprehend. Each approach has some advantages and disadvantages.
I suggest we use the concept of "turbine location map" rather than "wind farm layout". The map links each unique turbine ID to a unique location (easting and northing). A wind farm layout would contain turbine IDs together with location data for a specific scenario. What I intend with the map is to include the location data for all turbines in all layouts, including neighbouring wind farms. The scenario data then specifies what turbines are included for each wind farm and the turbine location map allows the locations to be looked up (in a relational database model we have turbine location ID as a unique ID in the turbine locations table and a foreign key in the turbine list under each scenario). I see the following advantages with this approach.
- We can avoid repeating turbine locations many times (which would often happen if we kept them in the tree under each of the different scenarios).
- It makes it easier to enforce unique turbine location IDs, since we only have a single map with all IDs and locations. Let me illustrate with an example. Take a project with three turbines, which has two different layout versions. There may be an inclination to give the turbine location IDs WTG1, WTG2 and WTG3 in both versions, i.e. having different locations for the same turbine ID. A single map would enforce these to be distinct (e.g. WTG1-r01, WTG2-r01 and WTG3-r01; and WTG1-r02, WTG2-r02 and WTG3-r02). The IDs could of course also be UUIDs with label information in different fields.
Expanding on this example further (if I understand correctly), say if a scenario is only using 2 turbines, the first and last. Their labels would be WTG1 and WTG3 which doesn't make sense for a standalone scenario. Users would be wondering where is the 2nd turbine gone. The labels (which are different to an id) should be WTG1 and WTG2 or whatever the user wants to use.
That is a good point. We would need to allow for the turbine IDs to correspond to different labels in different scenarios. That might further complicate things (not making anything really complex, but more relations to manage) and is perhaps a reason to adopt a simpler model.
In some countries it is common though that some turbine labels are missing, as in your example, with the developers dropping some locations through the permitting process and the labels of the remaining turbines having to stay the same. That is however not always the case.
- The map could more easily support some layout versioning information, which could be particularly useful for project developers who adopt this data model, needing to manage a lot of iterations of locations during the course of the development process. In addition to a unique turbine location ID, we could include fields for turbine label and layout version.
- It makes it easier to get an overview of all of the different unique scenarios considered in an EYA report (in different data models this could of course also be achieved automatically when parsing and processing the full tree, but I think it may be helpful to have this readily available in the raw json document).
Do you by any chance have an entity relational diagram to explain all this? I think I am not understanding you fully.
I will try to put one together and post in a separate comment.
In my understanding we have already agreed to keep a list of scenarios at the next level, to be equivalent of the different scenarios that may be presented in a single report. The scenario object contains the EYA results and all the unique information to separate it from other scenarios.
Agree.
Great. I will keep this in the draft.
@christianjnaturalpower :
If we decide to keep the turbine layout (location) information under the scenarios, we will need to limit the wind resource assessment to the prediction of the long-term wind climate at the measurement locations and put the prediction at the turbine locations under the scenarios. I think that could be a good alternative. It will mean repeating results if scenarios have the same layout and hub heights, but should not be too onerous. I would propose we keep the spatial model details outside of the scenario object since it would in the large majority of cases be the same for all scenarios, and it can easily be put outside and referenced without adding complexity. The wind resource assessment would then be everything until and including the vertical extrapolation and the predicted long-term wind resource at all relevant heights (hub heights) at the measurement locations. That part I see no good reason to put into the scenarios, since it would generally be the same for all scenarios, and could be quite a lot of data to repeat on every scenario.
OK, this sounds good to me.
I think an important thing to consider is how the information is likely to be generated. The software packages used in the industry will have their own data models. I'm not sure how much consensus there is between software packages (I only have experience with WindFarmer), but if we enforce a data model that is not aligned with the software that is supposed to generate it, that will cause additional burden on the industry to conform. It may be worth reviewing how various software packages split up the analysis steps.
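In WindFarmer:Analyst the wind part is split like:
Wind resource (stored inside the measurements structure):
- Measurement campaign
- Cleaning
- Reconstructions
- Long-term adjustment
- Shear model (vertical extrapolation)
- Hub-height time series
- Long-term hub-height wind climate
Flow model (stored inside the scenario structure):
- Horizontal extrapolation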
@thomasvandelft I have left some comments below.
I think an important thing to consider is how the information is likely to be generated. The software packages used in the industry will have their own data models. I'm not sure how much consensus there is between software packages (I only have experience with WindFarmer), but if we enforce a data model that is not aligned with the software that is supposed to generate it, that will cause additional burden on the industry to conform. It may be worth reviewing how various software packages split up the analysis steps.
I definitely agree this is important. In my experience this is very similar between different software tools, at least the ones I have worked with, including Windographer, WindPRO and in-house tools. I am not sure what data models they all use under the hood, but since the results you get out are broadly equivalent I expect they can readily produce results of that form. Most likely there will always be some differences between the data models used by different software packages. For example, they may put less emphasis on an easily comprehendible model in favour of higher efficiency. Different models may fit different purposes. I would say the key point is that the different software packages should easily be able to export to the EYA DEF model, although possibly using something different in the internals.
In WindFarmer:Analyst the wind part is split like:
Wind resource (stored inside the measurements structure):
- Measurement campaign
I think we are in agreement to separate the measurement campaign out. Let me know if not. WindFarmer:Analyst could just export this part into the measurement campaign object in the EYA DEF, which will essentially be the digital_wra_data_standard. So any software that can support that data model (hopefully all soon!) should easily feed this into the EYA DEF without modification.
- Cleaning
- Reconstructions
- Long-term adjustment
- Shear model (vertical extrapolation)
- Hub-height time series
- Long-term hub-height wind climate
Flow model (stored inside the scenario structure):
- Horizontal extrapolation
The remainder of the parts align with how I see the typical workflow and data structure, and my idea for the EYA DEF data model. As far as I can see, this should all fit with the generalised results model I suggested in the separate issue. For each step there is basically a modelling process to describe and some results that correct the results from the previous step, extrapolate along a dimension and/or extrapolate in more dimensions. It will maybe be useful to set up an example to illustrate this.
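One way to picture the "each step describes a model and extends or adds dimensions" pattern is the small sketch below. The step names follow the WindFarmer:Analyst list quoted above; the class, the model descriptions and the dimension names are all illustrative assumptions, not part of any agreed results model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModellingStep:
    """One step in the chain: a model description plus the dimensions of its results."""

    name: str
    model: str  # free-text description; hypothetical field
    result_dimensions: tuple


# Dimension names and model descriptions are assumptions made for illustration.
chain = [
    ModellingStep("cleaning", "quality control filters", ("measurement_point", "time")),
    ModellingStep("reconstructions", "gap filling", ("measurement_point", "time")),
    ModellingStep("long_term_adjustment", "correlation to reference data",
                  ("measurement_point", "time")),
    ModellingStep("vertical_extrapolation", "shear model",
                  ("measurement_location", "height", "time")),
    ModellingStep("horizontal_extrapolation", "flow model",
                  ("turbine", "height", "wind_speed", "wind_direction")),
]


def added_dimensions(steps):
    """For each step, list the dimensions that did not appear in the previous step."""
    previous = set()
    report = {}
    for step in steps:
        current = set(step.result_dimensions)
        report[step.name] = sorted(current - previous)
        previous = current
    return report


new_dims = added_dimensions(chain)
```

Steps that report no new dimensions (cleaning aside) are the ones that correct the previous results in place, while the extrapolation steps are where height and turbine location enter.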
I would group the shear model (vertical extrapolation), hub-height time series and long-term hub-height wind climate together as one modelling step, where we need to describe the model used (not necessarily in detail), the estimated wind shear exponent (along the binning dimensions used and for each measurement location) and the resulting wind speed estimates. I would see the time series and wind climate as different result outputs from the same process (generally, but must of course not be enforced to be identical), where the former is a wind speed data variable as a function of time and the latter a probability variable as a function of wind speed and wind direction (and could also be other binning dimensions).
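A small sketch of how the two outputs relate, under stated assumptions: the power-law shear profile is assumed here because the text mentions a shear exponent, but other shear models exist, and the 1 m/s speed bins, 12 direction sectors, function names and sample values are all made up for illustration.

```python
from collections import Counter


def extrapolate_to_hub_height(speed_ms, measured_height_m, hub_height_m, shear_exponent):
    """Power-law vertical extrapolation: v_hub = v_meas * (h_hub / h_meas) ** alpha."""
    return speed_ms * (hub_height_m / measured_height_m) ** shear_exponent


def wind_climate(time_series, speed_bin_ms=1.0, n_sectors=12):
    """Bin (speed, direction) samples into a probability table keyed by (speed bin, sector)."""
    sector_width = 360.0 / n_sectors
    counts = Counter()
    for speed, direction in time_series:
        speed_bin = int(speed // speed_bin_ms)
        # Centre the first sector on north, as is common for wind roses.
        sector = int(((direction + sector_width / 2) % 360.0) // sector_width)
        counts[(speed_bin, sector)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Made-up samples at an 80 m measurement height: (wind speed in m/s, direction in degrees).
measured = [(6.0, 350.0), (6.4, 10.0), (8.1, 95.0), (7.9, 100.0)]

# Output 1: hub-height time series (wind speed as a function of time).
hub_height_series = [
    (extrapolate_to_hub_height(v, 80.0, 100.0, 0.2), d) for v, d in measured
]

# Output 2: wind climate (probability as a function of wind speed and direction).
climate = wind_climate(hub_height_series)
```

Here the wind climate is derived directly from the hub-height time series, so the two are consistent by construction. In practice they need not be identical, for example when the long-term adjustment is applied to the distribution rather than to the series.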
The only deviation from the above workflow that I can think of is that the vertical extrapolation is sometimes done before the long-term modelling. I think we need to account for that possibility.
@christianjnaturalpower thanks, sounds good
@thomasvandelft, your comments also triggered another couple of ideas:
Hi @stephenholleran, below is my attempt to turn the "turbine location map" into a relational diagram. However, since this does not quite follow a relational database model in the typical sense, it might not provide the clearest illustration.
The idea is that there is a single table that maps turbine IDs to a location (georeferenced easting and northing) and optionally other attributes like labels and layout versions. That unique ID can then be used whenever we want to associate some result with a turbine location. In the example, we have some result that is binned by turbine, year and month, but it could be any dimensions together with turbine. The example results table is basically a flattened version of the multi-dimensional results map (labelled multi-dimensional array), where the dimensions are put as indices. Each scenario has a list of the unique turbine IDs that are included for each wind farm.
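The flattening described here can be sketched in a few lines of Python. This is a minimal illustration only: the nested layout, the dimension names (`turbine_id`, `year`, `month`), the helper names and the values are hypothetical.

```python
# Hypothetical multi-dimensional result: value[turbine_id][year][month].
nested_result = {
    "WTG1-r01": {2021: {1: 410.0, 2: 385.5}},
    "WTG2-r01": {2021: {1: 398.2, 2: 371.0}},
}


def flatten(nested, dimensions=("turbine_id", "year", "month"), value_name="value"):
    """Turn a nested mapping into flat rows, one per combination of dimension values."""
    rows = []

    def walk(node, index):
        if len(index) == len(dimensions):
            # All dimensions are fixed, so this node is a single result value.
            rows.append({**dict(zip(dimensions, index)), value_name: node})
            return
        for key, child in node.items():
            walk(child, index + (key,))

    walk(nested, ())
    return rows


def rows_for_scenario(rows, turbine_ids):
    """Select the rows belonging to the turbines listed under one scenario."""
    wanted = set(turbine_ids)
    return [row for row in rows if row["turbine_id"] in wanted]


flat = flatten(nested_result)
scenario_rows = rows_for_scenario(flat, ["WTG2-r01"])
```

Because the turbine ID is just another index column, the same flattened table works whether the ID is resolved through a shared location map or through the layout stored under a scenario.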
It might be that, whilst this "turbine location map" could be a good data model for a complete energy yield software package, it may be overly complicated for the JSON Schema. Users may find it more convenient to have the layout data contained under each scenario. I am starting to lean towards the simpler model, even though that has the cost of some duplication.
In the case where we keep the layout information under the scenarios, I envisage we would follow a similar model as in the diagram above in terms of the results, just using the turbine ID under each scenario as the unique key.
One further alternative option could be to move the turbine layouts out of the scenarios and into the main structure, and just reference them in the scenarios. However, I think that would create some complexity in managing references and ensuring consistent labels and maps, without so much gain.
A last comment I would add is that energy yield fundamentally belongs to a scenario since it depends on turbine interaction, which will almost always change between scenarios. Ambient (free) wind speed results generally only depend on height and location, so are less tied to scenarios. Potentially we will repeat a lot of wind speed results between scenarios that consider the same turbine locations and hub heights. However, if this makes the data model much easier to work with, I think we can live with that.
On a somewhat separate note but related to graphical representation of the data model, I discovered the erdantic package for automatically generating simple entity relationship diagrams (ERDs) for pydantic data models. That might be useful for generating documentation. I will give this a try once I have updated the draft EYA DEF data model.
@stephenholleran and @thomasvandelft, based on your feedback, our discussions and my further evaluation, I would propose we keep the turbine layout data under the scenarios, meaning that all turbine related results also go under a scenario. It seems this will create the most user-friendly data model, which is easy to comprehend and work with. I also note the following further advantages.
If you agree I will close this issue and update the draft data model on this basis. We can still continue to iterate on it and refine after that, but it may be useful to have something concrete as a basis.
In terms of data repetition, I think we can run some checks once we have some examples. As long as the data model is not cluttered with a lot of repetition overall, I think we can live with a bit if it keeps the data model simpler and easier to work with.
Hi @christianjnaturalpower,
I am a bit confused by this paragraph. I'm not sure exactly which way you want to go as the sentence highlighted seems to go against the rest of the paragraph. Specifically the words highlighted.
When I read it first I took it that we would put the turbine layouts under the scenarios which I think works and is a good option. Along with the wind resource assessment part containing everything up to the long-term wind resource at all relevant heights at the measurement locations. @thomasvandelft which way did you read this when you commented on it?
Edit: I just see your most recent post. I think you have answered my question. Thanks.
@christianjnaturalpower
I edited my comment above as you have answered my query.
If you agree I will close this issue and update the draft data model on this basis. We can still continue to iterate on it and refine after that, but it may be useful to have something concrete as a basis.
I would suggest producing an ERD (using erdantic, could be good excuse to check it out), just to see it consolidated graphically, and then we can close this issue.
Hi @stephenholleran, it sounds like a good idea to produce an ERD and review that here before closing this issue. A graphical representation should indeed make it clearer. I will prepare that.
In my comment regarding the spatial modelling, I meant that we could keep the results at the turbine locations within the scenarios but the details of the model (e.g. which model, which settings, what input data, etc.) outside.
We can take the opportunity during the meeting today to have a chat about this as well.
Hi @stephenholleran, @thomasvandelft, et al.,
I made some updates to the draft data model based on what we discussed and agreed (and have updated the pull request with that). Please see below a graphical rendering of the top two levels.
Does this seem like a reasonable starting point? I appreciate we will almost certainly need to extend and refine during the course of the further work, but if there is something you can see straight away that I got wrong or we can improve, please let me know.
Note that some fields that should not be optional in the final model are currently set as optional, just to allow my (as of yet) reduced example to pass validation without errors.
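Hey @christianjnaturalpower,
Great work.
Just focusing on the very top level, the EnergyAssessmentReport table.
- Should this be called Energy Yield Assessment after EYA?
- What is the difference between json_doc_id and document_id?
- Could they be the same thing?
- Is it a "JSON doc" if it is pulled from an API?
- I would be thinking that they are the same thing. The file extension will distinguish. It could be a word, excel, pdf or JSON doc. It is the same report.
- I am assuming the receiving_organisation is the customer/client that paid for the work. As this is a digital exchange format to enable the frictionless exchange of this report there are going to be many "receivers" of the report. Would explicitly saying customer/client be better?
- Why do you have the word wind in front of the measurement_campaign section? I'm not sure if it was solar would the contents be any different. The WRA Data Model can handle met mast measurements, lidar, sodar, solar or floating lidar with ADCPs.
- Why do you have the word measurement in front of the wind_resource_assessment section?
- I see you have reference_turbine_assessment to cover the wind speed info coming from neighbouring turbines. Depending on what the standard report is providing I wonder could this be part of the wind_resource_assessment?
The first few points are details and not that important at the moment. The last 3 are more relevant for this issue.
Thanks!
re Measurement Campaign
- The WRA Data Model requires both the name and the Location of the measurement station. Not sure if we need to duplicate them here in this data model.
- Having a reference label that this met mast will be referred to within the EYA DEF would be good e.g. the name in the WRA Data Model might be "Barefoot Met Mast 1 - 80m" whereas an analyst might want to refer to it as "MM1", "BF_MM1" or "BF_80m" for their assessment.
Hi @stephenholleran and thanks for the feedback.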
Hey @christianjnaturalpower,
Great work.
Just focusing on the very top level, the EnergyAssessmentReport table.
- Should this be called Energy Yield Assessment after EYA?
Yes, I think that makes sense. I would suggest EnergyYieldAssessmentReport to make it explicit that the top level represents a report, but we could also just call it EnergyYieldAssessment. What do you think?
What is the difference between json_doc_id and document_id?
- Could they be the same thing?
- Is it a "JSON doc" if it is pulled from an API?
- I would be thinking that they are the same thing. The file extension will distinguish. It could be a word, excel, pdf or JSON doc. It is the same report.
The json_doc_id is the variable name I picked in the python data model for the JSON document unique identifier (the $id field). It is not explicitly defined in the JSON Schema and the JSON Schema has its own unique identifier field. These are URIs to identify the documents, including the location. They define the exact locations where documents came from. I think that will be useful to track origin and for verifying authenticity. Since you cannot call a variable $id in python, and id could cause namespace conflicts, I picked something else. Perhaps json_uri would be a more suitable field name in the python model. What do you think?
The document_id field is intended to represent the document ID that is contained within a PDF report. That might be something like ABC/023455/R/003. If the JSON document has a PDF report equivalent, this might be useful to include. What do you think?
It might be natural that the json_doc_id (or whatever we call it) contains the document_id (as part of the URI), but it might not have to. The document_id would generally be related to the document management system of the issuing organisation and the json_doc_id related to the API design. In any case I agree with you that any format of the report should have the same document ID. The json_doc_id is more like an address specific to the JSON document.
- I am assuming the receiving_organisation is the customer/client that paid for the work. As this is a digital exchange format to enable the frictionless exchange of this report there are going to be many "receivers" of the report. Would explicitly saying customer/client be better?
That is a good point. I was indeed thinking of the receiving_organisation generally as the customer/client that commissioned the work, but it would not have to be. It could also be a developer issuing an internally prepared report to an investor, for example. A report often has a receiver, but not always. Perhaps we should allow for a list of multiple receivers. In case there is no specified receiver, that could be left blank (i.e. it would be an optional field). Does that seem reasonable to you?
- Why do you have the word wind in front of the measurement_campaign section? I'm not sure if it was solar would the contents be any different. The WRA Data Model can handle met mast measurements, lidar, sodar, solar or floating lidar with ADCPs.
That is a good point. We can remove the word 'wind' and just call it MeasurementCampaign.
- Why do you have the word measurement in front of the wind_resource_assessment section?
This is to distinguish it from the wind resource assessment at the turbine locations, which is contained under the scenarios (see MeasurementWindResourceAssessment and TurbineWindResourceAssessment). We can tune this terminology and should try to align it with the IEC 61400-15-2 document.
- I see you have reference_turbine_assessment to cover the wind speed info coming from neighbouring turbines. Depending on what the standard report is providing I wonder could this be part of the wind_resource_assessment?
Hopefully we can get feedback from Kai and Lars on that. I do however think it will need to cover production data in addition to wind speed data. In my understanding the calculations are generally undertaken in terms of production and then tuned to equivalent wind speed results at the end.
The first few points are details and not that important at the moment. The last 3 are more relevant for this issue.
Thanks!
Hi @stephenholleran, see my comments below.
re Measurement Campaign
- The WRA Data Model requires both the name and the Location of the measurement station. Not sure if we need to duplicate them here in this data model.
My idea was that the name should be identical to the name of the measurement in the IEA Task 43 digital_wra_data_standard JSON document. As I understand it, each IEA Task 43 digital_wra_data_standard JSON document can contain multiple measurement campaigns, so we would need to reference each one individually. Then we would need the name field for this reference. Do you think that makes sense? Or should we rather aim to add any additionally required metadata to the IEA Task 43 digital_wra_data_standard itself?
I included location as I thought it might be useful to have easting and northing coordinates in the local coordinate system used in the EYA DEF document. As I understand it, the IEA Task 43 digital_wra_data_standard defines latitude and longitude. Of course it is relatively easy to convert, but I am not sure we want to make the user do the conversion when needing the location in local coordinates. What do you think?
- Having a reference label that this met mast will be referred to within the EYA DEF would be good e.g. the name in the WRA Data Model might be "Barefoot Met Mast 1 - 80m" whereas an analyst might want to refer to it as "MM1", "BF_MM1" or "BF_80m" for their assessment.
Yes, agreed. Do you think the label field in the model serves this purpose or do you propose to include something further? Do you think that should be a required field?
The field measurement_id is included as the unique reference for the measurement. For reference in other parts of the EYA DEF, my idea is to use this unique ID. This could be the same as the label, or something else like a UUID. The labels should also be unique, but it might be more useful and robust to have a separate unique ID that may not be the same as the label, and which is unique in a larger context (many documents). For example, "MM1" would be unique in each document but is likely not when aggregating many documents. What do you think?
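The difference between a per-document label and a globally unique ID can be sketched as follows. This is only an illustration using dataclasses and the standard `uuid` module; the class `MeasurementReference`, the helper `duplicate_labels` and the second mast name are hypothetical and not part of the draft model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MeasurementReference:
    name: str   # name as given in the WRA Data Model document
    label: str  # short analyst label, unique within one EYA DEF document
    # Assumption: a random UUID is used when no ID is supplied.
    measurement_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def duplicate_labels(measurements: list[MeasurementReference]) -> list[str]:
    """Return the labels that occur more than once, sorted."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for m in measurements:
        if m.label in seen:
            dupes.add(m.label)
        seen.add(m.label)
    return sorted(dupes)


# Two separate documents that each use the label "MM1".
doc_a = [MeasurementReference("Barefoot Met Mast 1 - 80m", "MM1")]
doc_b = [MeasurementReference("Another Site Mast 1 - 100m", "MM1")]

labels_clash_within_a = duplicate_labels(doc_a)          # no clash inside one document
labels_clash_combined = duplicate_labels(doc_a + doc_b)  # clash once documents are aggregated
ids_are_distinct = doc_a[0].measurement_id != doc_b[0].measurement_id
```

Aggregating the two documents makes the labels collide while the IDs stay distinct, which is the case for keeping both fields.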
- Should this be called Energy Yield Assessment after EYA?
Yes, I think that makes sense. I would suggest EnergyYieldAssessmentReport to make it explicit that the top level represents a report, but we could also just call it EnergyYieldAssessment. What do you think?
I would be inclined to drop the report. I don't see what it would be confused with to need to clarify it.
What is the difference between json_doc_id and document_id?
- Could they be the same thing?
- Is it a "JSON doc" if it is pulled from an API?
- I would be thinking that they are the same thing. The file extension will distinguish. It could be a word, excel, pdf or JSON doc. It is the same report.
The json_doc_id is the variable name I picked in the python data model for the JSON document unique identifier (the $id field). It is not explicitly defined in the JSON Schema and the JSON Schema has its own unique identifier field. These are URIs to identify the documents, including the location. They define the exact locations where documents came from. I think that will be useful to track origin and for verifying authenticity. Since you cannot call a variable $id in python, and id could cause namespace conflicts, I picked something else. Perhaps json_uri would be a more suitable field name in the python model. What do you think?
The $id is a JSON Schema keyword. This would be the uri of the schema itself.
What you are talking about sounds like an implementation, an actual JSON document, uri? If this is the case then json_uri would be good as an additional, optional field.
The document_id field is intended to represent the document ID that is contained within a PDF report. That might be something like ABC/023455/R/003. If the JSON document has a PDF report equivalent, this might be useful to include. What do you think?
Having the document_id included is good. The pdf and json should be the same content. This is more important than a json_uri as a uri might be restricted.
- I am assuming the receiving_organisation is the customer/client that paid for the work. As this is a digital exchange format to enable the frictionless exchange of this report there are going to be many "receivers" of the report. Would explicitly saying customer/client be better?
That is a good point. I was indeed thinking of the receiving_organisation generally as the customer/client that commissioned the work, but it would not have to be. It could also be a developer issuing an internally prepared report to an investor, for example. A report often has a receiver, but not always. Perhaps we should allow for a list of multiple receivers. In case there is no specified receiver, that could be left blank (i.e. it would be an optional field). Does that seem reasonable to you?
If it is not the client/customer then I wouldn't have it at all. The org/person producing it often has no idea who it will eventually end up with. Having a receiver could be a confidentiality risk whereas listing the actual customer/client wouldn't.
- Why do you have the word measurement in front of the wind_resource_assessment section?
This is to distinguish it from the wind resource assessment at the turbine locations, which is contained under the scenarios (see MeasurementWindResourceAssessment and TurbineWindResourceAssessment). We can tune this terminology and should try to align it with the IEC 61400-15-2 document.
I think we can leave the higher level one as wind_resource_assessment and qualify the wind resource at the turbine locations with the word turbine.
Hi @stephenholleran, see my comments below.
re Measurement Campaign
- The WRA Data Model requires both the name and the Location of the measurement station. Not sure if we need to duplicate them here in this data model.
My idea was that the name ......... ................... many documents. What do you think?
I think we should create a new issue to cover this section and keep this issue for the top level.
EnergyYieldAssessment name
- Should this be called Energy Yield Assessment after EYA?
Yes, I think that makes sense. I would suggest EnergyYieldAssessmentReport to make it explicit that the top level represents a report, but we could also just call it EnergyYieldAssessment. What do you think?
I would be inclined to drop the report. I don't see what it would be confused with to need to clarify it.
Agreed. Concise is good and I agree the word report is superfluous in the context.
json_doc_id
What is the difference between json_doc_id and document_id?
- Could they be the same thing?
- Is it a "JSON doc" if it is pulled from an API?
- I would be thinking that they are the same thing. The file extension will distinguish. It could be a word, excel, pdf or JSON doc. It is the same report.
The json_doc_id is the variable name I picked in the python data model for the JSON document unique identifier (the $id field). It is not explicitly defined in the JSON Schema and the JSON Schema has its own unique identifier field. These are URIs to identify the documents, including the location. They define the exact locations where documents came from. I think that will be useful to track origin and for verifying authenticity. Since you cannot call a variable $id in python, and id could cause namespace conflicts, I picked something else. Perhaps json_uri would be a more suitable field name in the python model. What do you think?
The $id is a JSON Schema keyword. This would be the uri of the schema itself. What you are talking about sounds like an implementation, an actual JSON document, uri? If this is the case then json_uri would be good as an additional, optional field.
The $id keyword can be used both in a JSON Schema document and in a JSON document. The $id of the JSON Schema would be the $ref in the JSON document. I agree json_uri is a suitable name and should be optional.
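As a rough sketch of where this lands, the top-level object could carry document_id alongside an optional json_uri, with unset fields simply left out of the serialised document. The snippet uses dataclasses instead of the pydantic draft model; the `title` field, the `to_json` helper and treating document_id as optional are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EnergyYieldAssessment:
    title: str  # hypothetical field, only here to give the example some content
    # ID as printed in the equivalent PDF report, if there is one.
    document_id: Optional[str] = None
    # Address of this JSON document; optional because a URI may be restricted.
    json_uri: Optional[str] = None


def to_json(report: EnergyYieldAssessment) -> str:
    """Serialise the report, omitting fields that were not set."""
    data = {key: value for key, value in asdict(report).items() if value is not None}
    return json.dumps(data, indent=2)


report = EnergyYieldAssessment(
    title="Example wind farm EYA",
    document_id="ABC/023455/R/003",
)
serialised = to_json(report)
```

Here `serialised` contains title and document_id but no json_uri key, so the same report content can be exchanged whether or not a URI is available.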
document_id
The document_id field is intended to represent the document ID that is contained within a PDF report. That might be something like ABC/023455/R/003. If the JSON document has a PDF report equivalent, this might be useful to include. What do you think?
Having the document_id included is good. The pdf and json should be the same content. This is more important than a json_uri as a uri might be restricted.
Great. Agreed.
Having the URI might not be enough to access the document. An API token or something like that might also be in place to control access. However, we probably do not need to be concerned so much with that.
receiving_organisation
- I am assuming the receiving_organisation is the customer/client that paid for the work. As this is a digital exchange format to enable the frictionless exchange of this report there are going to be many "receivers" of the report. Would explicitly saying customer/client be better?
That is a good point. I was indeed thinking of the receiving_organisation generally as the customer/client that commissioned the work, but it would not have to be. It could also be a developer issuing an internally prepared report to an investor, for example. A report often has a receiver, but not always. Perhaps we should allow for a list of multiple receivers. In case there is no specified receiver, that could be left blank (i.e. it would be an optional field). Does that seem reasonable to you?
If it is not the client/customer then I wouldn't have it at all. The org/person producing it often has no idea who it will eventually end up with. Having a receiver could be a confidentiality risk whereas listing the actual customer/client wouldn't.
I think this might require some further thought and maybe collecting some feedback. I suppose the key here is ownership of the report. The issuing organisation prepares the report for a client who then owns a certain right to the report. If the report is made public and there is no ownership and reliance, then the receiver is irrelevant. In some reliance situations a report may be issued to multiple parties. I was also just thinking that it could be (rarely) that there is more than one issuing organisation. I think it makes sense to move this question to a separate issue.
wind_resource_assessment
- Why do you have the word measurement in front of the wind_resource_assessment section?
This is to distinguish it from the wind resource assessment at the turbine locations, which is contained under the scenarios (see MeasurementWindResourceAssessment and TurbineWindResourceAssessment). We can tune this terminology and should try to align it with the IEC 61400-15-2 document.
I think we can leave the higher level one as wind_resource_assessment and qualify the wind resource at the turbine locations with the word turbine.
Agreed - that makes sense.
Hi @stephenholleran, see my comments below.
re Measurement Campaign
- The WRA Data Model requires both the name and the Location of the measurement station. Not sure if we need to duplicate them here in this data model.
My idea was that the name ......... ................... many documents. What do you think?
I think we should create a new issue to cover this section and keep this issue for the top level.
Agreed - that makes sense. This is now here.
Hi @stephenholleran, @thomasvandelft , et al.,
I have updated the data model based on what we discussed and agreed on the call yesterday. You can find an updated diagram for the top level below.
As discussed, I will now close this issue and we can cover remaining open items around details in new separate issues.
I left the wording as issuing_organisations and receiving_organisations as I think that is quite general and should be clear. That seems more flexible to me than using consultant and customer. In the descriptions and examples I have made it clear this would typically be a consultant and a customer. If you disagree, please raise a new issue and we can cover it there.
Looks great
As we have discussed, there are multiple options for designing the data model hierarchy at the top levels. It would be very helpful if we could determine and agree what the most useful structure will be, since it would be a breaking change if we need to alter it further down the line.
Based on the feedback from @stephenholleran and what we discussed during the meetings, I had a further think about it and would suggest the following. I think this is broadly in line with what @stephenholleran suggested, though with a few variations.
I think it makes sense to have an energy assessment report (header) at the top level, being the equivalent of a report document (e.g. PDF). This is a natural container for everything else. An alternative would be to have a list of scenarios at the top level, with each scenario referencing a set of report details, but that is less analogous to a report as we are used to seeing it.
For the measurement campaign information, I think we are in agreement to simply reference the iea43_wra_data_model. This schema supports including several measurement campaigns in one json document, but I think we should support providing a list of json documents according to this schema. The EYA report organisation is not the originator of this data, and it may be that the input data comes in the form of several schema-compliant documents, in which case it makes sense to keep it that way rather than merging them into one. I would suggest that the inclusion of an iea43_wra_data_model json document is optional (informative rather than normative in IEC terms, if I got that right), because there may be instances where an iea43_wra_data_model document is not available. We may want to consider an alternative minimal normative schema of measurements metadata, including critical information like the measurement locations.
@stephenholleran correctly highlighted that the wind analysis is generally common across all scenarios, and that it therefore does not make sense to repeat the wind analysis data for every scenario (unless it differs). In the large majority of cases, there would only be one wind analysis as a basis for an energy yield assessment. However, I can see some special scenarios where different scenarios would have different wind analyses. For example, there might be two on-site measurements that give significantly different pictures of the wind resource, and one energy yield assessment is prepared based on each individual measurement alone to give an idea of the extreme cases (where one measurement is treated as trustworthy and the other one disregarded completely). In my view the data model does not become more complicated by allowing for multiple wind analyses. We can have a list of wind analyses below the EYA report (which most of the time will only have one item), and then let each EYA scenario reference the wind analysis it used. I would avoid letting each EYA scenario reference multiple wind analyses, as that would make it necessary to include all the information on how the different ones were combined and it would complicate things related to wind analysis uncertainty.
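The referencing idea can be sketched as a list of wind analyses under the report, with each scenario holding exactly one reference. The names below (`WindAnalysis`, `EyaScenario`, `EyaReport`, `wind_analysis_for`) and the example IDs are made up for illustration, and the snippet uses dataclasses rather than the pydantic draft model.

```python
from dataclasses import dataclass


@dataclass
class WindAnalysis:
    wind_analysis_id: str
    description: str


@dataclass
class EyaScenario:
    scenario_id: str
    # Exactly one reference per scenario, so no combination logic is needed.
    wind_analysis_id: str


@dataclass
class EyaReport:
    wind_analyses: list[WindAnalysis]  # usually a single item
    scenarios: list[EyaScenario]

    def wind_analysis_for(self, scenario_id: str) -> WindAnalysis:
        """Look up the wind analysis that a given scenario refers to."""
        by_id = {w.wind_analysis_id: w for w in self.wind_analyses}
        for scenario in self.scenarios:
            if scenario.scenario_id == scenario_id:
                return by_id[scenario.wind_analysis_id]
        raise KeyError(f"unknown scenario: {scenario_id}")


# The special case described above: one assessment per on-site measurement.
report = EyaReport(
    wind_analyses=[
        WindAnalysis("WA1", "based on measurement A only"),
        WindAnalysis("WA2", "based on measurement B only"),
    ],
    scenarios=[EyaScenario("S1", "WA1"), EyaScenario("S2", "WA2")],
)
analysis_for_s2 = report.wind_analysis_for("S2")
```

In the common case the list holds a single wind analysis and every scenario points at it, so the extra reference costs very little.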
I previously used the term "wind resource assessment" instead of "wind analysis." I now changed to the latter as I see that is the term used in the IEC 61400-15-2 reporting section, and it should make sense to be consistent with that. The example IEC 61400-15-2 report sections include "on-site wind monitoring" and "wind analysis". The former includes the details of the measurements (what in our data model is contained under the list of wind measurement campaigns) together with data processing and measurement uncertainties. To separate by originator I propose we move data processing and measurement uncertainties to the wind analysis.
The IEC 61400-15-2 draft uses the terms "wind analysis" and "energy analysis." I would prefer "wind assessment" and "energy assessment." I would typically use the term analysis for processes that are mainly concerned with understanding some phenomenon based on data, and assessment for processes that are a combination of modelling and analysis. Hence I would say measurement data analysis and energy yield assessment. The term assessment fits with the concept of prediction, which is what we are concerned with both in terms of the wind resource and the energy yield. However, I think it makes sense to align the terms with the IEC 61400-15-2 and do not have strong feelings about this. The most important thing is consistency.
It would be an option to further flatten the wind analysis, break it into its different parts at the top level and let the wind analysis object just be a list of references to the individual parts. This would allow avoidance of duplication if only some parts differ whereas others do not. I however think that would complicate the data model more than it would add value. Too many references in the data model can make it more complex and harder to comprehend and work with. In most cases, there will in any case just be one wind analysis.
By a wind analysis, I consider all elements from processing the raw measurements to predicting the wind conditions at the turbine locations, i.e. everything before translating the wind conditions into energy production. This is consistent with the IEC 61400-15-2 draft as far as I can see. The estimates of the wind resource at the measurement locations will be a subset of the wind analysis.
I think we are already aligned on breaking out the turbine model specifications (power curves) at the top level, as this comes from a different originator and may be repeated many times across the scenarios. We will want to adapt that part of the schema to whatever becomes the new industry standard, so it makes sense to keep it flexible for now.
I propose we keep the turbine location map (wind farm layouts) at the top level rather than under each scenario, both because they have a different originator (typically the project developer) and because different scenarios will often share the same layout (and so we can avoid repetition).
I suggest we use the concept of "turbine location map" rather than "wind farm layout". The map links each unique turbine ID to a unique location (easting and northing). A wind farm layout would contain turbine IDs together with location data for a specific scenario. What I intend with the map is to include the location data for all turbines in all layouts, including neighbouring wind farms. The scenario data then specifies what turbines are included for each wind farm and the turbine location map allows the locations to be looked up (in a relational database model we have turbine location ID as a unique ID in the turbine locations table and a foreign key in the turbine list under each scenario). I see the following advantages with this approach.
I would not include hub height information in the turbine location map, since different scenarios would often have the same layout but different hub heights. The mapping of unique turbine ID to hub height can go under each scenario, which may include some repetition across the different scenarios (if the hub heights are the same), but I think it is justified to simplify things.
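The lookup described here can be sketched with a top-level map from turbine ID to location and a per-scenario map from turbine ID to hub height. All names and coordinates (`turbine_location_map`, `LayoutScenario`, `resolve_layout`) are hypothetical and only illustrate the foreign-key style reference.

```python
from dataclasses import dataclass

# Top-level map: unique turbine ID -> (easting, northing). Values are made up.
turbine_location_map = {
    "T1": (500000.0, 6200000.0),
    "T2": (500400.0, 6200000.0),
    "N1": (503000.0, 6201000.0),  # turbine in a neighbouring wind farm
}


@dataclass
class LayoutScenario:
    scenario_id: str
    # Turbine ID -> hub height in metres, kept per scenario.
    hub_heights: dict[str, float]


def resolve_layout(scenario, location_map):
    """Join the scenario's turbine IDs with the shared location map."""
    rows = []
    for turbine_id, hub_height in scenario.hub_heights.items():
        if turbine_id not in location_map:
            raise KeyError(f"turbine {turbine_id} missing from location map")
        easting, northing = location_map[turbine_id]
        rows.append((turbine_id, easting, northing, hub_height))
    return rows


# Same layout, different hub heights: the locations are stored only once.
low = LayoutScenario("100m", {"T1": 100.0, "T2": 100.0})
high = LayoutScenario("120m", {"T1": 120.0, "T2": 120.0})
low_rows = resolve_layout(low, turbine_location_map)
high_rows = resolve_layout(high, turbine_location_map)
```

The cost of this design is the extra lookup step and the need to keep the IDs in every scenario consistent with the shared map.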
In my understanding we have already agreed to keep a list of scenarios at the next level, to be equivalent of the different scenarios that may be presented in a single report. The scenario object contains the EYA results and all the unique information to separate it from other scenarios.
Because the energy assessment and uncertainty assessment are so interlinked, I do not think it makes sense to separate them in the data model. The results can easily be separated into different results tables. If estimates are treated as probability distributions (which e.g. DNV does), then it makes no sense to look at mean/median values completely separate from the other characteristics of the distributions.
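A small worked example of why the central estimate and the uncertainty belong together: if the net energy estimate is assumed to follow a normal distribution (an assumption for this sketch only, not something the data model prescribes), exceedance values such as P50 and P90 depend on both the mean and the standard deviation. The function name and the numbers are made up.

```python
from statistics import NormalDist


def exceedance_value(mean: float, std: float, probability: float) -> float:
    """Value that is exceeded with the given probability under a normal distribution."""
    return NormalDist(mean, std).inv_cdf(1.0 - probability)


# Made-up numbers: mean net energy of 100 GWh/yr with 10 % total uncertainty.
mean_gwh = 100.0
std_gwh = 10.0
p50 = exceedance_value(mean_gwh, std_gwh, 0.50)  # equals the mean for a normal distribution
p90 = exceedance_value(mean_gwh, std_gwh, 0.90)  # roughly 87.2, i.e. mean - 1.28 * std
```

Two scenarios with the same P50 but different uncertainties would give different P90 values, so storing the mean without the rest of the distribution loses information.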
At this point I have not included anything for the reference turbine assessment, which will be an alternative to the wind analysis in case operational data from neighbouring turbines are used instead of wind measurements. I envision this will follow the same model as the wind analysis.
@stephenholleran, @thomasvandelft, @charlie9578, @Dynorat and the rest -- please could you review this critically, propose alternative ideas that could work better, suggest refinements and raise any issues you see with structuring it this way?
Once we have agreed, I can update the data model accordingly. Note that the data model in the branch of my pull request does not yet completely match my proposal above.