IEA-Task-43 / digital_wra_data_standard

IEA Task 43: pre-construction energy estimate data standard repository
BSD 3-Clause "New" or "Revised" License
56 stars 15 forks source link

[SCHEMA] Measurement location UUID #232

Closed jonssonchristian closed 4 months ago

jonssonchristian commented 9 months ago

I am part of a working group creating a data model (digital exchange format) for wind EYA data. For measurement station metadata, we use the IEA Wind Task 43 WRA Data Model (for that sub-model, we reference the WRA Data Model JSON Schema from our JSON Schema).

Elsewhere in the EYA Digital Exchange Format (EYA DEF) schema, we need to be able to reference the different measurement stations. For example, long-term wind speed results at the measurement locations need to be "tagged" with a measurement station identifier.

In the IEA Wind Task 43 WRA Data Model, each item under measurement_location has a property name. Currently we use this name field as the unique identifier when referencing specific measurement stations elsewhere.

In your view, is the name field suitable for this purpose? Would it make sense to include guidance in the description of this field, to highlight that it should be selected such as to be unique within the context of a project. I would expect issues with colliding names to be very rare, but suppose it could happen, for example when combining documents from different organisations, who have used the same name for different measurement stations.

Alternatively, could it make sense to include an id field in addition to name, which could be less human readable but less likely to clash? That could for example be a UUID.

For the purpose of having a short measurement station label in graphics and tables, would it make sense to include an optional abbreviation field, as a shorthand for the name.

abohara commented 9 months ago

Hi @christianjnaturalpower .

All tables (including measurement_location) rows generally have a uuid identifier, but a lot of the uuid field descriptor are left out of the schema definition (perhaps considering it just an implementation detail of the storage). In the postgres example, we do have it.

stephenholleran commented 9 months ago

Hi @christianjnaturalpower,

Good timing with your question as we have just had a group meeting where we brought this up.

We can see a potential issue with name conflicts when bringing together neighbouring projects. It would be very rare but could happen. Therefore, we are willing to include a uuid field at that level, it would be optional and follow the uuid format. Thinking about it a bit more we should probably call it measurement_location_uuid instead of just uuid.

We can also improve the description of the measurement_location name to encourage users to use unique names.

Regarding your other question about an abbreviation for the name. We didn't feel this was helpful at this time as it may cause confusion in what the measurement location is actually called. It is really up to the user to name it whatever they want and I suppose we would encourage users to use as short a name as possible. In your context, producers of EYA reports mostly do this already. If it becomes an issue in the future we would definitely revisit.

Do you want to have a go at these changes?

Cheers,

Stephen

jonssonchristian commented 9 months ago

Hi @stephenholleran and @abohara,

Many thanks for the feedback.

@abohara, that is good and useful to know the postgres example already has a uuid field. If we reference the JSON Schema from another JSON Schema, the field would need to be included there as well, since we are working entirely with JSON documents. We will essentially embed one or more WRA Data Model JSON documents inside an EYA DEF JSON document. I see that the additionalProperties field is set to false under each item of measurement_location, so I think if a uuid field was included in the JSON document it would fail validation. I suppose the existing example database implementation creates the uuid values when ingesting data to create relations locally in the database, whereas adding a uuid field to the JSON document would also allow exchanging that data, so that the objects can be references outside the context of that local database.

@stephenholleran, I have added some comments and questions to your comments below.

Good timing with your question as we have just had a group meeting where we brought this up.

Great!

We can see a potential issue with name conflicts when bringing together neighbouring projects. It would be very rare but could happen. Therefore, we are willing to include a uuid field at that level, it would be optional and follow the uuid format. Thinking about it a bit more we should probably call it measurement_location_uuid instead of just uuid.

That sounds good. Do you see any potentially ambiguity if only calling it uuid? Since it is in the context of a measurement location, should it not be obvious that it is the UUID of the measurement location? We had a similar discussion related to the EYA DEF schema. I had originally included lengthier field names that included the name of the object, but after review we reached consensus that we could shorten them as it was obvious from the context (e.g. turbine_id under a turbine object can just be id, since it is obvious it is the ID of the turbine, but we need turbine_model_id under a turbine object to reference a turbine model object).

If the uuid field is optional, we would need to be able to reference a measurement_location either by name or uuid. The primary reference value could be uuid, but we would need to fall back to name if uuid is not included. That slightly complicates the implementation, but I think it is reasonable and the only solution. Including a new field and making it required would make all legacy JSON documents fail validation.

We can also improve the description of the measurement_location name to encourage users to use unique names.

That also sounds good.

Regarding your other question about an abbreviation for the name. We didn't feel this was helpful at this time as it may cause confusion in what the measurement location is actually called. It is really up to the user to name it whatever they want and I suppose we would encourage users to use as short a name as possible. In your context, producers of EYA reports mostly do this already. If it becomes an issue in the future we would definitely revisit.

That sounds reasonable. I agree the name would probably in general be sufficiently short for an abbreviation of it not to be required. And I agree it does not make sense to introduce a new field before there is a clear use case for it.

Do you want to have a go at these changes?

Sure, if you give me feedback on the question of whether we call it just uuid or measurement_location_uuid, I can draft a PR.

I am not familiar with the entire repo, so there may be places that need updating that I am unaware of. I can update the JSON Schema and JSON examples in the first instance.

stephenholleran commented 9 months ago

Hi @christianjnaturalpower ,

Sorry for the slow response.

You wrote:

Sure, if you give me feedback on the question of whether we call it just uuid or measurement_location_uuid, I can draft a PR.

You are right, uuid will be fine. As you say it has a parent so we know which one it is.

jonssonchristian commented 7 months ago

Hi @stephenholleran, now it is my turn to apologise for a delayed response. I finally prepared updates for a PR but was unable to push my branch since I do not seem to have the required access rights (see below). Would you be able to give me permission to push to the repo?

image

stephenholleran commented 7 months ago

Hi @christianjnaturalpower, You shouldn't be able to push to the master branch or even the dev branch. What you should do is branch off the most recent dev branch where you can work on this specific issue. Once you are happy you can then create a pull request to merge that feature branch into the dev branch. I'll be able to review it then and merge when we are happy. After a few changes to the dev branch we will then do a new release where we merge that dev branch into master. I'd hope to do one of those before the end of the year.

Check out the contributing file for more details or ask if you have any questions. https://github.com/IEA-Task-43/digital_wra_data_standard/blob/master/contributing.md

jonssonchristian commented 7 months ago

Hi @stephenholleran, thanks, I had missed that. I was not trying to push to the master or dev branch, but had created my branch for the issue off the master branch rather than the dev branch, so I presume that will be why it failed. I will sort that.

jonssonchristian commented 7 months ago

Hello again @stephenholleran, I merged my issue branch with the latest dev branch but still get the same error message. Are you sure that I do not need some permissions to be able to push a new branch to the repo?

stephenholleran commented 4 months ago

Merged. Thanks @jonssonchristian.