IEA-Task-43 / digital_wra_data_standard

IEA Task 43: pre-construction energy estimate data standard repository
BSD 3-Clause "New" or "Revised" License
58 stars 16 forks source link

Database Data Model version tracking #33

Closed stephenholleran closed 8 months ago

stephenholleran commented 4 years ago

Falling out from the discussion in issue #29 about tagging the version of the schema, specifically @sdsmdp's comment, we should track the Data Model version in a new database table to help with updating existing databases to newer versions.

Resulting from this would be making sure that updates to the Data Model are matched in the create table SQL statements and writing scripts within that to update previous databases to the new schema.

abohara commented 3 years ago

@sdsmdp @stephenholleran

I see the following version information in the JSON example currently. Is this sufficient ? If I understand correctly, X wants to pass their data to Y, and hence generates the JSON and specifies that the JSON is consistent with V0.1 of the WRA schema ? In that case , do we need anything beyond, simply : version: '0.1.0' ?

How do we handle future breaking changes seems to be a separate but important issue as well.

 "version": {
      "type": "string",
      "title": "Version",
      "description": "The version of the IEA WRA Data Model this file is based on. The format of this follows Semantic Versioning with the year and month tagged on after. E.g. 1.1.1-2020.07",
      "examples": [
        "0.1.0-2020.06",
        "1.1.1-2020.07"
      ],
      "pattern": "^([0-9]{1,2})[.]([0-9]{1,2})[.]([0-9]{1,2})-([0-9]{4})[.]([0-9]{2})$"
    }
sdsmdp commented 3 years ago

@abohara , I am confused by your comment. The snippet you quoted is from the JSON schema, and so it specifies and describes the version attribute. In practice, an actual JSON file that implements the schema will have simple name-value pair as you suggest (e.g., version: "0.1.0-2020.06").

Is it the additional identifiers that follow the version number that you question? I believe (but we could test) that any validation tool and version comparators know how to handle (or ignore) such metadata. It will only look at <major>.<minor>.<patch>. But perhaps that is not your concern...?

Regarding breaking changes, I agree it is important. I don't believe solving it is urgent until we propose a change that would be a breaking change. To put it another way, whatever method we come up with may be inadequate until we are in a situation where we are proposing an actual breaking change.

I suggest a set of principles we could follow regarding breaking changes. For example:

  1. Think long and hard before making breaking changes.
  2. Make sure to increment the version number appropriately to indicate a breaking change (e.g., increment the <major> number).
  3. Provide tools and/or guidance as part of the release to help with data handling and data migration. (E.g., for those implementing the schema in their target databased, scripts that they can use/adapt to help with applying the database change and migrating data). Check out the book Refactoring Databases: Evolutionary Database Design to review some helpful patterns and techniques for doing this. I expect we could implement a similar approach for parsing and converting older JSON files that no longer match the target schema.
stephenholleran commented 3 years ago

Hi @abohara @sdsmdp To try and help clear up @sdsmdp confusion, the code snippet:

 "version": {
      "type": "string",
      "title": "Version",
      "description": "The version of the IEA WRA Data Model this file is based on. The format of this follows Semantic Versioning with the year and month tagged on after. E.g. 1.1.1-2020.07",
      "examples": [
        "0.1.0-2020.06",
        "1.1.1-2020.07"
      ],
      "pattern": "^([0-9]{1,2})[.]([0-9]{1,2})[.]([0-9]{1,2})-([0-9]{4})[.]([0-9]{2})$"
    }

Is in the Schema file to describe how the Schema version number is displayed in a json implementation, as in the demo implementation:

{
  "author": "Stephen Holleran",
  "organisation": "brightwind",
  "date": "2020-06-07",
  "version": "0.0.0-2020.07",
  "plant_name": "A Name of Wind Farm",
  "plant_type": "onshore_wind",

So to answer @abohara's question:

If I understand correctly, X wants to pass their data to Y, and hence generates the JSON and specifies that the JSON is consistent with V0.1 of the WRA schema? Is this sufficient?

Yes, I think it is sufficient for Y to know what Schema version was followed to create the json and so will be able to parse it based on that same version.

The problem is how can person Y get that particular version of the Schema? I thought of this earlier last week. I think I should have put in a "$version": "v0.1.0-2021.01 at the top of the Schema. This would be the easy thing and I should have done it.

One of the reasons I didn't consider it is because I thought once the repo is tagged and released with 'v0.1.0-2021.01' we would be able to link to any file in the repo by including the release name. This is possible through git but what I think comes back is the whole repo as a .zip as you can manually download from here: https://github.com/IEA-Task-43/digital_wra_data_standard/releases/tag/v0.1.0-2021.01 This could be unzipped in code and then have the Schema file for this version. A bit of a long way around but is possible.

Another method is that every version of a file on GitHub has a perma link. A permanent link that shows that file at that commit so everyone with that link can see the same thing. Person Y would have to have a linking table between version number and the perma link. Again, possible, but not great. I think including "$version": "v0.1.0-2021.01 in the actual Schema file to identify itself is best?

stephenholleran commented 3 years ago

@sdsmdp

Regarding the rest of your post on principals for updating the Schema and version number I totally agree it would be good to implement. I found months ago a blog about updating JSON Schema's specifically and it looks reasonable. https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/

abohara commented 3 years ago

@stephenholleran @sdsmdp

Sorry for the confusion, but it seems like it is cleared up. I think I mistook the originally issue to mean that the JSON schema did not have any version information. Hence I had that snippet there as evidence that maybe we could clear it.

The first two principals ( from @sdsmdp) seem very reasonable and aligned with the Sem. Ver. approach. I have also found an accompanying 'changelog' documenting 'breaking' changes to also be helpful . I have seen them attached with the tagged releases ( which is how I would trace the versioning information if I was starting out fresh).

Do you both feel we need any thing more detailed at the moment ? If not, how do you suggest documenting this to move ahead ?

sdsmdp commented 3 years ago

Other than implementing a version at the top of the schema as @stephenholleran suggested, I don't think we need to do anything else at this time. When we actually make a change we can think through and better provide guidance on how to:

In other words, I think figuring this out can be one of the stories to complete when we make a change (esp. if it is a breaking change). We do not need to burden the initial release with figuring it out now.

stephenholleran commented 3 years ago

I have create a new issue #47 to specifically deal with adding the version number to the schema.

This issue is about putting versions into the database so this issue will stay open.

stephenholleran commented 3 years ago

@kersting Do you know if there is a way to convert a JSON Schema file into a PostgresSQL or other database schema.

kersting commented 3 years ago

@stephenholleran Unfortunately we couldn't find any way to convert it.

stephenholleran commented 3 years ago

Hey @kersting, we still need to do something with regards the versioning of the PostgresSQL. At the moment whenever there is a change that effects the PostgresSQL I will keep it updated but we should formalize this with a version number.

stephenholleran commented 8 months ago

No clear actions to take. The SQL statements are more of a nice to have. Any user using them will most likely tweak their implementation of the database and so we feel there is no real reason to maintain a version history of this. We will endeavor to keep it up to date with the current version of the json schema.