OpenEnergyPlatform / data-preprocessing

Repository for data formatting, import of data, data and metadata review, and data curation.
GNU Affero General Public License v3.0

Review: Republishing of the Ladesäuleregister from the BNetzA #112

Open areleu opened 7 months ago

areleu commented 7 months ago

Issue description

This is a cleaned-up and annotated version of the Ladesäuleregister of the BNetzA.

The code to do this is made available here: https://github.com/areleu/fair-charging-station-data

A poster associated with this cleanup was presented at the RDA 21st Plenary during IDW 2023.

I am publishing last month's release. As far as I have noticed so far, this data is updated about three times a year. I know that dates are discouraged in the OEP naming conventions, but I still think the most appropriate way to redistribute this data is to associate it with its date of publication. @chrwm suggested that I add a column for the publication date, but I think this would cause two problems:

  1. The size of the table would increase significantly each time a new version is added, which can be detrimental in the long run. Most of the information in it would be duplicated.
  2. The information could be amended by the data providers, so each dataset is fundamentally different from the others.

There is also a normalised version of the dataset, which saves a lot of space and is significantly more manageable, but it requires multiple tables, and I don't know if this feature is already available.
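As a rough illustration of why the normalised variant saves space, the sketch below splits a flat row with repeated, numbered charge-point columns into one station table and one charge-point table. The column names (`station_id`, `operator`, `plug_type_1`, `power_kw_1`, and so on) are invented for the example and are not the actual column names of the register.

```python
def normalise(rows, max_points=4):
    """Split wide rows into a station table and a charge-point table."""
    stations, points = [], []
    for row in rows:
        stations.append({"station_id": row["station_id"],
                         "operator": row["operator"]})
        for n in range(1, max_points + 1):
            plug = row.get(f"plug_type_{n}")
            if plug is None:
                continue  # no charge point stored under this number
            points.append({
                "station_id": row["station_id"],  # foreign key to stations
                "point_no": n,
                "plug_type": plug,
                "power_kw": row.get(f"power_kw_{n}"),
            })
    return stations, points


# Hypothetical flat rows: the same two fields repeated with a numeric suffix.
flat = [
    {"station_id": 1, "operator": "A",
     "plug_type_1": "Type 2", "power_kw_1": 22.0,
     "plug_type_2": "CCS", "power_kw_2": 50.0},
    {"station_id": 2, "operator": "B",
     "plug_type_1": "Type 2", "power_kw_1": 11.0},
]
stations, points = normalise(flat)
```

Each station is stored once, and only the charge points that actually exist produce rows, so empty numbered columns no longer take up space.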

Workflow checklist

  1. GitHub

    • [x] I have submitted this issue to have metadata and data review documented (Issue #NR)
    • [x] Create a new review-branch and push OEMetadata to new branch (review/project_nameofdata#NR). If this step is too difficult, attach a file with the metadata as a comment in this issue and let the reviewer know.
  2. OEP

    • [x] Upload data to the OEP in schema model_draft (see upload tutorial)
    • [x] Link URL of data in this issue (model_draft.project_nameofdata)
  3. Start a Review

    • [x] Start a pull request (PR) from review-branch to master
    • [x] Assign a reviewer and get in contact
  4. Reviewer section

    • [x] A reviewer starts working on the issue
    • [x] Review data license
    • [ ] A reviewer finished working on this issue (and awarded a badge)
    • [x] Update metadata on table
    • [ ] Data moved to its final schema
    • [ ] Add OEP tags to table
    • [ ] Merge PR and delete review-branch
    • [ ] Document final links of metadata and data in issue description
    • [ ] Close issue

Metadata and data for review

Here are the links to my data and metadata. Naming follows the pattern model_draft.project_nameofdata:

Metadata: https://github.com/OpenEnergyPlatform/data-preprocessing/blob/review/bnetza_charging_stations_01_07_2023%23112/data-review/bnetza_charging_stations_01_07_2023.json

Data: https://openenergy-platform.org/dataedit/view/model_draft/bnetza_charging_stations_01_07_2023

Reviewed and published metadata and data

Final naming and location of the data and metadata after the review are as follows: schema.tablename

jh-RLI commented 7 months ago

Hi @areleu,

In general, I don't see much reason not to use the date, but in practice this is more metadata, which is why it should be stored in the oemetadata. If there is no other way, then use only year-month, otherwise I would strongly recommend a version number.

In the metadata, you can use the title field to change the table name that is visible in the OEP. In most cases, it is better to provide this information there. It will be displayed in this format: readable table name from title (technical table name)

I'm not sure if your comment about multiple tables has anything to do with it, but: it is also possible to upload multiple tables (including tables containing relationships), but this is more complicated and they are not visibly grouped. So far the tags should then be used so that these tables can be found together.

Otherwise, the metadata looks good. One thing I would have to double check: You have always deleted the "valueReference": [] in the resources if you have not specified one there. I'm not sure if this causes an error when displaying the metadata for these fields.
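If the missing key does turn out to be a problem, a small script along the following lines could put the empty lists back before uploading. This is only a sketch: it assumes the fields live under `resources[].schema.fields[]` in the metadata JSON, and the table and field names are placeholders.

```python
import copy


def restore_value_reference(metadata):
    """Return a copy of the metadata in which every field that lacks a
    "valueReference" key gets an empty list for it."""
    fixed = copy.deepcopy(metadata)  # leave the caller's dict untouched
    for resource in fixed.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            field.setdefault("valueReference", [])
    return fixed


# Placeholder metadata: the first field has had its "valueReference" removed.
metadata = {
    "resources": [{
        "name": "model_draft.example_table",
        "schema": {"fields": [
            {"name": "id", "type": "bigint"},
            {"name": "operator", "type": "text", "valueReference": []},
        ]},
    }]
}
fixed = restore_value_reference(metadata)
```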

areleu commented 7 months ago

> In general, I don't see much reason not to use the date, but in practice this is more metadata, which is why it should be stored in the oemetadata. If there is no other way, then use only year-month, otherwise I would strongly recommend a version number.

The date is in the publicationDate field as well. I would use a version number, but since this is a redistribution, I think it is important to stick to their "versioning" pattern; otherwise it is just confusing. Unfortunately, the primary source is not versioned: once a new version comes out, the previous one is deleted. I also don't really know how many versions there have been. Do I call this one version 1? What would that mean? If someone tries to trace this number back to the original website, they will be looking for the wrong thing. When I use the date, at least they know this is based on the "Ladesäuleregister released on ZZ.XX.YYYY" file, which they will probably not find because the files are not archived.

I guess a compromise would be to name the version after the date itself, but aren't we then missing a version field in the header? We could add that and also add a versioning feature to the OEP. That way I could give the table a generic name, and different versions could be associated with the same table. If that is not done in the OEP itself, we could also think about some kind of Zenodo integration.

BTW, is there any plan to give DOIs to the metadata? I think this would make the OEP more or less "FAIR complete". If the OEP does not do this, we could then really think about this Zenodo integration.

> I'm not sure if your comment about multiple tables has anything to do with it, but: it is also possible to upload multiple tables (including tables containing relationships), but this is more complicated and they are not visibly grouped. So far the tags should then be used so that these tables can be found together.

I mean a relational model: each table on its own is not very useful. The Frictionless spec allows a data package to have multiple resources associated with each other through foreign keys. Frictionless also allows building a relational database out of these multi-resource metadata descriptors. Unfortunately, the OEP is not yet adapted to fully exploit this feature, but I think it is very necessary, since you rarely find datasets consisting of single tables, and if you do, they are not well structured (see this dataset, for example: it has the same fields four times, distinguished only by numbers).
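To make the idea concrete, here is a minimal sketch of such a multi-resource descriptor, written as a plain Python dict, together with a small check that every foreign key points at an existing resource and field. The `foreignKeys` / `reference` layout follows the Frictionless Table Schema; the resource names, paths and fields are hypothetical, and the check is hand-written rather than taken from the frictionless library.

```python
package = {
    "name": "charging-stations",
    "resources": [
        {"name": "stations", "path": "stations.csv",
         "schema": {
             "fields": [{"name": "station_id", "type": "integer"},
                        {"name": "operator", "type": "string"}],
             "primaryKey": ["station_id"]}},
        {"name": "charge_points", "path": "charge_points.csv",
         "schema": {
             "fields": [{"name": "station_id", "type": "integer"},
                        {"name": "plug_type", "type": "string"}],
             "foreignKeys": [{
                 "fields": ["station_id"],
                 "reference": {"resource": "stations",
                               "fields": ["station_id"]}}]}},
    ],
}


def unresolved_foreign_keys(pkg):
    """List (resource, target, missing_fields) for every foreign key whose
    target resource or target fields do not exist in the package."""
    field_names = {r["name"]: {f["name"] for f in r["schema"]["fields"]}
                   for r in pkg["resources"]}
    problems = []
    for res in pkg["resources"]:
        for fk in res["schema"].get("foreignKeys", []):
            # An empty resource name denotes a self-reference in the spec.
            target = fk["reference"]["resource"] or res["name"]
            missing = set(fk["reference"]["fields"]) - field_names.get(target, set())
            if target not in field_names or missing:
                problems.append((res["name"], target, sorted(missing)))
    return problems


problems = unresolved_foreign_keys(package)
```

An empty `problems` list means the two tables can be linked as described; a relational database could then be built from the same descriptor.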

areleu commented 7 months ago

> Otherwise, the metadata looks good. One thing I would have to double check: You have always deleted the "valueReference": [] in the resources if you have not specified one there. I'm not sure if this causes an error when displaying the metadata for these fields.

So the metadata is already in the model_draft section, and it seems that it is raising an error, but it is not completely breaking the page.


jh-RLI commented 7 months ago

I understand what you are pointing out, and in this case I think it makes sense to include the date until there is a better solution. It's still a compromise, but you're right that it should be obvious which source this data table refers to. Still, it doesn't seem that important since the source is also included in the metadata. But since I like pragmatic solutions, I won't be restrictive here.

> I guess a compromise would be to name the version after the date itself, but aren't we then missing a version field in the header? We could add that and also add a versioning feature to the OEP.

This feature has been planned for a long time and was available in some form a few years ago, but was then dropped because the developer left the team. The versioning is still running in the background (at least that's what I was told; I never had the time to check whether this is actually the case). The problem is the same as always: this feature is currently not part of the research projects we are working on. We are currently working on including such features in upcoming projects.

> BTW, is there any plan to give DOIs to the metadata? I think this would make the OEP more or less "FAIR complete". If the OEP does not do this, we could then really think about this Zenodo integration.

Yes, this is also planned, and I remember that we agreed on Zenodo integration. What is holding this back is the same problem I described above: it is currently not part of the research projects.

> I mean a relational model: each table on its own is not very useful. The Frictionless spec allows a data package to have multiple resources associated with each other through foreign keys.

It is true that it is currently not possible to provide the metadata for each resource (table) in a single oemetadata JSON string. This is something we are also working on (see this issue). I also agree that the OEP does not fully support multi-table models, but my point is that the resources field in the oemetadata can already be used to specify a relational model and to create it with the oem2orm software (see this example that specifies the oedatamodel).

jh-RLI commented 7 months ago

> So the metadata is already in the model_draft section, and it seems that it is raising an error, but it is not completely breaking the page.

Ah, thanks for checking, I was expecting that :) But as you can guess, the metadata viewer widget will also be reworked so that it can handle missing fields. Hopefully it won't take too long: this is also not part of a research project, but it shouldn't be too much effort and will have to happen in the course of future oemetadata updates anyway.

jh-RLI commented 7 months ago

Regarding the review: With this metadata you can get the "Platinum" badge. We lack proper documentation about the badges. In short: you can get "Platinum" with this metadata because you have also provided annotations.

Which topic should this data be moved to? https://openenergy-platform.org/dataedit/schemas

As we are still in the transition phase of moving this review process to the OEP, it would make sense to carry out this review again on the oeplatform as soon as the last bugs in the OpenPeerReview functions have been fixed.

areleu commented 7 months ago

> Which topic should this data be moved to? openenergy-platform.org/dataedit/schemas

Tricky, maybe economy?

economy: Data related to economic activities. Examples: sectoral value added, sectoral inputs and outputs, GDP, prices of commodities etc.

It is infrastructure data, but not energy infrastructure data per se, so it does not belong to grid.

grid: Energy transmission infrastructure. Examples: power lines, substation, pipelines.

If I put on my energy modelling shoes, I would say demand, but it is not quite demand data either.

demand: Data on demand. Demand can relate to commodities but also to services.

@fabmio any suggestions?

edit: It is infrastructure data, so if we expand the scope of grid to infrastructure this could be easily decided.

jh-RLI commented 7 months ago

I agree that grid includes infrastructure. I will move the table and let you know. 👍

jh-RLI commented 6 months ago

We are currently implementing a publishing process for the OEP as we want to avoid the review on GitHub. We will try to migrate the currently published data as well as possible, but it would be great to have some users to test the process and give feedback. It will take some time to implement, but there will be a slimmed-down version of the feature available soon. You will be able to use it via the profile page in the OEP. Are you okay with waiting for this?

areleu commented 6 months ago

> We are currently implementing a publishing process for the OEP as we want to avoid the review on GitHub. We will try to migrate the currently published data as well as possible, but it would be great to have some users to test the process and give feedback. It will take some time to implement, but there will be a slimmed-down version of the feature available soon. You will be able to use it via the profile page in the OEP. Are you okay with waiting for this?

Sure, I can help test it.

jh-RLI commented 3 months ago

Hi @areleu, you can now use the publish button on the profile pages of the OEP website :)

You may have already noticed the changes. Here's a quick summary of how to access them:

  1. On https://openenergyplatform.org/, navigate to the profile page and click the tables section.

  2. Browse your tables. There are two sections, Published and Draft. This view is paginated, so you may need to page through it to find your table. I realise that a search function would be helpful, and it will be implemented.

  3. Once you have found the table, check whether the licence check has been completed. If so, you should be able to click the "Publish" button and select the "Infrastructure" theme.

  4. If the licence check has failed, you must ensure that your licence name (from the "oemetadata licences" field) matches the SPDX licence list.
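
If you want to check this locally before clicking "Publish", something like the sketch below might help. It is not the check the OEP runs: it assumes the licence names sit under `licenses[].name` in the metadata, compares them by exact match, and uses a small hand-picked subset of SPDX identifiers instead of the full list.

```python
# A few identifiers from the SPDX licence list; a real check would load
# the complete list instead of this hand-picked subset.
KNOWN_SPDX_IDS = {"CC0-1.0", "CC-BY-4.0", "ODbL-1.0", "PDDL-1.0", "AGPL-3.0-only"}


def non_spdx_licence_names(metadata, known_ids=KNOWN_SPDX_IDS):
    """Return the licence names in the metadata that are not exact
    matches for a known SPDX identifier."""
    return [lic.get("name")
            for lic in metadata.get("licenses", [])
            if lic.get("name") not in known_ids]


# Placeholder metadata: the second name is a title, not an SPDX identifier.
metadata = {"licenses": [{"name": "CC-BY-4.0"},
                         {"name": "Creative Commons Attribution 4.0"}]}
mismatches = non_spdx_licence_names(metadata)
```

Any name that ends up in `mismatches` would need to be replaced by the corresponding SPDX identifier.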