OpenEnergyPlatform / oekg

Repository for the Open Energy Knowledge Graph (OEKG)
Creative Commons Zero v1.0 Universal

Ontological data annotation to utilise data sets with OEKG in the future #6

Open chrwm opened 2 years ago

chrwm commented 2 years ago

Are there requirements to be met when annotating data sets today so that these data sets can be used with OEKG applications in the future?

I understand that the OEKG builds upon metadata in RDF format. However, no OEMetadata standard has been developed in RDF format so far. Therefore, OEMetadata v1.5.1 in JSON format currently has to be used to annotate tabular data ontologically. I am asking so that we avoid putting a lot of work into ontological annotations that might not be usable in the future.

adelmemariani commented 2 years ago

The only requirement is that all fields in the meta information (under the tables) and all column names should be associated with OEO concepts, or with concepts from other ontologies in case the OEO does not cover the concept yet. Also this might help.

As the OEKG can be populated repeatedly, we can read the values (via Python) from different formats (e.g., JSON) and update the knowledge graph accordingly. The main point here is that the development of the OEKG is not a static and one-shot operation. It is a recurring and evolving process. In this process, data formats seem not to be very critical, however, conceptualization (matching meta-data to ontological concepts) is important.
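A minimal sketch of such a read step, assuming the metadata follows the OEMetadata layout with `resources[].schema.fields[].isAbout`. The function name `collect_annotations` and the sample document are hypothetical illustrations, not the actual OEKG code. It only shows that the column-to-concept links can be pulled out of JSON on every run, whatever happens to them afterwards.

```python
import json
from typing import List, Tuple


def collect_annotations(metadata_json: str) -> List[Tuple[str, str, str]]:
    """Return (resource name, column name, concept IRI) for each annotated column.

    Hypothetical helper: assumes the OEMetadata structure
    resources -> schema -> fields -> isAbout.
    """
    metadata = json.loads(metadata_json)
    links = []
    for resource in metadata.get("resources") or []:
        fields = (resource.get("schema") or {}).get("fields") or []
        for field in fields:
            for concept in field.get("isAbout") or []:
                # Skip empty placeholders such as {"name": null, "path": null}.
                if concept.get("path"):
                    links.append(
                        (resource.get("name"), field.get("name"), concept["path"])
                    )
    return links


# Hypothetical sample document, reduced to the keys used above.
sample = json.dumps({
    "resources": [{
        "name": "example_table",
        "schema": {"fields": [
            {"name": "thermal efficiency",
             "isAbout": [{
                 "name": "energy conversion efficiency",
                 "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049",
             }]},
            {"name": "unannotated column",
             "isAbout": [{"name": None, "path": None}]},
        ]},
    }]
})

for link in collect_annotations(sample):
    print(link)
```

Because the function only depends on the concept IRIs, it could be rerun whenever the metadata or the ontology changes, which matches the recurring process described above.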

Ludee commented 2 years ago

That is important information and should be documented properly. We need to decide on documentation formats. Perhaps we create a new subpage under "Ontology" at the OEP or a new section "Knowledge Graph"? In addition, we need developer documentation. This can be part of the existing OEP-RtD or a separate one here.

chrwm commented 2 years ago

> The main point here is that the development of the OEKG is not a static and one-shot operation. It is a recurring and evolving process. In this process, data formats seem not to be very critical, however, conceptualization (matching meta-data to ontological concepts) is important.

From this, I understand that connecting a concept to the data is paramount, and how it is done is secondary.

For documentation purposes, here is how we go about it in the SEDOS project.

We'll use the oemetadata v1.5.1

----- Case1 ----- In cases where there is a single suitable ontology concept in the OEO, we'll use the keys subject, isAbout and valueReference as intended.

----- Case2 ----- (UPDATED) In cases where several OEO concepts are only suitable in combination, we'll add them as a list of dicts under the isAbout key.

For example: thermal efficiency of a heat power plant (as column in a tabular data set)

The concept thermal efficiency is not (yet, as of 23.09.22) available in the OEO, but the concepts heat generation process and energy conversion efficiency are:

(UPDATED example)

"resources": [
        {
            "profile": null,
            "name": null,
            "path": null,
            "format": null,
            "encoding": null,
            "schema": {
                "fields": [
                    {
                        "name": "thermal efficiency",
                        "description": "The column holds the values of the thermal efficiency of a heat power plant",
                        "type": null,
                        "unit": null,
                        "isAbout": [
                            {
                                "name": "heat generation process",
                                "path": "http://openenergy-platform.org/ontology/oeo/oeo-physical/OEO_00010248"
                            },
                            {
                                "name": "energy conversion efficiency",
                                "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049"
                            }
                        ],
                        "valueReference": [
                            {
                                "value": null,
                                "name": null,
                                "path": null
                            }
                        ]
                    },

We're aware that this violates the use of name (using "" it at least fits its schema), but this is our interpretation of your comment: "data formats seem not to be very critical, however, conceptualization (matching meta-data to ontological concepts) is important." @adelmemariani, do you agree, or do you have concerns about it?

----- Case3 ----- In cases where there is NO suitable ontology concept in the OEO, we'll copy the term used in the data directly to the name key for further data processing in SEDOS. Note: This is SEDOS-specific and needed for data processing. Normally one would leave the annotation in isAbout empty.

For example: fantasy power plant parameter

"resources": [
        {
            "profile": null,
            "name": null,
            "path": null,
            "format": null,
            "encoding": null,
            "schema": {
                "fields": [
                    {
                        "name": "fantasy power plant parameter",
                        "description": "The column holds values of a fantasy power plant parameter whose concept is not yet available in the OEO",
                        "type": null,
                        "unit": null,
                        "isAbout": [
                            {
                                "name": "fantasy power plant parameter",
                                "path": null
                            }
                        ],
                        "valueReference": [
                            {
                                "value": null,
                                "name": null,
                                "path": null
                            }
                        ]
                    },
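
The three cases can be told apart mechanically from a field's isAbout entries. The following is only a sketch of that reading: `classify_field` is a hypothetical helper, not part of oemetadata or SEDOS, and it assumes that an entry counts as an ontology annotation only if it has a non-empty path.

```python
def classify_field(field: dict) -> str:
    """Classify a schema field by how its isAbout entries are filled.

    Hypothetical helper that mirrors Case 1 to Case 3 described above.
    """
    entries = [
        entry for entry in (field.get("isAbout") or [])
        if entry.get("name") or entry.get("path")
    ]
    # Entries with a path point to an ontology concept.
    with_path = [entry for entry in entries if entry.get("path")]

    if not entries:
        return "unannotated"  # only empty placeholders, or no isAbout at all
    if not with_path:
        return "case 3"       # term copied from the data, no concept available
    if len(with_path) == 1:
        return "case 1"       # a single suitable concept
    return "case 2"           # several concepts used in combination


# Fields reduced to the keys that matter for the classification.
case1 = {"isAbout": [{
    "name": "energy conversion efficiency",
    "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049",
}]}
case2 = {"isAbout": [
    {"name": "heat generation process",
     "path": "http://openenergy-platform.org/ontology/oeo/oeo-physical/OEO_00010248"},
    {"name": "energy conversion efficiency",
     "path": "http://openenergy-platform.org/ontology/oeo/OEO_00140049"},
]}
case3 = {"isAbout": [{"name": "fantasy power plant parameter", "path": None}]}

for example in (case1, case2, case3):
    print(classify_field(example))
```

A check like this could flag Case 3 fields later, once a matching concept has been added to the OEO.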

l-emele commented 2 years ago

In the second case, thermal efficiency can be fully expressed with the OEO: 'energy conversion efficiency' 'process attribute of' some 'heat generation process'. That also shows the relation between the two concepts, which is additional information compared to just listing the involved classes.

chrwm commented 2 years ago

Thanks for the hint! However, the metadata should only annotate concepts and does not have the role of mapping relations. The mapping of relations, in the context of a subset of the OEO, is after all achieved by the OEKG.

l-emele commented 2 years ago

I don't think that simply listing classes that are somehow involved is a good solution. We once talked about an oeo-module for composed classes. So there could then be a composed class XYZ SubClassOf: 'energy conversion efficiency' 'process attribute of' some 'heat generation process', and XYZ would then be the class referenced in the metadata.
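To make the idea concrete, here is a small sketch of how such a composed class could be represented and rendered. `ComposedClass` is a hypothetical structure, "XYZ" is the placeholder name used above, and the explicit `and` with parentheses is an assumption about how the expression would be written in Manchester syntax. None of this reflects an existing oeo-module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComposedClass:
    """Hypothetical record for a composed class built from existing OEO labels."""
    label: str     # placeholder name of the composed class, e.g. "XYZ"
    base: str      # label of the class being specialised
    relation: str  # label of the object property
    filler: str    # label of the class the relation points to

    def axiom(self) -> str:
        # Assumed Manchester-style rendering of the SubClassOf axiom.
        return (
            f"{self.label} SubClassOf: '{self.base}' and "
            f"('{self.relation}' some '{self.filler}')"
        )


thermal_efficiency = ComposedClass(
    label="XYZ",
    base="energy conversion efficiency",
    relation="process attribute of",
    filler="heat generation process",
)
print(thermal_efficiency.axiom())
```

With a class like this in the ontology, the metadata would carry a single reference, and the relation between the two concepts would stay in the ontology.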

@stap-m : Do you remember if we documented the idea of this oeo-module for composed classes for data annotation and the knowledge graph somewhere?

stap-m commented 2 years ago

We documented the idea in the etherpad of the 6th project meeting. However, the notes that were taken are not elaborate in any way... That's it:

Discussion on the combination of terms

  • create a new class "warmwasserbedarf" (hot water demand) and use it
  • But the number of combinations is very high
  • It is not that easy to do in RDF
  • -> create a module in the oeo "compositions"

Clustering the compositions:

  • those related to projections
  • those related to narratives
  • those related to the study report (authorship, ...)

chrwm commented 2 years ago

The partners in the SEDOS project will use the oemetadata and OEO in the user role rather than the developer role. Thus, it should be as user-friendly and easy as possible to work with both. Diving into the axioms seems to be error-prone and this additional workload is difficult to justify from the user's point of view. I argue for a simple solution for the user and welcome a technical solution in the backend to achieve this, as it seems to have been suggested.