Open stap-m opened 2 years ago
Just a minor comment concerning the RDF pattern in the diagram. I think it is unnecessarily complicated. I would suggest that the pattern should be along the following lines.
oekg:scenario123 a oeo:Scenario.
oekg:scenario123 xyz:has_IRI <address of website on OEP>.
oekg:scenario123 xyz:has_record oekg:table456.
oekg:table456 a xyz:Table.
oekg:table456 xyz:has_IRI <address of website on OEP>.
oekg:table456 is about oeo:entity.
I am not sure about the oeo:Scenario, xyz:has_record and xyz:Table entities. Firstly, are the tables associated with a scenario or a scenario projection? Secondly, depending on the answer to the first question, we need a relation that links it to an information entity, namely a table. It is probably a good idea to look at the OBI to see whether we can reuse a relation and a class from them. But regardless of whether we use oeo:Scenario, xyz:has_record and xyz:Table or some other IRIs, the pattern should be correct.
EDIT: Included the lines connecting scenario and table to the OEP. I am not sure which ontology term to use for xyz:has_IRI.
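To make the proposed pattern concrete, here is a minimal Python sketch that produces the six triples for one scenario and one table. The function name `pattern_triples` and the example.org URLs are placeholders invented for illustration; the `xyz:` terms are the same undecided placeholders as in the pattern above.

```python
def pattern_triples(scenario, table, scenario_url, table_url, about):
    """Return the proposed pattern as (subject, predicate, object) tuples.

    The xyz: predicates and classes are placeholders, exactly as in the
    pattern in this comment; they are not existing ontology terms.
    """
    return [
        (scenario, "a", "oeo:Scenario"),
        (scenario, "xyz:has_IRI", f"<{scenario_url}>"),
        (scenario, "xyz:has_record", table),
        (table, "a", "xyz:Table"),
        (table, "xyz:has_IRI", f"<{table_url}>"),
        (table, "is about", about),
    ]


# Placeholder URLs; real ones would be the OEP pages of scenario and table.
triples = pattern_triples(
    "oekg:scenario123",
    "oekg:table456",
    "https://example.org/scenario123",
    "https://example.org/table456",
    "oeo:entity",
)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

If a table is used in more than one scenario, calling the function once per scenario simply yields one `xyz:has_record` triple per scenario for the same table.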
Sometimes, datasets contain scenarios:
Also, a scenario usually has many datasets (as input: assumptions, model parameters, ...; as output: projections). This makes it difficult to build a pipeline. Besides, dataset values are not easily mappable to OEO concepts because users chose vague and abbreviated terms.
Firstly, are the tables associated with a scenario or a scenario projection?
Yes. Currently the connection between tables and scenarios works mainly via the tags in the scenario schema, but in the future this link has to (also) be made via the factsheets/bundles.
Sometimes, datasets contain scenarios:
That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?
That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?
That is also my question: whether or not we have an explicit connection (usable via APIs) between the scenario and its datasets. But 'tags' work for filtering in this case.
That means that there are tables that are used in more than one scenario. But that should be no problem, as long as the assignment also exists outside the tables, right?
No, it should be no problem. At least not for the "dumb and dirty" approach that we are currently following. Our approach consists of going through the content of all tables that are associated with a scenario projection. If an entry is either an OEO term or has been annotated by a third party with an OEO term, we use it as the object in an is-about triple. If it is something else, we try to automatically match it to an OEO term. (In the first approach by simple string matching; at some later stage we can improve that by using more sophisticated approaches.) Since the names of scenarios won't be in the OEO, tables that contain names of other scenarios won't be matched and, thus, will be ignored. That's ok. Actually, I expect that most of the terms won't be automatically matchable to something in the OEO, even if we use very sophisticated methods.
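The string-matching step described above could look roughly like the following Python sketch. It is not the actual pipeline script: the names `OEO_LABELS`, `normalize` and `match_entries` are made up for illustration, the label dictionary contains only the one term mentioned in this thread, and the handling of entries that are already OEO terms or annotated by a third party is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical excerpt of OEO labels mapped to term IRIs. Only the petajoule
# entry comes from this thread; a real run would load all labels from the OEO.
OEO_LABELS = {"petajoule": "oeo:OEO_00050006"}


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def match_entries(
    table_iri: str, entries: Iterable[str], labels: Dict[str, str]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Match table entries to OEO labels by simple string comparison.

    Returns the is-about triples for matched entries and the list of
    entries that could not be assigned to any label.
    """
    index = {normalize(label): iri for label, iri in labels.items()}
    triples, unassigned = [], []
    for entry in entries:
        iri = index.get(normalize(entry))
        if iri is None:
            unassigned.append(entry)  # e.g. names of other scenarios
        else:
            triples.append((table_iri, "is about", iri))
    return triples, unassigned


triples, unassigned = match_entries(
    "oekg:table456", ["Petajoule", "PJ", "KS_2050"], OEO_LABELS
)
```

With labels only, "Petajoule" is matched, while "PJ" and the scenario name "KS_2050" end up in the unassigned list and are ignored when the triples are generated.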
As a first step, the following 'dumb and dirty' versions are the results of a pipeline based on simple 'string matching' between values in the tables and OEO concepts:
The following is the list of 'not assignable terms' for datasets that belong to KS_2050: https://github.com/OpenEnergyPlatform/oekg/blob/Trial_autogenerated_oekg_via_pieline/not_assignables.txt
Thanks @adelmemariani . Let's continue the discussion here.
Does your script consider synonyms and alternative terms that are given in the OEO? I'm wondering why 'PJ' wasn't found. It is annotated as an exact synonym of 'petajoule' (OEO_00050006).
Does your script consider synonyms and alternative terms that are given in the OEO? I'm wondering why 'PJ' wasn't found. It is annotated as an exact synonym of 'petajoule' (OEO_00050006).
😮 My script was not aware of 'synonyms' so far. Thanks @stap-m. I will work on it...
By considering the 'has exact synonym' relations, 'PJ' can now be mapped to 'petajoule' and is no longer in the list of unassignable terms:
https://github.com/OpenEnergyPlatform/oekg/blob/Trial_autogenerated_oekg_via_pieline/Dummy_OEKG_With_Senario_Datasets_With_Labels_And_IRIs.ttl#L376
The overall result would be much better if we had synonyms for the other unassignable terms.
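As a rough illustration of the synonym-aware lookup, the following Python sketch indexes both labels and exact synonyms. The data structure `TERMS` and the function names are assumptions made for this example, not the actual script; only the petajoule entry is taken from this thread.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory excerpt of the OEO: term ID -> label and exact
# synonyms. Only this entry is taken from the thread.
TERMS = {
    "OEO_00050006": {"label": "petajoule", "exact_synonyms": ["PJ"]},
}


def build_indexes(terms) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build one lookup for labels (lower-cased) and one for synonyms."""
    label_index, synonym_index = {}, {}
    for term_id, info in terms.items():
        label_index.setdefault(info["label"].strip().lower(), term_id)
        for synonym in info.get("exact_synonyms", []):
            # Synonyms are kept verbatim: unit abbreviations are
            # case-sensitive (PJ is petajoule, pJ is picojoule).
            synonym_index.setdefault(synonym.strip(), term_id)
    return label_index, synonym_index


def lookup(entry: str, label_index, synonym_index) -> Optional[str]:
    """Return the term ID for a table entry, or None if unassignable."""
    entry = entry.strip()
    return synonym_index.get(entry) or label_index.get(entry.lower())


label_index, synonym_index = build_indexes(TERMS)
```

Matching labels case-insensitively but synonyms verbatim is a design choice of this sketch; whether the real pipeline should treat abbreviations this strictly is open.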
😮 My script was not aware of 'synonyms' so far. Thanks @stap-m. I will work on it...
Actually, we agreed on using 'alternative term' instead of synonyms, but apparently there are still some artifacts...
In the internal OvGU meeting with @adelmemariani, @fabianneuhaus and myself, we developed a workflow for automated KG generation. The task is now to establish the basic pipeline for this KG so that a first version can be created. Semantic enrichment etc. should not be considered at this stage and will be addressed later.
KG and workflow