geopython / pygeometa

pygeometa is a Python package to generate metadata for geospatial datasets
https://geopython.github.io/pygeometa

Map the YAML config to the W3C metadata ontologies #180

Status: open. Opened by ldesousa 2 years ago.

ldesousa commented 2 years ago

Update: this issue is considerably broader than initially assessed. The output RDF has semantic flaws at several levels that cannot be solved with the current parser. I will start by creating a thorough map from each keyword in the YAML configuration file to classes and predicates in the W3C ontologies (DCAT, vCard, PROV, etc.). From that map, a new parser can be developed that generates RDF more directly.
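To illustrate what "a more direct RDF generation" driven by a keyword map could look like, here is a minimal sketch in Python. It assumes the YAML has already been parsed into a dict, uses a simplified, hypothetical config shape (`identification.title`, `identification.keywords` as a flat list), and picks `dct:title`, `dct:description` and `dcat:keyword` as illustrative targets. None of this is the mapping being built in this issue; it only shows the idea of walking a map instead of post-processing JSON-LD.

```python
# Minimal sketch of keyword-to-predicate mapping with direct RDF output.
# The config shape and the predicate choices below are assumptions for
# illustration; they are not the mapping being built for pygeometa.
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical map: path of YAML keywords -> predicate IRI
KEYWORD_MAP = {
    ("identification", "title"): DCT + "title",
    ("identification", "abstract"): DCT + "description",
    ("identification", "keywords"): DCAT + "keyword",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def lookup(config: dict, path: tuple):
    """Follow a keyword path through nested dicts; return None if absent."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_ntriples(subject: str, config: dict) -> list:
    """Emit one N-Triples line per mapped keyword value found in config."""
    lines = []
    for path, predicate in KEYWORD_MAP.items():
        value = lookup(config, path)
        if value is None:
            continue  # keyword not present in this config
        for item in (value if isinstance(value, list) else [value]):
            literal = escape_literal(str(item))
            lines.append(f'<{subject}> <{predicate}> "{literal}" .')
    return lines


config = {"identification": {"title": "Sample dataset",
                             "keywords": ["climate", "Canada"]}}
for line in to_ntriples("https://example.org/dataset/1", config):
    print(line)
```

Keeping the map as data rather than code means the same table can later drive both the literal and the object-property cases.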

I converted the JSON-LD output to Turtle for legibility (see below). The organisation instance uses the vcard:fn predicate, which is a data property of the Individual class. In effect, the output states that the full name of Environment Canada (EC) is "Tom Kralidis". The way the output uses vCard semantics must be revised.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

<https://www.ec.gc.ca/> a vcard:Organization ;
    vcard:country-name "Canada" ;
    vcard:fn "Tom Kralidis" ;
    vcard:hasEmail "foo@bar.tld" ;
    vcard:hasOrganizationName "Environment Canada" ;
    vcard:hasTelephone "+01-123-456-7890" ;
    vcard:locality "Toronto" ;
    vcard:postal-code "M3H 5T4" ;
    vcard:street-address "4905 Dufferin Street" ;
    dcat:accessURL "https://www.ec.gc.ca/" .
```

ldesousa commented 2 years ago

Below is a snippet of the JSON-LD output for the sample.yml file. The contact individual is duplicated (two identical objects sharing the same @id) and is not interpreted correctly by the RDF parser.

    "contact": [
        {
            "organization": "Environment Canada",
            "url": "https://www.ec.gc.ca/",
            "individualname": "Tom Kralidis",
            "positionname": "Senior Systems Scientist",
            "phone": "+01-123-456-7890",
            "fax": "+01-123-456-7890",
            "address": "4905 Dufferin Street",
            "city": "Toronto",
            "administrativearea": "Ontario",
            "postalcode": "M3H 5T4",
            "country": "Canada",
            "email": "foo@bar.tld",
            "hoursofservice": "0700h - 1500h EST",
            "contactinstructions": "email",
            "@id": "https://www.ec.gc.ca/",
            "@type": "vcard:Organization"
        },
        {
            "organization": "Environment Canada",
            "url": "https://www.ec.gc.ca/",
            "individualname": "Tom Kralidis",
            "positionname": "Senior Systems Scientist",
            "phone": "+01-123-456-7890",
            "fax": "+01-123-456-7890",
            "address": "4905 Dufferin Street",
            "city": "Toronto",
            "administrativearea": "Ontario",
            "postalcode": "M3H 5T4",
            "country": "Canada",
            "email": "foo@bar.tld",
            "hoursofservice": "0700h - 1500h EST",
            "contactinstructions": "email",
            "@id": "https://www.ec.gc.ca/",
            "@type": "vcard:Organization"
        }
    ],
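As an interim measure, the duplicate entries could be collapsed before serialisation. The following Python sketch (standard library only) merges node objects that share an `@id`, keeps first-seen order, and raises on conflicting values instead of overwriting them silently. It is a hypothetical helper, not part of pygeometa, and it does not address the vocabulary problems described above.

```python
import json


def dedupe_by_id(nodes: list) -> list:
    """Collapse JSON-LD node objects that share an @id, keeping order.

    Nodes without an @id are kept as-is. Conflicting values for the same
    key raise ValueError rather than being silently overwritten.
    """
    merged = {}
    result = []
    for node in nodes:
        node_id = node.get("@id")
        if node_id is None:
            result.append(node)
            continue
        if node_id not in merged:
            merged[node_id] = dict(node)  # copy so the input is untouched
            result.append(merged[node_id])
            continue
        target = merged[node_id]
        for key, value in node.items():
            if key in target and target[key] != value:
                raise ValueError(f"conflicting {key!r} for {node_id}")
            target[key] = value
    return result


doc = json.loads(
    '{"contact": ['
    '{"@id": "https://www.ec.gc.ca/", "organization": "Environment Canada"},'
    '{"@id": "https://www.ec.gc.ca/", "organization": "Environment Canada"}'
    ']}'
)
doc["contact"] = dedupe_by_id(doc["contact"])
print(len(doc["contact"]))  # 1
```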
ldesousa commented 2 years ago

I started the map in my fork. It still has some way to go, but it may already be usable. When the DataProperty column has content, a literal is sufficient. Otherwise, if the content is in the ObjectProperty column, a new instance is necessary, of the type given in the Range column.
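The literal-versus-new-instance rule can be sketched in Python as follows. `MapRow` and `triples_for` are hypothetical names, the example rows are illustrative rather than taken from the map in the fork, and attaching the original value to the new instance with `rdf:value` is an assumption, since the map as described only names the Range class.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# rdf:value is used to hang the original value on the new instance; that
# choice is an assumption, as the map only names the Range class.
RDF_TYPE = "rdf:type"
RDF_VALUE = "rdf:value"


@dataclass(frozen=True)
class MapRow:
    """One row of the keyword map: DataProperty, ObjectProperty, Range."""
    keyword: str
    data_property: Optional[str] = None
    object_property: Optional[str] = None
    range_type: Optional[str] = None


_blank_ids = count(1)


def triples_for(subject: str, row: MapRow, value: str) -> list:
    """Apply the literal-vs-new-instance rule to one keyword value."""
    if row.data_property:
        # DataProperty column filled: a literal is sufficient.
        return [(subject, row.data_property, value)]
    if row.object_property and row.range_type:
        # ObjectProperty column filled: mint a new instance of the Range type.
        node = f"_:b{next(_blank_ids)}"
        return [
            (subject, row.object_property, node),
            (node, RDF_TYPE, row.range_type),
            (node, RDF_VALUE, value),
        ]
    raise ValueError(f"keyword {row.keyword!r} has no usable mapping")


name_row = MapRow("individualname", data_property="vcard:fn")
addr_row = MapRow("address", object_property="vcard:hasAddress",
                  range_type="vcard:Address")
print(triples_for("_:contact1", name_row, "Tom Kralidis"))
print(triples_for("_:contact1", addr_row, "4905 Dufferin Street"))
```

Using blank nodes for the new instances keeps the sketch simple; a real parser might prefer minted IRIs so that instances can be referenced from elsewhere.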