gleanerio / gleaner

Gleaner: JSON-LD and structured data on the web harvesting
https://gleaner.io
Apache License 2.0

Set a base in the context for relative `@id`s #149

Closed. nein09 closed this 1 year ago

nein09 commented 1 year ago

For #79. I confirmed that with this fix, the miller now outputs full RDF data for DataStream JSON.

Example:

{
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "289012ba-0035-4993-a93d-cb13b6083c4c",
    "name": "RivTemp--MREAC",
    "description": "\"RivTemp is a partnership between universities, federal and provincial agencies, watershed organizations and Atlantic salmon conservation organizations. RivTemp aims to bring together organizations concerned with water temperature issues in Atlantic salmon rivers; to centralize temperature data collected by different organizations on a variety of rivers in Eastern Canada (RivTemp database) and to develop thermal metrics relevant to the development of salmon protection tools and protocols. The establishment of the RivTemp network and its database was made possible by the financial contribution of the Atlantic Salmon Conservation Foundation (ASCF) and the participation of numerous partners active in the salmon river thermal monitoring program. For more information visit http://rivtemp.ca/\"",
    "url": "https://datastream.org/dataset/289012ba-0035-4993-a93d-cb13b6083c4c",
    "version": "1.0.0",
    "datePublished": "2021-11-17T16:25:21.568Z",
    "dateModified": "2021-11-17T16:25:21.568Z",
    "isAccessibleForFree": true,
    "keywords": "RivTemp, water temperature, river, Atlantic salmon",
    "license": "https://opendatacommons.org/licenses/by/1-0/",
    "citation": "\"Please cite the source of the data when using the data in the database, i.e.: RivTemp (RivTemp.ca) and the partners, Dataset Name, who collected the data you are using. RivTemp. 2021-11-17. \"\"RivTemp--MREAC\"\" (dataset). 1.0.0. DataStream. https://doi.org/10.25976/td9n-mt03.\"",
    "identifier":
    {
        "@type":
        [
            "PropertyValue",
            "datacite:ResourceIdentifier"
        ],
        "datacite:usesIdentifierSchema":
        {
            "@id": "datacite:doi"
        },
        "propertyID": "DOI",
        "url": "https://doi.org/10.25976/td9n-mt03",
        "value": "10.25976/td9n-mt03"
    },
    "temporalCoverage": "2013-07-11T00:00:00+00:00/2017-10-08T00:00:00+00:00",
    "spatialCoverage":
    {
        "@type": "Place",
        "geo":
        {
            "@type": "GeoShape",
            "box": "-66.156313 46.547659 -65.39755 47.23424"
        }
    },
    "measurementTechnique": "\"Temperature Data are DAILY Mean (for DAILY Min and Max visit http://rivtemp.ca/). The instruments used for water temperature measurement vary from station to station. However, Hobo Pendant (Onset) models are widely used at these stations. Some stations are monitored only during the summer months while others are maintained year-round.\"",
    "variableMeasured":
    [],
    "creator":
    {
        "@type": "Organization",
        "name": "RivTemp--MREAC (RivTemp is responsible for the coordination and for processing data collected by the network partners)."
    },
    "publisher":
    {
        "@type": "Organization",
        "name": "DataStream",
        "url": "https://datastream.org",
        "logo": "https://datastream.org/favicon.svg"
    }
}

With the JSON fixups applied, the document becomes:

{
    "@context":
    {
        "@base": "http://datastream.org",
        "@vocab": "https://schema.org/"
    },
    "@type": "Dataset",
    "@id": "289012ba-0035-4993-a93d-cb13b6083c4c",
    "name": "RivTemp--MREAC",
    "description": "\"RivTemp is a partnership between universities, federal and provincial agencies, watershed organizations and Atlantic salmon conservation organizations. RivTemp aims to bring together organizations concerned with water temperature issues in Atlantic salmon rivers; to centralize temperature data collected by different organizations on a variety of rivers in Eastern Canada (RivTemp database) and to develop thermal metrics relevant to the development of salmon protection tools and protocols. The establishment of the RivTemp network and its database was made possible by the financial contribution of the Atlantic Salmon Conservation Foundation (ASCF) and the participation of numerous partners active in the salmon river thermal monitoring program. For more information visit http://rivtemp.ca/\"",
    "url": "https://datastream.org/dataset/289012ba-0035-4993-a93d-cb13b6083c4c",
    "version": "1.0.0",
    "datePublished": "2021-11-17T16:25:21.568Z",
    "dateModified": "2021-11-17T16:25:21.568Z",
    "isAccessibleForFree": true,
    "keywords": "RivTemp, water temperature, river, Atlantic salmon",
    "license": "https://opendatacommons.org/licenses/by/1-0/",
    "citation": "\"Please cite the source of the data when using the data in the database, i.e.: RivTemp (RivTemp.ca) and the partners, Dataset Name, who collected the data you are using. RivTemp. 2021-11-17. \"\"RivTemp--MREAC\"\" (dataset). 1.0.0. DataStream. https://doi.org/10.25976/td9n-mt03.\"",
    "identifier":
    {
        "@type":
        [
            "PropertyValue",
            "datacite:ResourceIdentifier"
        ],
        "datacite:usesIdentifierSchema":
        {
            "@id": "datacite:doi"
        },
        "propertyID": "DOI",
        "url": "https://doi.org/10.25976/td9n-mt03",
        "value": "10.25976/td9n-mt03"
    },
    "temporalCoverage": "2013-07-11T00:00:00+00:00/2017-10-08T00:00:00+00:00",
    "spatialCoverage":
    {
        "@type": "Place",
        "geo":
        {
            "@type": "GeoShape",
            "box": "-66.156313 46.547659 -65.39755 47.23424"
        }
    },
    "measurementTechnique": "\"Temperature Data are DAILY Mean (for DAILY Min and Max visit http://rivtemp.ca/). The instruments used for water temperature measurement vary from station to station. However, Hobo Pendant (Onset) models are widely used at these stations. Some stations are monitored only during the summer months while others are maintained year-round.\"",
    "variableMeasured":
    [],
    "creator":
    {
        "@type": "Organization",
        "name": "RivTemp--MREAC (RivTemp is responsible for the coordination and for processing data collected by the network partners)."
    },
    "publisher":
    {
        "@type": "Organization",
        "name": "DataStream",
        "url": "https://datastream.org",
        "logo": "https://datastream.org/favicon.svg"
    }
}

The miller output is now (note the full-IRI ids!):

<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Dataset> .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/citation> "\"Please cite the source of the data when using the data in the database, i.e.: RivTemp (RivTemp.ca) and the partners, Dataset Name, who collected the data you are using. RivTemp. 2021-11-17. \"\"RivTemp--MREAC\"\" (dataset). 1.0.0. DataStream. https://doi.org/10.25976/td9n-mt03.\"" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/creator> _:bcfdga5qstu0oon8cb2jg .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/dateModified> "2021-11-17T16:25:21.568Z" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/datePublished> "2021-11-17T16:25:21.568Z" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/description> "\"RivTemp is a partnership between universities, federal and provincial agencies, watershed organizations and Atlantic salmon conservation organizations. RivTemp aims to bring together organizations concerned with water temperature issues in Atlantic salmon rivers; to centralize temperature data collected by different organizations on a variety of rivers in Eastern Canada (RivTemp database) and to develop thermal metrics relevant to the development of salmon protection tools and protocols. The establishment of the RivTemp network and its database was made possible by the financial contribution of the Atlantic Salmon Conservation Foundation (ASCF) and the participation of numerous partners active in the salmon river thermal monitoring program. For more information visit http://rivtemp.ca/\"" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/identifier> _:bcfdga5qstu0oon8cb2k0 .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/isAccessibleForFree> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/keywords> "RivTemp, water temperature, river, Atlantic salmon" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/license> "https://opendatacommons.org/licenses/by/1-0/" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/measurementTechnique> "\"Temperature Data are DAILY Mean (for DAILY Min and Max visit http://rivtemp.ca/). The instruments used for water temperature measurement vary from station to station. However, Hobo Pendant (Onset) models are widely used at these stations. Some stations are monitored only during the summer months while others are maintained year-round.\"" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/name> "RivTemp--MREAC" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/publisher> _:bcfdga5qstu0oon8cb2kg .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/spatialCoverage> _:bcfdga5qstu0oon8cb2l0 .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/temporalCoverage> "2013-07-11T00:00:00+00:00/2017-10-08T00:00:00+00:00" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/url> "https://datastream.org/dataset/289012ba-0035-4993-a93d-cb13b6083c4c" .
<http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c> <https://schema.org/version> "1.0.0" .
_:bcfdga5qstu0oon8cb2jg <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
_:bcfdga5qstu0oon8cb2jg <https://schema.org/name> "RivTemp--MREAC (RivTemp is responsible for the coordination and for processing data collected by the network partners)." .
_:bcfdga5qstu0oon8cb2k0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/PropertyValue> .
_:bcfdga5qstu0oon8cb2k0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <datacite:ResourceIdentifier> .
_:bcfdga5qstu0oon8cb2k0 <datacite:usesIdentifierSchema> <datacite:doi> .
_:bcfdga5qstu0oon8cb2k0 <https://schema.org/propertyID> "DOI" .
_:bcfdga5qstu0oon8cb2k0 <https://schema.org/url> "https://doi.org/10.25976/td9n-mt03" .
_:bcfdga5qstu0oon8cb2k0 <https://schema.org/value> "10.25976/td9n-mt03" .
_:bcfdga5qstu0oon8cb2kg <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
_:bcfdga5qstu0oon8cb2kg <https://schema.org/logo> "https://datastream.org/favicon.svg" .
_:bcfdga5qstu0oon8cb2kg <https://schema.org/name> "DataStream" .
_:bcfdga5qstu0oon8cb2kg <https://schema.org/url> "https://datastream.org" .
_:bcfdga5qstu0oon8cb2l0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Place> .
_:bcfdga5qstu0oon8cb2l0 <https://schema.org/geo> _:bcfdga5qstu0oon8cb2lg .
_:bcfdga5qstu0oon8cb2lg <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/GeoShape> .
_:bcfdga5qstu0oon8cb2lg <https://schema.org/box> "-66.156313 46.547659 -65.39755 47.23424" .
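
The fixup above can be sketched as follows. This is a minimal illustration and not Gleaner's actual code (Gleaner is written in Go): the function name `add_base_to_context` is hypothetical, and `urllib.parse.urljoin` only stands in for the relative-IRI resolution that a JSON-LD processor performs against `@base`.

```python
import json
from urllib.parse import urljoin


def add_base_to_context(doc: dict, base: str) -> dict:
    """Return a shallow copy of doc whose @context carries an @base.

    Hypothetical helper: a string context is moved to @vocab, as in the
    example above. Array contexts are left untouched in this sketch.
    """
    fixed = dict(doc)
    ctx = doc.get("@context")
    if isinstance(ctx, str):
        # "https://schema.org/" becomes {"@base": ..., "@vocab": "https://schema.org/"}
        fixed["@context"] = {"@base": base, "@vocab": ctx}
    elif isinstance(ctx, dict):
        # An @base already present in the context takes precedence.
        fixed["@context"] = {"@base": base, **ctx}
    return fixed


doc = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "289012ba-0035-4993-a93d-cb13b6083c4c",
}
fixed = add_base_to_context(doc, "http://datastream.org")

# Approximate how the relative @id resolves against @base.
resolved_id = urljoin(fixed["@context"]["@base"], fixed["@id"])

print(json.dumps(fixed["@context"]))
print(resolved_id)
```

The resolved value matches the subject IRI in the N-Quads above, `http://datastream.org/289012ba-0035-4993-a93d-cb13b6083c4c`.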

nein09 commented 1 year ago

It might be best for https://github.com/gleanerio/gleaner/pull/135/files# to get merged first, and then for me to rebase onto it or integrate this work into it.

nein09 commented 1 year ago

@valentinedwv That's a good idea, but I'm honestly not sure what a good default for that might be.

valentinedwv commented 1 year ago

Skip that thought... it's probably a per-repo option. If the @id is a URL, then it seems to modify the @id.

nein09 commented 1 year ago

Per meeting on 2 March 2023: the correct behavior is to drop the @id altogether.
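
A minimal sketch of what dropping the @id could look like. The comment above says only that the @id should be dropped; restricting this to ids without a scheme (relative ids) is an assumption made here, and the function name `drop_relative_id` is hypothetical. Once the @id is gone, a JSON-LD processor assigns the node a blank node identifier.

```python
from urllib.parse import urlparse


def drop_relative_id(doc: dict) -> dict:
    """Return a copy of doc without its top-level @id when that id is relative.

    Assumption for this sketch: an id counts as relative when it has no
    URI scheme. Ids with a scheme are kept unchanged.
    """
    node_id = doc.get("@id")
    if isinstance(node_id, str) and not urlparse(node_id).scheme:
        return {key: value for key, value in doc.items() if key != "@id"}
    return dict(doc)


relative = {"@type": "Dataset", "@id": "289012ba-0035-4993-a93d-cb13b6083c4c"}
absolute = {
    "@type": "Dataset",
    "@id": "https://datastream.org/dataset/289012ba-0035-4993-a93d-cb13b6083c4c",
}

cleaned_relative = drop_relative_id(relative)
cleaned_absolute = drop_relative_id(absolute)
```

Only the top-level node is handled here; nested nodes with relative ids would need a recursive walk.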

nein09 commented 1 year ago

I'm reworking this on another branch.