support parsing EML into triples

amoeba commented 3 years ago

We're still working on mappings in #21 but I have got a work-in-progress version of this going. It's got a number of issues but I think it's a good start. The following command triplifies an entire DataONE Data Package using the latest state of the code in https://github.com/DataONEorg/slinky/tree/feature_update_graph_pattern.

$ slinky get doi:10.5063/F1QR4VCB

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<https://dataone.org/datasets/doi%3A10.5063%2FF1QR4VCB>
    <http://spdx.org/rdf/terms#Checksum> "5ed8a1a35e943bc3cd99180189f07293e3186ca4" ;
    <http://spdx.org/rdf/terms#ChecksumAlgorithm> "SHA1" ;
    a <https://schema.org/Dataset> ;
    <https://schema.org/award> """State of Alaska's Salmon and People (Gordon and Betty Moore Foundation Award 5124)
         Data Task Forces for Better Synthesis Studies (Gordon and Betty Moore Foundation Award 5451)""" ;
    <https://schema.org/byteSize> 12273388 ;
    <https://schema.org/creator> <https://dataone.org/organizations/urn%3Auuid%3Ac436ba14-64fb-4cbc-83a3-1d4349614860> ;
    <https://schema.org/dateModified> "2019-03-19T21:06:38.869000Z" ;
    <https://schema.org/datePublished> "2018", "2019-03-19T21:06:37.466000Z" ;
    <https://schema.org/description> """The Well Log Tracking System (WELTS) contains lithologic information submitted to the Division of Mining,
Land and Water, Alaska Hydrologic Survey by water well contractors as required per Alaska State Statute 41.08.020(b4) authority delegated to the Alaska Hydrologic Survey per Department Order 115, “require of water well contractors, the filing with it of basic water and aquifer data normally obtained, including but not limited to well location, estimated elevation, well driller's logs, pumping tests and flow measurements, and water quality determinations.” Additionally, per Alaska Administrative Code, Title 11 Natural Resources, Part 6 Lands, Chapter 93 Water Management, Article 2 Appropriation and Use of Water 11 AAC 93.140
(a):For a drilled, driven, jetted, or augered well constructed, the water well contractor or a person who constructs
the well shall file a report within 45 days after completion with both the property owner and the department.
The report must contain the following information as applicable:
(1) the method of construction;
(2) the type of fluids used for drilling;
(3) the location of the well;
(4) an accurate log of the soil and rock formations encountered and the depths at which the formations occur;
(5) the depth of the casing;
(6) the height of the casing above ground;
(7) the depth and type of grouting;
(8) the depth of any screens;
(9) the casing diameter;
(10) the casing material;
(11) the depth of perforation or opening in the casing;
(12) the well development method;
(13) the total depth of the well;
(14) the depth of the static water level;
(15) the anticipated use of the well;
(16) the maximum well yield;
(17) the results of any well yield, aquifer, or drawdown test that was conducted;
(18) if the water well contractor or person who constructs the well installs a pump at the time of construction,
the depth of the pump intake and the rated pump capacity at that depth.

(b) When the drill rig is removed from the well site, the well must be sealed with a sanitary seal and a readily accessible means provided to allow for monitoring of the static water level in the well.

(c) A hand-dug well that is permanently decommissioned shall be filled by the land owner to a point 12 inches above the existing ground level with well-compacted impermeable material.

(d) A well, other than a hand-dug well, that is permanently decommissioned by the owner of the well must comply with the requirements of 18 AAC 80.015(e) .

(e) If the department believes that an encounter of oil, gas, or other hazardous substance is likely to result from well drilling, the department will notify the Alaska Oil and Gas Conservation Commission, and the provisions of AS 31.05.030 (g) may apply.

(f) The department will notify the Department of Environmental Conservation of any permanently abandoned well that may contaminate water of the state under the provisions of 18 AAC 80.

(g) Information required by (a) of this section is required for any water well that has been deepened, modified, or abandoned, and for any water supply well or water well that is used for monitoring, observation, or aquifer testing, including a dry or low-yield water well that is not used.

This shape  file characterizes the geographic representation of well logs within the State of Alaska contained in the Well Log Tracking System.
The shape file was developed using well location information submitted with well logs.

Well locations represented by a gold star symbol, represent the approximate (centroid) location, and may represent a cluster of wells.

Well locations represented by a blue circle symbol, represent wells submitted with latitude and longitude coordinates.

Each feature has an associated attribute record, including a  Well Log Tracking System identification number which serves as an index to case-file information. Those requiring more information regarding WELTS should contact the Alaska Department of Natural Resources Alaska Hydrologic Survey directly.""" ;
    <https://schema.org/identifier> [
        a <https://schema.org/PropertyValue> ;
        <https://schema.org/propertyID> "https://registry.identifiers.org/registry/doi" ;
        <https://schema.org/url> "https://doi.org/doi%3A10.5063%2FF1QR4VCB" ;
        <https://schema.org/value> "10.5063/F1QR4VCB"
    ] ;
    <https://schema.org/isAccessibleForFree> "true" ;
    <https://schema.org/keyword> "land use" ;
    <https://schema.org/name> "Locations of Wells in Alaska using the Well Log Tracking System, 2016" ;
    <https://schema.org/sameAs> "https://doi.org/10.5063%2FF1QR4VCB" ;
    <https://schema.org/schemaVersion> "eml://ecoinformatics.org/eml-2.1.1" ;
    <https://schema.org/spatialCoverage> [
        a <https://schema.org/Place> ;
        <https://schema.org/additionalProperty> [
            a <https://schema.org/PropertyValue> ;
            <https://schema.org/name> "Spatial Reference System" ;
            <https://schema.org/propertyID> <http://www.wikidata.org/entity/Q4018860> ;
            <https://schema.org/value> "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
        ], [
            a <https://schema.org/PropertyValue> ;
            <https://schema.org/name> "Well-Known Text (WKT) representation of geometry" ;
            <https://schema.org/propertyID> <http://www.wikidata.org/entity/Q4018860> ;
            <https://schema.org/value> "POLYGON ((170.375 73.875, -125.5 73.875, -125.5 47.375, 170.375 47.375, 170.375 73.875))"
        ] ;
        <https://schema.org/description> "Alaska" ;
        <https://schema.org/geo> [
            a <https://schema.org/GeoShape> ;
            <https://schema.org/box> "73.875,-125.5 47.375,170.375"
        ]
    ] ;
    <https://schema.org/temporalCoverage> "2017-02-15" ;
    <https://schema.org/url> "https://dataone.org/datasets/doi%3A10.5063%2FF1QR4VCB" ;
    <https://schema.org/variableMeasured> "BEDROCK_DEPTH", "BLOCK", "CASING_DEPTH", "CASING_DIA", "CASING_NOTE", "CASING_STICKUP", "CASING_THICK", "CASING_TYPE", "CITY", "COMPANY_NAME", "DATE_COMPLETE", "DATE_START", "DEV_DURATION", "DEV_METHOD", "DISINFECT", "DISINF_METHOD", "DOE", "DRILLERS_COMMENTS", "DRILL_METHOD", "FULL_NAME", "GRAVEL", "GRAVEL_START", "GRAVEL_STOP", "GROUT_FROM", "GROUT_TO", "GROUT_VOL", "HOLE_DEPTH", "INTAKE_TYPE", "LAT", "LINER_DIA", "LINER_TYPE", "LL_SOURCE", "LOCATION_NOTE", "LOG_ID", "LON", "LOT", "Log_ID", "MAPNUM", "MERID", "MODDATE", "MODUSER", "OWNER", "PDESC", "PERF1_START", "PERF1_STOP", "PERF2_START", "PERF2_STOP", "PUMP_FT", "PUMP_GPM", "PUMP_HP", "PUMP_HR", "PUMP_INTAKE_FT", "QTR_SEC", "RECOV_RATE", "REGION", "SCREEN_NOTE", "SCREEN_SIZE", "SCREEN_START", "SCREEN_STOP", "SCREEN_TYPE", "SECTION", "STATIC_WATER_FROM", "STATIC_WATER_LEVEL", "STATUS", "SUBDIVISION", "SW_DATE", "TEST_METHOD", "TRACT", "TWNSHP", "WELL_USE" ;
    <https://schema.org/wasRevisionOf> "https://dataone.org/datasets/doi%3A10.5063%2FF1CJ8BQZ" .

<https://dataone.org/datasets/urn%3Auuid%3A2e4adb4b-2e57-476a-aba7-db1c72723375>
    <http://spdx.org/rdf/terms#Checksum> "afc17730924910ee1653e0889f54130658dbe9c3" ;
    <http://spdx.org/rdf/terms#ChecksumAlgorithm> "SHA-1" ;
    a <https://schema.org/DataDownload> ;
    <https://schema.org/byteSize> 11166729 ;
    <https://schema.org/contentUrl> "https://search.dataone.org/cn/v2/resolve/urn%3Auuid%3A2e4adb4b-2e57-476a-aba7-db1c72723375" ;
    <https://schema.org/dateModified> "2018-07-25T00:32:56.245000Z" ;
    <https://schema.org/datePublished> "2017-03-03T22:32:04.602000Z" ;
    <https://schema.org/encodingFormat> "text/csv" ;
    <https://schema.org/identifier> "https://dataone.org/datasets/urn%3Auuid%3A2e4adb4b-2e57-476a-aba7-db1c72723375" ;
    <https://schema.org/name> "WELTS_flatfile.csv" .

<https://dataone.org/datasets/urn%3Auuid%3A83bf45de-1db2-43cb-9ac4-2ed2e564312f>
    <http://spdx.org/rdf/terms#Checksum> "91de80c44927c196e862c803aa86e221433a45a8" ;
    <http://spdx.org/rdf/terms#ChecksumAlgorithm> "SHA-1" ;
    a <https://schema.org/DataDownload> ;
    <https://schema.org/byteSize> 963049 ;
    <https://schema.org/contentUrl> "https://search.dataone.org/cn/v2/resolve/urn%3Auuid%3A83bf45de-1db2-43cb-9ac4-2ed2e564312f" ;
    <https://schema.org/dateModified> "2018-07-25T00:33:00.360000Z" ;
    <https://schema.org/datePublished> "2017-03-03T22:32:08.792000Z" ;
    <https://schema.org/encodingFormat> "application/zip" ;
    <https://schema.org/identifier> "https://dataone.org/datasets/urn%3Auuid%3A83bf45de-1db2-43cb-9ac4-2ed2e564312f" ;
    <https://schema.org/name> "WELTS.zip" .

<https://dataone.org/organizations/urn%3Auuid%3Ac436ba14-64fb-4cbc-83a3-1d4349614860>
    a <https://schema.org/Organization> ;
    <https://schema.org/name> "Alaska Department of Natural Resources, Support Services Division, Information Resource Management" .
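Reading the spatialCoverage node in the output above, the schema:box value and the WKT polygon appear to be built from the same four bounding coordinates. The sketch below reproduces both strings from those numbers. The function name and its parameters are hypothetical and this is not slinky's actual code; the "north,east south,west" ordering is simply what the output shows, not a claim about what the ordering should be.

```python
def bounding_box_strings(west, east, north, south):
    """Return (box, wkt) strings for a bounding box in decimal degrees.

    Hypothetical helper: it mirrors the string layout seen in the
    output above rather than any documented slinky function.
    """
    # schema:box as it appears in the output: "north,east south,west"
    box = f"{north},{east} {south},{west}"

    # Closed WKT ring in "lon lat" order, starting and ending at the
    # north-west corner: NW, NE, SE, SW, NW
    ring = [
        (west, north),
        (east, north),
        (east, south),
        (west, south),
        (west, north),
    ]
    wkt = "POLYGON ((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"
    return box, wkt


# Bounding coordinates taken from the spatialCoverage node above
box, wkt = bounding_box_strings(west=170.375, east=-125.5, north=73.875, south=47.375)
print(box)
print(wkt)
```

Note that this box crosses the antimeridian (west is 170.375, east is -125.5), so consumers that assume west < east may misread it.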

mbjones commented 3 years ago

@amoeba this is looking good. I mentioned this on our call last week, but repeating here for posterity.... it would be good to provide structured info for https://schema.org/variableMeasured when we have it. The new SOSO guidelines on variableMeasured look like they will be amenable to 3 levels of detail:

1) If you don't have any semantic measurement type, at least provide variable names in text format using schema:PropertyValue but without the propertyID field.
2) If you have some controlled info about measurement properties (e.g., oboe:Characteristic), provide them as the propertyID for a PropertyValue instance.
3) If you have full or partial measurement types, either with an oboe:Entity specified, with an oboe:Characteristic specified, or without either specified, then use an instance of schema:Observation as follows:

{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "sea_surface_temp",
      "description": "sea surface temperature measured in degrees Fahrenheit"
    },
    {
      "@type": "PropertyValue",
      "name": "sea_surface_temp",
      "description": "sea surface temperature measured in degrees Fahrenheit",
      "propertyID": "http://purl.obolibrary.org/obo/ENVO_04000002"
    },
    {
      "@type": "PropertyValue",
      "name": "sea_surface_temp",
      "description": "sea surface temperature measured in degrees Fahrenheit",
      "propertyID": {
        "@type": "Observation",
        "observedNode": { 
          "@id": "http://purl.obolibrary.org/obo/ENVO_01001581",
          "name": "sea surface layer"
        },
        "measuredProperty": { 
          "@id": "http://purl.obolibrary.org/obo/PATO_0000146",
          "name": "temperature"
        }
      }
    }
  ]
}

I think there is still some confusion on how to represent (2), as the example given in the SOSO guidelines is analogous to an oboe:MeasurementType. I think it would be right if it had pointed to http://purl.obolibrary.org/obo/PATO_0000146, which could be conceived of as a subclass of oboe:Characteristic. More to discuss there. For our purposes, representations (1) and (3) are likely sufficient.

amoeba commented 3 years ago

Thanks @mbjones, I'm really glad to see that document has come so far. Looks like we can get a ton of metadata out of EML attributes using that pattern. I'll have a go.

From my read, it looks like we can do (1) for most records, (2) when we have semantic annotations but not for both an Entity and a Characteristic, and (3) when we have both.
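A minimal sketch of that selection logic, assuming the annotations have already been pulled out of the EML attribute. The function name and the `annotation_uri`, `entity_uri`, and `characteristic_uri` parameters are hypothetical, not slinky's API; the output shape follows the JSON-LD example earlier in the thread.

```python
import json


def variable_measured(name, description=None, annotation_uri=None,
                      entity_uri=None, characteristic_uri=None):
    """Build one schema:PropertyValue node for variableMeasured.

    Hypothetical helper. Picks the most detailed of the three
    representations that the available annotations support.
    """
    node = {"@type": "PropertyValue", "name": name}
    if description:
        node["description"] = description

    if entity_uri and characteristic_uri:
        # (3) both an Entity and a Characteristic: nest an Observation
        node["propertyID"] = {
            "@type": "Observation",
            "observedNode": {"@id": entity_uri},
            "measuredProperty": {"@id": characteristic_uri},
        }
    elif annotation_uri or characteristic_uri:
        # (2) some semantic annotation, but not the full pair
        node["propertyID"] = annotation_uri or characteristic_uri
    # (1) otherwise: name (and description) only, no propertyID

    return node


variables = [
    variable_measured("LAT"),
    variable_measured(
        "sea_surface_temp",
        description="sea surface temperature measured in degrees Fahrenheit",
        entity_uri="http://purl.obolibrary.org/obo/ENVO_01001581",
        characteristic_uri="http://purl.obolibrary.org/obo/PATO_0000146",
    ),
]
print(json.dumps({"@type": "Dataset", "variableMeasured": variables}, indent=2))
```

In this sketch an attribute with only an Entity annotation falls back to (1); whether that is the right call is part of what still needs discussing.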

mbjones commented 3 years ago

I added an alignment graphic for OBOE/SSN-EXT/schema.org in order to facilitate thinking about how we do this mapping to schema.org:

DataONEorg / slinky

support parsing EML into triples #17