binary-array-ld / bald

Python library for validating and managing binary array linked data files, e.g. HDF, netCDF.
BSD 3-Clause "New" or "Revised" License

export bald as schema.org #80

Open jyucsiro opened 6 years ago

jyucsiro commented 6 years ago

The BALD library can take nc files and produce an RDF graph using BALD syntax/formalisms.

This provides a pathway to a schema.org profile. It would be good to have a function that exports BALD to schema.org.

jyucsiro commented 6 years ago

I've started pulling together a mapping table between BALD and schema.org in the wiki: https://github.com/binary-array-ld/bald/wiki/Schema.org-mappings

For a Dataset, only 2 fields are required (per Google's Dataset structured-data guidelines): name and description. However, there is a range of other 'recommended' properties.

From a first pass, it seems many of the predicates can be reused from ACDD, and one or two from CF. See https://github.com/binary-array-ld/bald/wiki/Schema.org-mappings
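As a rough sketch of checking a generated description against the required/recommended split, the Python below assumes `name` and `description` are the required properties (as in Google's Dataset structured-data guidelines) and uses a small, illustrative subset of recommended properties; neither list is taken from the wiki table. Keys written as full schema.org IRIs (e.g. `http://schema.org/license`) are compacted before checking. All function names are hypothetical.

```python
# Sketch: report which schema.org Dataset properties a description lacks.
# ASSUMPTIONS: REQUIRED follows Google's Dataset guidelines (name, description);
# RECOMMENDED is an illustrative subset, not the full list.
SCHEMA_PREFIX = "http://schema.org/"
REQUIRED = ("name", "description")
RECOMMENDED = ("identifier", "keywords", "license", "url")


def compact_key(key: str) -> str:
    """Turn 'http://schema.org/license' into 'license'; leave other keys as-is."""
    return key[len(SCHEMA_PREFIX):] if key.startswith(SCHEMA_PREFIX) else key


def missing_properties(doc: dict) -> tuple:
    """Return (missing_required, missing_recommended) for a schema.org dict."""
    compact = {compact_key(k): v for k, v in doc.items()}

    def absent(prop: str) -> bool:
        # A missing key or an empty value both count as absent.
        return not compact.get(prop)

    return ([p for p in REQUIRED if absent(p)],
            [p for p in RECOMMENDED if absent(p)])


if __name__ == "__main__":
    doc = {"name": "Example", "http://schema.org/license": "Freely available"}
    print(missing_properties(doc))
    # (['description'], ['identifier', 'keywords', 'url'])
```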

adamml commented 6 years ago

Thanks Jonathan.

I'm at the International Marine Data and Information Systems conference this week, but will try and fill some of this out in any down time.

That also means I'm unlikely to be on the call later.

(Sent by email in reply to Jonathan Yu's comment of Mon 5 Nov 2018, 03:26.)

jyucsiro commented 6 years ago

@adamml @marqh - I've had a go at writing some code to output schema.org JSON-LD (a first cut) as an extension to nc2rdf.py.

This uses the mapping file bald2schemaorg_mappings.json, which implements a subset of https://github.com/binary-array-ld/bald/wiki/Schema.org-mappings

A user could run this to get a schema.org description from a netCDF file or CDL:

$ python nc2rdf.py -o json-ld --schema-org ../lib/bald/tests/integration/CDL/trajectoryProfile_template.cdl
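A minimal sketch of how a mapping file like bald2schemaorg_mappings.json could be applied. The file's real structure is not shown in this thread, so the flat "schema.org property -> ACDD attribute" JSON below is a hypothetical format, and the input is a plain dict standing in for a file's global attributes (with the netCDF4 package these could be collected as `{k: ds.getncattr(k) for k in ds.ncattrs()}`). The actual nc2rdf.py works from the BALD RDF graph rather than raw attributes, so treat this only as an illustration of the mapping step.

```python
import json

# HYPOTHETICAL mapping format: schema.org property -> ACDD global attribute.
# The real bald2schemaorg_mappings.json may be structured differently.
MAPPING_JSON = """
{
  "name": "title",
  "description": "summary",
  "keywords": "keywords",
  "license": "license",
  "identifier": "id"
}
"""


def to_schemaorg_jsonld(global_attrs: dict, mapping: dict) -> dict:
    """Build a compact schema.org JSON-LD dict from netCDF global attributes."""
    doc = {"@context": "http://schema.org/", "@type": "Dataset"}
    for schema_prop, attr_name in mapping.items():
        value = global_attrs.get(attr_name)
        if value:  # skip attributes the file does not define
            doc[schema_prop] = value
    return doc


if __name__ == "__main__":
    mapping = json.loads(MAPPING_JSON)
    # Stand-in for the global attributes of a netCDF/CDL file.
    attrs = {
        "title": "Example glider dataset",
        "summary": "Bogus data generated for testing.",
        "license": "Freely available",
    }
    print(json.dumps(to_schemaorg_jsonld(attrs, mapping), indent=4, sort_keys=True))
```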

See #82

Let's table this for discussion at the next telecon.

jyucsiro commented 6 years ago

The output of that example:

{
    "@context": "http://schema.org/",
    "description": "This is an example of the Oceanographic and surface meteorological data collected from the Underwater Slocum Glider RU07 by the National Centers for Environmental Information (NCEI) in the Cordell Bank National Marine Sanctuary from 2015-03-25 to 2015-03-25. The data contained within this file are completely bogus and are generated using the python module numpy.random.rand() function. This file can be used for testing with various applications. The uuid was generated using the uuid python module, invoking the command uuid.uuid4().",
    "http://schema.org/identifier": "NCEI_trajectoryProfile_template_v2.0_2016-09-22_181838.014029.nc",
    "http://schema.org/license": "Freely available",
    "id": "_:N9146fcda1487406b9b977fb4d6641a0f",
    "keywords": "Oceans > Ocean Temperature > Water Temperature, Oceans > Salinity/Density > Salinity",
    "name": "Oceanographic and surface meteorological data collected from the Underwater Slocum Glider RU07 by the National Centers for Environmental Information (NCEI) in the Cordell Bank National Marine Sanctuary from 2015-03-25 to 2015-03-25",
    "type": "Dataset"
}

adamml commented 6 years ago

I left a comment on #82...

jyucsiro commented 6 years ago

Revised example, now with the keywords split into a list and 'url' added when the input comes from an http/https URL:

{
    "@context": "http://schema.org/",
    "description": "The Extended AVHRR Polar Pathfinder (APP-x) version-2 Thematic Climate Data Record (CDR) includes surface temperature, surface albedo, surface and the Top Of the Atmosphere (TOA) shortwave and longwave radiative fluxes, cloud properties (amount, phase, particle size, optical depth,top pressure and temperature, surface and TOA radiative effect), and ice thickness and age. The APP-x CDR has twice daily data at local solar times of 14 and 04(02) for the Arctic(Antarctic) at a spatial resolution of 25 km for both poles.",
    "http://schema.org/identifier": "Polar-APP-X_v01r01_Nhem_1400_d20160801_c20160803.nc",
    "http://schema.org/license": "No restrictions on access or use",
    "id": "_:Na31c8fae6e6e48b986e7d063cd474a28",
    "keywords": [
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > RADIATIVE FLUX",
        "EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SNOW and ICE > ALBEDO",
        "EARTH SCIENCE > CRYOSPHERE > SEA ICE > ICE DEPTH(THICKNESS)",
        "EARTH SCIENCE > CRYOSPHERE > SNOW and ICE > SNOW and ICE TEMPERATURE",
        "EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SNOW and ICE > ICE DEPTH(THICKNESS)",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP PRESSURE",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TYPE",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD DROPLET PHASE",
        "EARTH SCIENCE > CLIMATE INDICATORS > CRYOSPHERIC INDICATORS > ICE DEPTH(THICKNESS)",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP TEMPERATURE",
        "EARTH SCIENCE > LAND SURFACE > LAND ALBEDO",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > LONGWAVE RADIATION",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD DROPLET CONCENTRATION(SIZE)",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD RADIATIVE TRANSFER > CLOUD RADIATIVE FORCING",
        "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > SOLAR RADIATION",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD FRACTION",
        "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD OPTICAL DEPTH(THICKNESS)",
        "EARTH SCIENCE > CRYOSPHERE > SNOW and ICE > ALBEDO",
        "EARTH SCIENCE > LAND SURFACE > LAND TEMPERATURE",
        "EARTH SCIENCE > OCEANS > SEA ICE > ICE DEPTH(THICKNESS)"
    ],
    "name": "Extended AVHRR Polar Pathfinder Fundamental Climate Data Record (APPx CDR)",
    "type": "Dataset",
    "url": "https://www.ngdc.noaa.gov/thredds/dodsC/arctic/Polar-APP-X_v01r01_Nhem_1400_d20160801_c20160803.nc"
}

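The two changes in this revision can be sketched in a few lines of Python. This assumes the `keywords` global attribute is comma-separated (as in the earlier output) and uses `urllib.parse.urlparse` to decide whether the input came from http/https; the function names are hypothetical, not the ones in #82.

```python
from urllib.parse import urlparse


def split_keywords(raw: str) -> list:
    """Split a comma-separated ACDD keywords string into a list of keywords.

    ASSUMPTION: commas only ever separate keywords; a keyword that itself
    contains a comma would be split incorrectly.
    """
    return [kw.strip() for kw in raw.split(",") if kw.strip()]


def add_url_if_remote(doc: dict, source: str) -> dict:
    """Set schema.org 'url' only when the input was read from an http(s) URL."""
    if urlparse(source).scheme in ("http", "https"):
        doc["url"] = source
    return doc


if __name__ == "__main__":
    doc = {"keywords": split_keywords(
        "Oceans > Ocean Temperature > Water Temperature, "
        "Oceans > Salinity/Density > Salinity")}
    add_url_if_remote(doc, "../lib/bald/tests/integration/CDL/trajectoryProfile_template.cdl")
    print(doc)  # no 'url' key: the source is a local path
```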
jyucsiro commented 6 years ago

I've been experimenting with landing pages generated from these schema.org descriptions via a THREDDS harvest process.

The flow is: nc files listed in any THREDDS/OPeNDAP catalog (e.g. https://www.ngdc.noaa.gov/thredds/catalog/arctic/catalog.xml) --> schema.org JSON --> bald-server (see below).
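The first step of that flow, listing the nc files in a catalog, can be sketched with the standard library alone. This is an illustration, not the harvester actually used: it assumes a THREDDS InvCatalog 1.0 document, takes the OPeNDAP service base from the catalog (falling back to a hypothetical default of /thredds/dodsC/), and does not follow nested catalogRef links.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace used by THREDDS InvCatalog 1.0 catalog.xml documents.
NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"


def opendap_urls(catalog_xml: str, catalog_url: str,
                 default_base: str = "/thredds/dodsC/") -> list:
    """Return OPeNDAP URLs for the .nc datasets listed directly in a catalog.

    ASSUMPTIONS: every dataset is served by the catalog's OPeNDAP service
    (default_base is used if none is declared), and nested catalogRef
    links are not followed.
    """
    root = ET.fromstring(catalog_xml)
    base = default_base
    for svc in root.iter(NS + "service"):  # also finds services nested in a Compound one
        if svc.get("serviceType", "").lower() == "opendap":
            base = svc.get("base", default_base)
            break
    urls = []
    for ds in root.iter(NS + "dataset"):
        path = ds.get("urlPath")
        if path and path.endswith(".nc"):
            # A base starting with "/" is resolved against the catalog's host.
            urls.append(urljoin(catalog_url, base + path))
    return urls
```

Each returned URL has the same dodsC form as the 'url' in the APP-x example and could then be passed on for schema.org conversion.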

See the deployed flask app: http://waterinformatics-ext1-cdc.it.csiro.au/bald-server/

So on a view page the schema.org description is rendered, and the JSON-LD is also embedded in the page's <head>. This gives THREDDS/OPeNDAP-hosted nc files a pathway to landing pages that can be discovered by, say, Google Dataset Search.
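A framework-free sketch of that embedding (bald-server itself is a Flask app; this is not its code). The usual way to embed JSON-LD is a `<script type="application/ld+json">` element; the sketch puts it in `<head>` and renders name and description in the body. Escaping `<` as `\u003c` keeps the JSON valid while preventing a value containing `</script>` from closing the element early. The function name is hypothetical.

```python
import html
import json


def landing_page(doc: dict) -> str:
    """Render a minimal HTML landing page with schema.org JSON-LD in <head>."""
    # "\u003c" is a valid JSON escape for "<", so the embedded data is unchanged.
    jsonld = json.dumps(doc, indent=2).replace("<", "\\u003c")
    name = html.escape(doc.get("name", "Dataset"))
    description = html.escape(doc.get("description", ""))
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{name}</title>\n"
        f'<script type="application/ld+json">\n{jsonld}\n</script>\n'
        "</head>\n<body>\n"
        f"<h1>{name}</h1>\n<p>{description}</p>\n"
        "</body>\n</html>\n"
    )
```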

prototype bald-server here: https://github.com/jyucsiro/bald-server

adamml commented 6 years ago

Very nice.

The Structured Data Testing Tool from Google happily parses the JSON-LD:

Attribute    Value
@type        Dataset
description  The Extended AVHRR Polar Pathfinder (APP-x) version-2 Thematic Climate Data Record (CDR) includes surface temperature, surface albedo, surface and the Top Of the Atmosphere (TOA) shortwave and longwave radiative fluxes, cloud properties (amount, phase, particle size, optical depth,top pressure and temperature, surface and TOA radiative effect), and ice thickness and age. The APP-x CDR has twice daily data at local solar times of 14 and 04(02) for the Arctic(Antarctic) at a spatial resolution of 25 km for both poles.
license      No restrictions on access or use
keywords     EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SNOW and ICE > ALBEDO
keywords     EARTH SCIENCE > LAND SURFACE > LAND TEMPERATURE
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD DROPLET PHASE
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP PRESSURE
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD FRACTION
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD OPTICAL DEPTH(THICKNESS)
keywords     EARTH SCIENCE > OCEANS > SEA ICE > ICE DEPTH(THICKNESS)
keywords     EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > SOLAR RADIATION
keywords     EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SNOW and ICE > ICE DEPTH(THICKNESS)
keywords     EARTH SCIENCE > CRYOSPHERE > SNOW and ICE > SNOW and ICE TEMPERATURE
keywords     EARTH SCIENCE > CRYOSPHERE > SEA ICE > ICE DEPTH(THICKNESS)
keywords     EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > LONGWAVE RADIATION
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TYPE
keywords     EARTH SCIENCE > CLIMATE INDICATORS > CRYOSPHERIC INDICATORS > ICE DEPTH(THICKNESS)
keywords     EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > RADIATIVE FLUX
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP TEMPERATURE
keywords     EARTH SCIENCE > CRYOSPHERE > SNOW and ICE > ALBEDO
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD DROPLET CONCENTRATION(SIZE)
keywords     EARTH SCIENCE > LAND SURFACE > LAND ALBEDO
keywords     EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD RADIATIVE TRANSFER > CLOUD RADIATIVE FORCING
name         Extended AVHRR Polar Pathfinder Fundamental Climate Data Record (APPx CDR)
url          https://www.ngdc.noaa.gov/thredds/dodsC/arctic/Polar-APP-X_v01r01_Nhem_1400_d20160801_c20160803.nc

ddlenz commented 3 years ago

Hi, is this feature still pending? With the attached file I get the following:

python nc2rdf.py /Users/dlenz/Downloads/foo.nc --schema-org
Traceback (most recent call last):
  File "/Users/dlenz/git/bald/nc2rdf/nc2rdf.py", line 153, in <module>
    nc2schemaorg(args.ncfile, args.format, baseuri=args.baseuri)
  File "/Users/dlenz/git/bald/nc2rdf/nc2rdf.py", line 100, in nc2schemaorg
    schema_g = baldgraph2schemaorg(graph, path=ncfilename, baseuri=baseuri)
  File "/Users/dlenz/git/bald/nc2rdf/nc2rdf.py", line 93, in baldgraph2schemaorg
    schema_org_inst  =  bald.schemaOrg()
TypeError: __init__() missing 1 required positional argument: 'graph'

foo.nc.zip
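The last frame of the traceback shows `bald.schemaOrg()` being called with no arguments while its `__init__` requires a positional `graph`. Since `baldgraph2schemaorg` already receives `graph`, passing it through (i.e. `bald.schemaOrg(graph)`) looks like the likely fix, but that is an unverified guess about bald's API. The stand-in class below is hypothetical and only reproduces the failure mode and the corrected call pattern.

```python
class SchemaOrgStandIn:
    """Hypothetical stand-in for a class whose constructor needs an RDF graph."""

    def __init__(self, graph):
        self.graph = graph


def baldgraph2schemaorg_sketch(graph):
    # Mirrors the failing call site: SchemaOrgStandIn() with no arguments raises
    # "TypeError: __init__() missing 1 required positional argument: 'graph'"
    # (newer Pythons prefix the class name). Passing the graph avoids it.
    return SchemaOrgStandIn(graph)


if __name__ == "__main__":
    try:
        SchemaOrgStandIn()
    except TypeError as exc:
        print("reproduced:", exc)
    print(baldgraph2schemaorg_sketch(graph="placeholder-graph").graph)
```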