earthcubearchitecture-project418 / services

RPC services package for Project 418

Question about returned results from new graph/details call #12

Open ericlingerfelt opened 6 years ago

ericlingerfelt commented 6 years ago

@fils

Expected Behavior

Returns all metadata associated with URL/ID returned from search calls.

Actual Behavior

Returns only a small amount of data

Steps to reproduce this behavior

USE CASE 1: http://data.neotomadb.org/datasets/7610/ returns { "S": "http://data.neotomadb.org/datasets/7610/", "Aname": "", "Name": "", "URL": "", "Description": "", "Citation": "", "Datepublished": "", "Curl": "", "Keywords": "", "License": "" }

but the source view of this page shows

{ "@context": "http://schema.org", "@type": "Dataset", "license": "https://creativecommons.org/licenses/by/4.0/deed.en_US", "author": { "@type":"Person", "name":"" }, "includedInDataCatalog": { "@type": "DataCatalog", "about": "Paleoecology", "publisher": { "@type": "Organization", "name": "Neotoma Paleoecological Database", "alternateName":"Neotoma", "description":"The Neotoma Paleoecology Database and Community is an online hub for data, research, education, and discussion about paleoenvironments.", "url": "http://neotomadb.org" }, "funder": { "@type":"Organization", "name":"National Sciences Foundation", "alternateName": "NSF", "url": "http://nsf.gov" } }, "about": "", "distribution":{ "@type":"DataDownload", "contentUrl":"http://api.neotomadb.org/v1/data/downloads/10305", "datePublished": "2018-02-02 17:55:02", "inLanguage": "en" }, "spatialCoverage": { "@type": "Place", "name": "Carbon Monoxide [11MO593] vertebrate fauna dataset", "geo": { "@type": "GeoCoordinates", "latitude": "38.4741667", "longitude": "-90.2275", "elevation": "r" } } }

USE CASE 2:

http://get.iedadata.org/doi/322017 returns { "S": "http://get.iedadata.org/doi/322017", "Aname": "", "Name": "", "URL": "", "Description": "", "Citation": "", "Datepublished": "", "Curl": "", "Keywords": "", "License": "" }

but view source shows

{ "@context": { "@vocab": "http://schema.org/", "datacite": "http://purl.org/spar/datacite/", "earthcollab": "https://library.ucar.edu/earthcollab/schema#", "geolink": "http://schema.geolink.org/1.0/base/main#", "vivo": "http://vivoweb.org/ontology/core#", "dcat":"http://www.w3.org/ns/dcat#" }, "@id": "DOI:10.1594/IEDA/322017", "@type": "Dataset", "additionalType": [ "http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset" ], "name": "Raw Near-Bottom Oxygen Data collected by ROV Comanche from the Perth Canyon acquired during the Falkor expedition FK150301 (2015)", "citation": "McCulloch, Malcolm (2015), Raw Near-Bottom Oxygen Data collected by ROV Comanche from the Perth Canyon acquired during the Falkor expedition FK150301 (2015). Interdisciplinary Earth Data Alliance (IEDA). doi:10.1594/IEDA/322017", "creator":[{ "@type": "Person", "additionalType": "http://schema.geolink.org/1.0/base/main#Person", "name": "McCulloch, Malcolm", "givenName": "Malcolm", "familyName": "McCulloch"}], "datePublished": "2015", "dateCreated": "2015-10-07", "version": "1", "inLanguage": "en", "description": "Abstract: This data set was during Falkor expedition FK150301 conducted in 2015 (Chief Scientist: Dr. Malcolm McCulloch). These data files are of Text File (ASCII) format and include Oxygen data that have not been processed. 
Data were acquired as part of the project(s): Perth Canyon: First Deep Exploration.", "distribution": [ { "@type": "DataDownload", "additionalType": "http://www.w3.org/ns/dcat#distribution", "name":"DOI landing page", "http://www.w3.org/ns/dcat#accessURL": "http://dx.doi.org/10.1594/IEDA/322017", "url": "http://dx.doi.org/10.1594/IEDA/322017", "encodingFormat": "text/plain"}, { "@type": "DataDownload", "additionalType": "http://www.w3.org/ns/dcat#distribution", "name":"URL", "http://www.w3.org/ns/dcat#accessURL": "http://www.marine-geo.org/tools/search/Files.php?data_set_uid=22017", "url": "http://www.marine-geo.org/tools/search/Files.php?data_set_uid=22017" , "encodingFormat": "text/plain"} ], "identifier": [ { "@id": "doi:10.1594/IEDA/322017", "@type": "PropertyValue", "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"], "propertyID": "http://purl.org/spar/datacite/doi", "url": "http://dx.doi.org/10.1594/IEDA/322017", "value": "10.1594/IEDA/322017"} ], "keywords": ["Oxygen"], "license": "Creative Commons Attribution-NonCommercial-Share Alike 3.0 United States [CC BY-NC-SA 3.0]", "provider": { "@type": "Organization", "@id": "https://www.iedadata.org/", "name": "Interdisciplinary Earth Data Alliance (IEDA)" }, "publisher": {

    "@type": "Organization",
    "@id": "https://www.iedadata.org/",
    "name": "Interdisciplinary Earth Data Alliance (IEDA)",
    "url": "https://www.iedadata.org/",
    "description": "The IEDA data facility mission is to support, sustain, and advance the geosciences by providing data services for observational geoscience data from the Ocean, Earth, and Polar Sciences. IEDA systems serve as primary community data collections for global geochemistry and marine geoscience research and support the preservation, discovery, retrieval, and analysis of a wide range of observational field and analytical data types. Our tools and services are designed to facilitate data discovery and reuse for focused disciplinary research and to support interdisciplinary research and data integration.",
    "logo": {
        "@type": "ImageObject",
        "url": "http://app.iedadata.org/images/ieda_maplogo.png"
    },
    "contactPoint": {
        "@type": "ContactPoint",
        "name": "Information Desk",
        "email": "info@iedadata.org",
        "url": "https://www.iedadata.org/contact/",
        "contactType": "Information"
    },
    "parentOrganization": {
        "@type": "Organization",
        "@id": "https://viaf.org/viaf/142992181/",
        "name": "Lamont-Doherty Earth Observatory",
        "url": "http://www.ldeo.columbia.edu/",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "61 Route 9W",
            "addressLocality": "Palisades",
            "addressRegion": "NY",
            "postalCode": "10964-1000",
            "addressCountry": "USA"
        },
        "parentOrganization": {
            "@type": "Organization",
            "@id": "https://viaf.org/viaf/156836332/",
            "legalName": "Columbia University",
            "url": "http://www.columbia.edu/"
        } 
        }
    ,
        "funder": 
      {
        "@type": "Organization",
        "@id": "http://dx.doi.org/10.13039/100000085",
        "legalName": "Directorate for Geosciences",
        "alternateName": "NSF-GEO",
        "url": "http://www.nsf.gov",
        "parentOrganization": {
            "@type": "Organization",
            "@id": "http://dx.doi.org/10.13039/100000001",
            "legalName": "National Science Foundation",
            "alternateName": "NSF",
            "url": "http://www.nsf.gov"
        }
       }
        ,
        "publishingPrinciples": {

        "@id": "http://creativecommons.org/licenses/by-nc-sa/3.0/us/",
        "@type": "DigitalDocument",
        "additionalType": "gdx:Protocol-License",
        "name": "Dataset Usage License",
        "description": "Creative Commons Attribution-NonCommercial-Share Alike 3.0 United States [CC BY-NC-SA 3.0]",
        "url": "https://creativecommons.org/licenses/by-nc-sa/3.0/us/"

   }

fils commented 6 years ago

@ericlingerfelt @ashepherd

Adam, so the top call returns more than the bottom call.

###  GRAPH CALLS  ###
# GET call for a single resource details on X  (what is X again)  :)
GET http://geodex.org/api/v1/graph/details?r=http://opencoredata.org/id/dataset/0007e994-ba7f-4c74-b954-c7c58998d9b9

###  GRAPH CALLS  ###
# GET call for a single resource details on X  (what is X again)  :)
GET http://geodex.org/api/v1/graph/details?r=http://data.neotomadb.org/datasets/7610/
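
For anyone scripting these calls, here is a minimal Python sketch of building the details URL. It assumes the endpoint accepts a percent-encoded `r` parameter (the calls above pass it unencoded); `build_details_url` is a hypothetical helper, not part of the services package.

```python
from urllib.parse import urlencode

# Endpoint taken from the GET calls above.
DETAILS_ENDPOINT = "http://geodex.org/api/v1/graph/details"


def build_details_url(resource: str, endpoint: str = DETAILS_ENDPOINT) -> str:
    """Return the graph/details URL for one resource IRI (hypothetical helper)."""
    # urlencode percent-encodes ':' and '/' so the resource IRI cannot be
    # confused with the endpoint's own path or query string.
    query = urlencode({"r": resource})
    return f"{endpoint}?{query}"


print(build_details_url("http://data.neotomadb.org/datasets/7610/"))
```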

The SPARQL template is

prefix schema: <http://schema.org/>
prefix bds: <http://www.bigdata.com/rdf/search#>
select distinct ?s ?aname ?name ?url ?description ?citation ?datepublished ?curl  ?keywords ?license
where {
 VALUES ?s { <{{.}}> }.
OPTIONAL { ?s schema:alternateName ?aname } .
OPTIONAL { ?s schema:citation      ?citation }
OPTIONAL { ?s schema:datePublished ?datepublished }
OPTIONAL { ?s schema:description   ?description }
OPTIONAL { ?s schema:distribution ?distribution }
OPTIONAL { ?s schema:distribution ?distribution .
           ?distribution schema:contentUrl ?curl .
         }
OPTIONAL { ?s schema:identifier ?identifier }
OPTIONAL { ?s schema:keywords ?keywords }
OPTIONAL { ?s schema:license       ?license}
OPTIONAL { ?s schema:name         ?name}
OPTIONAL { ?s schema:url ?url}
OPTIONAL { ?s schema:measurementTechnique ?measurementtechnique }
}
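
To see what actually reaches the triplestore, here is a rough Python stand-in for the template substitution. The `{{.}}` placeholder looks like Go text/template syntax, so this is only an illustration; `render_details_query` is a hypothetical name and the template is abbreviated to two OPTIONAL clauses.

```python
# Abbreviated copy of the template quoted above; only two OPTIONAL clauses kept.
TEMPLATE = """prefix schema: <http://schema.org/>
select distinct ?s ?name ?license
where {
  VALUES ?s { <{{.}}> } .
  OPTIONAL { ?s schema:name ?name }
  OPTIONAL { ?s schema:license ?license }
}"""

# Characters that may not appear inside a SPARQL IRI reference.
UNSAFE_IRI_CHARS = '<>"{}|^`\\ \n\t'


def render_details_query(resource_iri: str, template: str = TEMPLATE) -> str:
    """Fill the {{.}} placeholder with the resource IRI (hypothetical helper)."""
    # Refuse input that could break out of the angle brackets.
    if any(ch in resource_iri for ch in UNSAFE_IRI_CHARS):
        raise ValueError(f"not a safe IRI: {resource_iri!r}")
    return template.replace("{{.}}", resource_iri)


query = render_details_query("http://data.neotomadb.org/datasets/7610/")
print(query)
```

Because `?s` is pinned to the requested IRI, every OPTIONAL stays unbound unless the graph holds triples with exactly that IRI as their subject, which is consistent with the all-empty results in the two use cases.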

The question you and I need to discuss: are we in a situation where both JSON-LD packages are valid and it's just plain hard to align a generic SPARQL query with these variations? Or is there something we can do to address this?

I'm wondering if this is a place where we need to address this in a more generic manner (and I have a few ideas) or if this is something we can address or normalize in the current approach.

Let's try and get some time to chat about this soon...

Doug

ashepherd commented 6 years ago

hmm, this doesn't return anything either:

DESCRIBE <http://data.neotomadb.org/datasets/7610/>

ericlingerfelt commented 6 years ago

@fils @ashepherd

Any news on this? I'd love to get the UI completed so we can finish this alpha testing phase and start beta testing with real users.

ericlingerfelt commented 6 years ago

@fils @ashepherd

Hi Guys,

Any news on this?

Thanks!

ashepherd commented 6 years ago

I'm working with the RDF now to see if the DESCRIBE works. It might be that they need to republish to fix their mistakes.

ashepherd commented 6 years ago

Ah, it's because all their objects in the triplestore are blank nodes.
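
A quick way to spot this before loading: in JSON-LD, a node object without an `@id` becomes a blank node in RDF. The sketch below is a plain-dictionary heuristic (it does no real JSON-LD expansion), `blank_node_paths` is a hypothetical helper, and the sample is a trimmed copy of the Neotoma markup.

```python
import json


def blank_node_paths(node, path="$"):
    """Return JSON paths of node objects that lack an @id (blank nodes in RDF)."""
    paths = []
    if isinstance(node, dict):
        # Value objects such as {"@value": ...} are literals, not nodes.
        if "@value" not in node and "@id" not in node:
            paths.append(path)
        for key, value in node.items():
            # The @context is vocabulary mapping, not data, so skip it.
            if key != "@context":
                paths.extend(blank_node_paths(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            paths.extend(blank_node_paths(item, f"{path}[{i}]"))
    return paths


# Trimmed copy of the Neotoma markup quoted in this issue.
neotoma = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Dataset",
  "license": "https://creativecommons.org/licenses/by/4.0/deed.en_US",
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "http://api.neotomadb.org/v1/data/downloads/10305"
  }
}
""")

print(blank_node_paths(neotoma))  # ['$', '$.distribution']
```

The top-level path `$` in the output is the important one: with no `@id` on the Dataset itself, nothing in the graph has the landing page URL as its subject.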

ashepherd commented 6 years ago

Ew, and there's no dataset URL in their markup. The page at http://data.neotomadb.org/datasets/7610/ has this schema.org markup:

{
   "@context":"http://schema.org",
   "@type":"Dataset",
   "license":"https://creativecommons.org/licenses/by/4.0/deed.en_US",
   "author":{
      "@type":"Person",
      "name":""
   },
   "includedInDataCatalog":{
      "@type":"DataCatalog",
      "about":"Paleoecology",
      "publisher":{
         "@type":"Organization",
         "name":"Neotoma Paleoecological Database",
         "alternateName":"Neotoma",
         "description":"The Neotoma Paleoecology Database and Community is an online hub for data, research, education, and discussion about paleoenvironments.",
         "url":"http://neotomadb.org"
      },
      "funder":{
         "@type":"Organization",
         "name":"National Sciences Foundation",
         "alternateName":"NSF",
         "url":"http://nsf.gov"
      }
   },
   "about":"",
   "distribution":{
      "@type":"DataDownload",
      "contentUrl":"http://api.neotomadb.org/v1/data/downloads/10305",
      "datePublished":"2018-02-02 17:55:02",
      "inLanguage":"en"
   },
   "spatialCoverage":{
      "@type":"Place",
      "name":"Carbon Monoxide [11MO593] vertebrate fauna dataset",
      "geo":{
         "@type":"GeoCoordinates",
         "latitude":"38.4741667",
         "longitude":"-90.2275",
         "elevation":"`r `"
      }
   }
}

ericlingerfelt commented 6 years ago

@ashepherd @fils

Just a reminder that the ONLY one that works for me is BCO-DMO. It's not just Neotoma.

ashepherd commented 6 years ago

@ericlingerfelt , try it now.

We suspect this won't ever work for Neotoma because they don't have the landing page URL in their data at all. We think this is something to highlight in the final report: there needs to be some connection between the canonical URL and the schema.org markup to enable this. The provider can do it by including the @id or schema:url with the same value, or the harvester can do it by keeping track of PROV to make this connection for them. This is a valuable lesson learned, and I'd vote for keeping it as is to demonstrate the challenges with the data.

Anyone disagree?
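
For illustration, here is a minimal Python sketch of a harvester-side fix. It is a simpler stand-in for the PROV tracking mentioned above, not PROV itself: the harvester stamps the URL it fetched onto the top-level node when the provider left it out. `attach_landing_page` is a hypothetical helper, and using the bare key `url` assumes the document's `@context` maps it to schema.org.

```python
import copy


def attach_landing_page(doc: dict, landing_url: str) -> dict:
    """Return a copy of a JSON-LD document tied to the page it was harvested from."""
    linked = copy.deepcopy(doc)  # leave the harvested original untouched
    # Respect an identifier the provider already published.
    linked.setdefault("@id", landing_url)
    # Add schema:url only when the provider left it out.
    linked.setdefault("url", landing_url)
    return linked


harvested = {"@context": "http://schema.org", "@type": "Dataset"}
linked = attach_landing_page(harvested, "http://data.neotomadb.org/datasets/7610/")
print(linked["@id"])
```

With the `@id` set to the landing page, the Dataset would no longer be a blank node, so a lookup by that IRI would have a subject to match. Documents that already carry their own `@id` keep it and only gain the `url` value, which the current template would not use as a lookup key.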