earthcube / stacIndexer

An exploration of converting STAC json catalogs into RDF

Summary issues: repeating items, keywords disappear #21

Closed ylyangtw closed 1 month ago

ylyangtw commented 2 months ago

The example search https://ecoforecast.geocodes-aws.earthcube.org/#/search/?q=water+temperature&resourceType=all

valentinedwv commented 2 months ago

It's probably not your code. When you click on the link, it goes to the same dataset URN: asl.temp.lm_Bloom_binary_mean_P1D_summaries forecasts

So the summary query is creating multiple rows for the same item, even though it should not.

https://github.com/earthcube/earthcube_utilities/issues/166

Run this query against graph.geocodes; it should return one row:

prefix schema: <https://schema.org/>
SELECT distinct ?subj ?g ?resourceType ?name ?description ?pubname
        (GROUP_CONCAT(DISTINCT ?placename; SEPARATOR=", ") AS ?placenames)
        (GROUP_CONCAT(DISTINCT ?kwu; SEPARATOR=", ") AS ?kw) ?datep ?sosType
        #(GROUP_CONCAT(DISTINCT ?url; SEPARATOR=", ") AS ?disurl)
        WHERE {
          graph <urn:gleaner.io:eco:vera4cast:data:8a0659c7435c14373aa7e8bd164a7c545c6ba520> {

            ?subj schema:name ?name .
            ?subj schema:description ?description .
            # ?sosType is only enumerated here; nothing below ties it to ?subj
            values ?sosType {
              schema:Dataset
              # schema:DataCatalog
            }
            # The type constraints are commented out in this version:
            #?subj a schema:Dataset .
            #?subj a ?sosType .

            BIND (IF (exists {?subj a schema:Dataset .} || exists {?subj a schema:DataCatalog .}, "data", "tool") AS ?resourceType) .

            OPTIONAL {?subj schema:distribution/schema:url|schema:subjectOf/schema:url ?url .}
            OPTIONAL {?subj schema:datePublished ?date_p .}
            OPTIONAL {?subj schema:publisher/schema:name|schema:sdPublisher|schema:provider/schema:name ?pub_name .}
            OPTIONAL {?subj schema:spatialCoverage/schema:name ?place_name .}
            OPTIONAL {?subj schema:keywords ?kwu .}
            # The query should not return "No datePublished": it is not a valid
            # "YYYY-MM-DD" date, and the UI date select fails because it expects an actual date.
            # BIND ( IF ( BOUND(?date_p), ?date_p, "No datePublished") as ?datep ) .
            BIND ( IF ( BOUND(?date_p), ?date_p, "1900-01-01") as ?datep ) .
            BIND ( IF ( BOUND(?pub_name), ?pub_name, "No Publisher") as ?pubname ) .
            BIND ( IF ( BOUND(?place_name), ?place_name, "No spatialCoverage") as ?placename ) .
          }
        }
        GROUP BY ?subj ?pubname ?placenames ?kw ?datep ?name ?description ?resourceType ?sosType ?g

It does not; it returns multiple rows.

The fixed query noted in that issue returns one row:

prefix schema: <https://schema.org/>
SELECT distinct ?subj ?g ?resourceType ?name ?description ?pubname
        (GROUP_CONCAT(DISTINCT ?placename; SEPARATOR=", ") AS ?placenames)
        (GROUP_CONCAT(DISTINCT ?kwu; SEPARATOR=", ") AS ?kw) ?datep ?sosType
        #(GROUP_CONCAT(DISTINCT ?url; SEPARATOR=", ") AS ?disurl)
        WHERE {
          graph <urn:gleaner.io:eco:vera4cast:data:8a0659c7435c14373aa7e8bd164a7c545c6ba520> {
            values ?sosType {
              schema:Dataset
              # schema:DataCatalog
            }
            # Join ?subj to the allowed types, so only those subjects match
            ?subj a ?sosType .
            ?subj schema:name ?name .
            ?subj schema:description ?description .

            #?subj a schema:Dataset .
            #?subj a ?sosType .

            BIND (IF (exists {?subj a schema:Dataset .} || exists {?subj a schema:DataCatalog .}, "data", "tool") AS ?resourceType) .

            OPTIONAL {?subj schema:distribution/schema:url|schema:subjectOf/schema:url ?url .}
            OPTIONAL {?subj schema:datePublished ?date_p .}
            OPTIONAL {?subj schema:publisher/schema:name|schema:sdPublisher|schema:provider/schema:name ?pub_name .}
            OPTIONAL {?subj schema:spatialCoverage/schema:name ?place_name .}
            OPTIONAL {?subj schema:keywords ?kwu .}
            # The query should not return "No datePublished": it is not a valid
            # "YYYY-MM-DD" date, and the UI date select fails because it expects an actual date.
            # BIND ( IF ( BOUND(?date_p), ?date_p, "No datePublished") as ?datep ) .
            BIND ( IF ( BOUND(?date_p), ?date_p, "1900-01-01") as ?datep ) .
            BIND ( IF ( BOUND(?pub_name), ?pub_name, "No Publisher") as ?pubname ) .
            BIND ( IF ( BOUND(?place_name), ?place_name, "No spatialCoverage") as ?placename ) .
          }
        }
        GROUP BY ?subj ?pubname ?placenames ?kw ?datep ?name ?description ?resourceType ?sosType ?g

So this says: find me values that are (for now) Datasets, and the properties of those datasets, such as name.

The bad query says: find me anything with a name, whatever its @type, then return the properties of all of those things, not just Datasets.
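
To make the difference concrete, here is a minimal Python sketch (plain Python, not SPARQL, and not the actual GeoCodes code) that mimics the two query shapes over a few made-up triples. The identifiers `ds1` and `org1`, the literal values, and the helper names are all hypothetical. It is also simplified: it yields at most one row per subject/type pair and ignores the OPTIONAL clauses and grouping.

```python
# Toy triples (subject, predicate, object); every value here is invented.
TRIPLES = [
    ("ds1", "rdf:type", "schema:Dataset"),
    ("ds1", "schema:name", "Water temperature forecast"),
    ("ds1", "schema:description", "Daily forecast summaries"),
    ("org1", "rdf:type", "schema:Organization"),
    ("org1", "schema:name", "Example publisher"),
    ("org1", "schema:description", "Publisher of the dataset"),
]

# Plays the role of: values ?sosType { schema:Dataset }
SOS_TYPES = ["schema:Dataset"]


def objects(subj, pred):
    """Return every object of (subj, pred, ?o) in the toy graph."""
    return [o for s, p, o in TRIPLES if s == subj and p == pred]


def subjects():
    """Return the distinct subjects in first-seen order."""
    return list(dict.fromkeys(s for s, _, _ in TRIPLES))


def bad_shape():
    """Name + description only; ?sosType is never joined to ?subj."""
    return [
        (s, t)
        for s in subjects()
        if objects(s, "schema:name") and objects(s, "schema:description")
        for t in SOS_TYPES  # cross product: every named node gets paired with Dataset
    ]


def fixed_shape():
    """Adds the join `?subj a ?sosType`, so only typed matches survive."""
    return [
        (s, t)
        for s in subjects()
        for t in SOS_TYPES
        if t in objects(s, "rdf:type")
        and objects(s, "schema:name")
        and objects(s, "schema:description")
    ]


print(bad_shape())    # [('ds1', 'schema:Dataset'), ('org1', 'schema:Dataset')]
print(fixed_shape())  # [('ds1', 'schema:Dataset')]
```

In the toy graph, the organization node has a name and a description, so the bad shape reports it as a second "Dataset" row in the same graph. That would be consistent with the repeated items seen in the search results, though the real graph may contain other kinds of named nodes.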

ylyangtw commented 2 months ago

The updated summary SPARQL fixed these issues.

ylyangtw commented 1 month ago

Closing, as it's fixed.