KnowWhereGraph / kwg-faceted-search

Knowledge Explorer: The search interface to KnowWhereGraph
http://stko-kwg.geog.ucsb.edu

Query with elasticsearch and sparql #41

Closed seilagonzalez closed 2 years ago

seilagonzalez commented 2 years ago

Facets on climate division

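PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT DISTINCT ?facetName ?facetValue ?facetCount WHERE {
  # note: an empty query is allowed and simply matches all documents, hence no :query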
  ?r a inst:kwg_index_v2_updated ;
    :facetFields "climateDivisionName" ;
    :facets _:f .
  _:f :facetName ?facetName .
  _:f :facetValue ?facetValue .
  _:f :facetCount ?facetCount .
}
seilagonzalez commented 2 years ago

Facets on CBSA code

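PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT DISTINCT ?facetName ?facetValue ?facetCount WHERE {
  # note: an empty query is allowed and simply matches all documents, hence no :query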
  ?r a inst:kwg_index_v2_updated ;
    :facetFields "cbsaCode" ;
    :facets _:f .
  _:f :facetName ?facetName .
  _:f :facetValue ?facetValue .
  _:f :facetCount ?facetCount .
}
seilagonzalez commented 2 years ago

Example of mixing Elasticsearch with SPARQL: earthquakes matching the keyword "California" that are located in an AdministrativeRegion_3, sorted by label in descending order.

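PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
SELECT DISTINCT ?entity ?place {
  ?search a inst:kwg_index_v2_updated ;
    :query "California" ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity a kwg-ont:EarthquakeEvent .
  ?entity kwg-ont:locatedIn ?place .
  ?place a kwg-ont:AdministrativeRegion_3 .
}

PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity ?name ?t ?type ?place ?pname {
  ?search a inst:kwg_index_v2_updated ;
    :query "California" ;
    :orderBy "-comment" ;
    :entities ?entity .
  ?entity rdfs:label ?name .
  ?entity a kwg-ont:EarthquakeEvent .
  ?entity kwg-ont:locatedIn ?place .
  ?place rdfs:label ?pname .
  ?entity a ?type .
  ?place a kwg-ont:AdministrativeRegion_3 .
}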
seilagonzalez commented 2 years ago

The same query as a count:

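PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT (COUNT(*) AS ?counter) {
  ?search a inst:kwg_index_v2_updated ;
    :query "California" ;
    :orderBy "-comment" ;
    :entities ?entity .
  ?entity rdfs:label ?name .
  ?entity a kwg-ont:EarthquakeEvent .
  ?entity kwg-ont:locatedIn ?place .
  ?place rdfs:label ?pname .
  ?entity a ?type .
  ?place a kwg-ont:AdministrativeRegion_3 .
}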
seilagonzalez commented 2 years ago

Total hits for climateDivisionName "northwest", computed by Elasticsearch:

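PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?totalHits {
  ?search a inst:kwg_index_v2_updated ;
    :query "climateDivisionName:northwest" ;
    :totalHits ?totalHits .
}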
seilagonzalez commented 2 years ago

The relation between climate divisions and administrative regions goes through S2 cells, as we suspected. Store the nested cells in the index; those cells are in turn linked to a type of administrative region.

seilagonzalez commented 2 years ago

Keyword search with a filter that removes superclasses, mixing Elasticsearch and SPARQL and sorting by label. @zilongliu-geo @amoeba @fritosxii

PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity ?label ?type {
  ?search a inst:kwg_index_v2_updated ;
    :query "california" ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity rdfs:label ?label .
  ?entity a ?type .
  # filtering out superclasses (types that have at least one subclass)
  FILTER NOT EXISTS {
    ?super rdfs:subClassOf ?type
  }
}
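
To make the effect of the FILTER NOT EXISTS clause concrete, here is a minimal Python sketch of the same test on in-memory data. It assumes only asserted rdfs:subClassOf triples are present; the function name and the ex: class hierarchy are hypothetical and not taken from the KnowWhereGraph ontology.

```python
def most_specific_types(entity_types, subclass_of):
    """Keep only the types that no class is declared a subclass of.

    entity_types: class names asserted for one entity.
    subclass_of: set of (sub, super) pairs, mirroring rdfs:subClassOf triples.
    """
    # A type is dropped when some (x, type) pair exists, which is the
    # condition tested by FILTER NOT EXISTS { ?super rdfs:subClassOf ?type }.
    has_subclass = {sup for _sub, sup in subclass_of}
    return [t for t in entity_types if t not in has_subclass]


# Hypothetical toy hierarchy, used for illustration only.
SUBCLASS_OF = {
    ("ex:EarthquakeEvent", "ex:HazardEvent"),
    ("ex:HazardEvent", "ex:Event"),
}

types = ["ex:Event", "ex:HazardEvent", "ex:EarthquakeEvent"]
print(most_specific_types(types, SUBCLASS_OF))  # ['ex:EarthquakeEvent']
```

One caveat under this reading: if the repository also holds reflexive triples (a class stated or inferred to be a subclass of itself, which RDFS entailment permits), the same test would drop every type, so it is worth checking which inference ruleset is active.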
seilagonzalez commented 2 years ago

@fritosxii here is the SPARQL query for climate divisions:

PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?USClimateDivision ?label where { 
    ?USClimateDivision a kwg-ont:USClimateDivision.
    ?USClimateDivision rdfs:label ?label
} 
seilagonzalez commented 2 years ago

@fritosxii here is the query for administrative regions:

PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?adminRegions ?label where { 
    ?adminRegions rdfs:subClassOf kwg-ont:AdministrativeRegion.
    ?adminRegions rdfs:label ?label
} 
seilagonzalez commented 2 years ago

@fritosxii for now NWZone should probably be a text input, since there are more than 3000 results. You can check with this query:

PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?NWZone ?label where { 
    ?NWZone a kwg-ont:NWZone.
    ?NWZone rdfs:label ?label
} 
seilagonzalez commented 2 years ago

@fritosxii the same goes for ZIP Code.

seilagonzalez commented 2 years ago

@fritosxii Also include a text area for locatedIn.

seilagonzalez commented 2 years ago

@fritosxii here is an example query where we get locatedIn from USClimateDivision:

PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?entity ?label ?type {
  ?search a inst:kwg_index_v2_updated ;
    :query "locatedIn:Alabama" ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity a ?type .
  ?entity rdfs:label ?label .
  FILTER NOT EXISTS { ?super rdfs:subClassOf ?type . }
}
zilongliu-geo commented 2 years ago

Facets on climate division

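PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT DISTINCT ?facetName ?facetValue ?facetCount WHERE {
  # note: an empty query is allowed and simply matches all documents, hence no :query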
  ?r a inst:kwg_index_v2_updated ;
    :facetFields "climateDivisionName" ;
    :facets _:f .
  _:f :facetName ?facetName .
  _:f :facetValue ?facetValue .
  _:f :facetCount ?facetCount .
}

There is a limit on the number of results returned by facet queries (see the size parameter in https://www.elastic.co/guide/en/app-search/current/facets.html): the default is 10 and the maximum is 250. We should keep using SPARQL queries.

zilongliu-geo commented 2 years ago

We do not need to use S2 cells here. There is a kwg-ont:locatedIn relation linking a USClimateDivision and an AdministrativeRegion.

zilongliu-geo commented 2 years ago

@seilagonzalez the kwg prefix, SPARQL endpoint, and place facet queries are updated, and a getInstance function is added: https://github.com/KnowWhereGraph/kwg-faceted-search/commit/44ee4201ac3ecb3ca2f21632b68be06b5b45f73f

zilongliu-geo commented 2 years ago

Expertise and Hazard facet queries are updated: https://github.com/KnowWhereGraph/kwg-faceted-search/commit/b163a535c75f6dded466323caf4cb548fd52ed6e

zilongliu-geo commented 2 years ago

Does anyone know why these two queries (which are supposed to do the same job) return a different number of records?

PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>

SELECT DISTINCT ?entity ?place {
  ?search a inst:kwg_index_v2_updated ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity kwg-ont:locatedIn ?place .
}

PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>

SELECT DISTINCT ?entity ?place {
 ?entity kwg-ont:locatedIn ?place.
}
seilagonzalez commented 2 years ago

If I run this:

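PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
SELECT DISTINCT (COUNT(DISTINCT ?entity) AS ?count) {
  ?search a inst:kwg_index_v2_updated ;
    :orderBy "label" ;
    :entities ?entity .
  ?entity kwg-ont:locatedIn ?place .
  ?entity a ?type .
}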

I get 8276. If I run this one:

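PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
SELECT DISTINCT (COUNT(DISTINCT ?entity) AS ?count) {
  ?search a inst:kwg_index_v2_updated ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity kwg-ont:locatedIn ?place .
  ?entity a ?type .
}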

I get 1783 results. The effect of the - prefix on label needs to be investigated.

According to the documentation: "Each field can be prefixed with a minus to indicate sorting in descending order."

With pure SPARQL we get far more: a total of 12596657 results.
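
One way to narrow this down is to run the same count with no :orderBy, with "label", and with "-label", and compare the three numbers. The sketch below only builds the three query strings; the helper name is hypothetical, and whether the connector accepts a search with neither :query nor :orderBy is an assumption based on the "empty query is allowed" note earlier in this thread.

```python
from typing import Optional

PREFIXES = """\
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
"""


def build_count_query(order_by: Optional[str] = None) -> str:
    """Return the count query from this thread, optionally with :orderBy."""
    search = ["  ?search a inst:kwg_index_v2_updated ;"]
    if order_by is not None:
        # Only this line differs between the variants being compared.
        search.append(f'    :orderBy "{order_by}" ;')
    search.append("    :entities ?entity .")
    body = "\n".join(search)
    return (
        f"{PREFIXES}"
        "SELECT (COUNT(DISTINCT ?entity) AS ?count) {\n"
        f"{body}\n"
        "  ?entity kwg-ont:locatedIn ?place .\n"
        "  ?entity a ?type .\n"
        "}\n"
    )


# Print the three variants so they can be pasted into the SPARQL editor.
for order in (None, "label", "-label"):
    print(f"# orderBy = {order!r}")
    print(build_count_query(order))
```

Since the three queries differ only in the :orderBy line, any difference between their counts can be attributed to the sort field rather than to the rest of the pattern.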

seilagonzalez commented 2 years ago

I don't think the index is configured correctly.

seilagonzalez commented 2 years ago

Maybe for now we should follow your approach, Zilong, and only use it for full-text search.

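seilagonzalez commented 2 years ago

This query returns all matching entities, including those locatedIn Alabama. It works great as a full-text search.

PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label {
  ?search a inst:kwg_index_v2_updated ;
    :query "Alabama" ;
    :entities ?entity .
  ?entity rdfs:label ?label .
}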

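Restricting the query to the label field only gives you the entities with that particular label:

PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label {
  ?search a inst:kwg_index_v2_updated ;
    :query "label:Alabama" ;
    :entities ?entity .
  ?entity rdfs:label ?label .
}

This one gives me just one result.
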
seilagonzalez commented 2 years ago

We can close this issue; I think we now understand how we are going to do it.