KnowWhereGraph / kwg-faceted-search

Knowledge Explorer: The search interface to KnowWhereGraph

Query with elasticsearch and sparql #41

Closed seilagonzalez closed 2 years ago

seilagonzalez commented 2 years ago

Facets on climate division

SELECT DISTINCT ?facetName ?facetValue ?facetCount WHERE {
  # note empty query is allowed and will just match all documents, hence no elastic:query
  ?r a inst:kwg_index_v2_updated ;
    :facetFields "climateDivisionName" ;
    :facets _:f .
  _:f :facetName ?facetName .
  _:f :facetValue ?facetValue .
  _:f :facetCount ?facetCount .
}

seilagonzalez commented 2 years ago

Facets on CBSA code

SELECT DISTINCT ?facetName ?facetValue ?facetCount WHERE {
  # note empty query is allowed and will just match all documents, hence no elastic:query
  ?r a inst:kwg_index_v2_updated ;
    :facetFields "cbsaCode" ;
    :facets _:f .
  _:f :facetName ?facetName .
  _:f :facetValue ?facetValue .
  _:f :facetCount ?facetCount .
}

seilagonzalez commented 2 years ago

Example of mixing Elasticsearch with SPARQL: earthquakes matching the keyword "California" that happened in an AdministrativeRegion_3, sorted by label in descending order.

SELECT DISTINCT ?entity ?place {
  ?search a inst:kwg_index_v2_updated ;
    :query "California" ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity a kwg-ont:EarthquakeEvent .
  ?entity kwg-ont:locatedIn ?place .
  ?place a kwg-ont:AdministrativeRegion_3 .
}

SELECT DISTINCT ?entity ?name ?t ?type ?place ?pname {
  ?search a inst:kwg_index_v2_updated ;
    :query "California" ;
    :orderBy "-comment" ;
    :entities ?entity .
  ?entity rdfs:label ?name .
  ?entity a kwg-ont:EarthquakeEvent .
  ?entity kwg-ont:locatedIn ?place .
  ?place rdfs:label ?pname .
  ?entity a ?type .
  ?place a kwg-ont:AdministrativeRegion_3 .
}

seilagonzalez commented 2 years ago

The same query as a counter:

SELECT DISTINCT (COUNT(*) AS ?counter) {
  ?search a inst:kwg_index_v2_updated ;
    :query "California" ;
    :orderBy "-comment" ;
    :entities ?entity .
  ?entity rdfs:label ?name .
  ?entity a kwg-ont:EarthquakeEvent .
  ?entity kwg-ont:locatedIn ?place .
  ?place rdfs:label ?pname .
  ?entity a ?type .
  ?place a kwg-ont:AdministrativeRegion_3 .
}

seilagonzalez commented 2 years ago

totalHits for climateDivisionName "northwest", done with Elasticsearch only:

SELECT ?totalHits {
  ?search a inst:kwg_index_v2_updated ;
    :query "climateDivisionName:northwest" ;
    :totalHits ?totalHits .
}

seilagonzalez commented 2 years ago

As we suspected, the relation between climate divisions and administrative regions goes through S2 cells. The nested cells are stored in the index, and from those cells we can reach the entities that are a type of administrative region.

seilagonzalez commented 2 years ago

Keyword search with a filter that removes superclasses, sorted by label. A mix of Elasticsearch and SPARQL. @zilongliu-geo @amoeba @fritosxii

PREFIX inst: <>
PREFIX kwg-ont: <>
PREFIX rdfs: <>
SELECT DISTINCT ?entity ?label ?type {
  ?search a inst:kwg_index_v2_updated ;
    :query "california" ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity rdfs:label ?label .
  ?entity a ?type .
  # filtering out superclasses: drop any ?type that has a subclass
  FILTER NOT EXISTS {
    ?super rdfs:subClassOf ?type
  }
}

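To make the superclass filter easier to reason about, here is a minimal Python sketch of the same idea on in-memory data. It assumes the hierarchy is given as plain (child, parent) pairs with no inferred or reflexive subclass statements; the function name and the HazardEvent and Event class names are hypothetical, for illustration only.

```python
def most_specific_types(entity_types, subclass_of):
    """Keep only the types that never appear as the parent in a (child, parent) pair.

    This mirrors FILTER NOT EXISTS { ?super rdfs:subClassOf ?type }:
    a type is dropped as soon as anything in the hierarchy is a subclass of it.
    """
    parents = {parent for _child, parent in subclass_of}
    return [t for t in entity_types if t not in parents]


# Hypothetical hierarchy: EarthquakeEvent < HazardEvent < Event
subclass_of = [("EarthquakeEvent", "HazardEvent"), ("HazardEvent", "Event")]
entity_types = ["Event", "HazardEvent", "EarthquakeEvent"]

print(most_specific_types(entity_types, subclass_of))
```

Note that, like the query, this drops a type whenever any subclass of it exists in the data, not only when the entity itself carries that subclass.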
seilagonzalez commented 2 years ago

@fritosxii here the query on sparql for climateDivision

PREFIX kwg-ont: <>
PREFIX rdfs: <>
select ?USClimateDivision ?label where { 
    ?USClimateDivision a kwg-ont:USClimateDivision.
    ?USClimateDivision rdfs:label ?label
}

seilagonzalez commented 2 years ago

@fritosxii here the query for administrative regions

PREFIX kwg-ont: <>
PREFIX rdfs: <>
select ?adminRegions ?label where { 
    ?adminRegions rdfs:subClassOf kwg-ont:AdministrativeRegion.
    ?adminRegions rdfs:label ?label
}

seilagonzalez commented 2 years ago

@fritosxii for now NWZones should probably be a text input, since there are more than 3000 results. You can check with this query:

PREFIX kwg-ont: <>
PREFIX rdfs: <>
select ?NWZone ?label where {
    ?NWZone a kwg-ont:NWZone.
    ?NWZone rdfs:label ?label
}

seilagonzalez commented 2 years ago

@frito the same with ZIP Code.

seilagonzalez commented 2 years ago

@fritosxii Also include a text area for locatedIn.

seilagonzalez commented 2 years ago

@fritosxii here is an example of a query where we get locatedIn from USClimateDivision

PREFIX inst: <>
PREFIX kwg-ont: <>
PREFIX rdfs: <>
PREFIX owl: <>
SELECT DISTINCT ?entity ?label ?type {
  ?search a inst:kwg_index_v2_updated ;
    :query "locatedIn:Alabama" ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity a ?type .
  ?entity rdfs:label ?label .
  FILTER NOT EXISTS { ?super rdfs:subClassOf ?type . }
}

zilongliu-geo commented 2 years ago

Facets on climate division

SELECT DISTINCT ?facetName ?facetValue ?facetCount WHERE {
  # note empty query is allowed and will just match all documents, hence no elastic:query
  ?r a inst:kwg_index_v2_updated ;
    :facetFields "climateDivisionName" ;
    :facets _:f .
  _:f :facetName ?facetName .
  _:f :facetValue ?facetValue .
  _:f :facetCount ?facetCount .
}

There is a limit on the number of results returned by facet queries, controlled by the size parameter: the maximum is 250 and the default is 10. We should keep using SPARQL queries.
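A small Python sketch of why that limit matters for the facet lists. It assumes the facet buckets come back most-frequent-first and are cut off at size; the division names are made up, and top_facets is a hypothetical helper, not part of any connector API.

```python
from collections import Counter


def top_facets(values, size=10):
    """Return at most `size` (value, count) buckets, most frequent first."""
    return Counter(values).most_common(size)


# Made-up data: 12 distinct facet values, 10 documents each.
values = [f"division-{i % 12}" for i in range(120)]

facets = top_facets(values, size=10)
missing = len(set(values)) - len(facets)

print(len(facets), "buckets returned,", missing, "facet values silently dropped")
```

With more distinct values than size, the facet list is incomplete and the returned counts no longer add up to the number of documents, which is why a plain SPARQL query is the safer source for populating the filter lists.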

zilongliu-geo commented 2 years ago

We do not need to use S2 cells here. There is a kwg-ont:locatedIn relation linking a USClimateDivision and an AdministrativeRegion.

zilongliu-geo commented 2 years ago

@seilagonzalez the kwg prefix, the SPARQL endpoint, and the place facet queries are updated, and a getInstance function is added:

zilongliu-geo commented 2 years ago

Expertise and Hazard facet queries are updated:

zilongliu-geo commented 2 years ago

Does anyone know why these two queries (which are supposed to do the same job) return different numbers of records?

PREFIX inst: <>
PREFIX rdfs: <>
PREFIX kwg-ont: <>

SELECT DISTINCT ?entity ?place {
  ?search a inst:kwg_index_v2_updated ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity kwg-ont:locatedIn ?place .
}

PREFIX inst: <>
PREFIX rdfs: <>
PREFIX kwg-ont: <>

SELECT DISTINCT ?entity ?place {
  ?entity kwg-ont:locatedIn ?place .
}

seilagonzalez commented 2 years ago

If I run this:

SELECT DISTINCT (count(distinct ?entity) as ?count) {
  ?search a inst:kwg_index_v2_updated ;
    :orderBy "label" ;
    :entities ?entity .
  ?entity kwg-ont:locatedIn ?place .
  ?entity a ?type .
}

I get 8276. If I run this one:

SELECT DISTINCT (count(distinct ?entity) as ?count) {
  ?search a inst:kwg_index_v2_updated ;
    :orderBy "-label" ;
    :entities ?entity .
  ?entity kwg-ont:locatedIn ?place .
  ?entity a ?type .
}

I get 1783 results. The effect of the - symbol on label needs to be researched.

According to the documentation: "Each field can be prefixed with a minus to indicate sorting in descending order."

With pure SPARQL we get far more: a total of 12596657 results.
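For reference, here is a minimal Python sketch of what the quoted documentation describes, assuming orderBy is a comma-separated list of field names with an optional leading minus. The helper names and sample rows are hypothetical. The point is that the minus should only flip the direction, so ascending and descending runs should return the same number of entities.

```python
def parse_order_by(spec):
    """Split an orderBy string such as "-label,comment" into (field, descending) pairs."""
    parsed = []
    for field in spec.split(","):
        field = field.strip()
        if not field:
            continue
        descending = field.startswith("-")
        parsed.append((field[1:] if descending else field, descending))
    return parsed


def sort_rows(rows, spec):
    """Sort rows by the parsed fields; later fields are applied first so earlier ones win."""
    result = list(rows)
    for field, descending in reversed(parse_order_by(spec)):
        result.sort(key=lambda row: row[field], reverse=descending)
    return result


rows = [{"label": "b"}, {"label": "a"}, {"label": "c"}]
ascending = sort_rows(rows, "label")
descending = sort_rows(rows, "-label")

print([r["label"] for r in ascending], [r["label"] for r in descending])
```

Under this reading, the 8276 versus 1783 difference cannot come from the sort direction itself, which supports the suspicion that something else in the index setup is involved.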

seilagonzalez commented 2 years ago

I don't think the index is configured correctly.

seilagonzalez commented 2 years ago

Maybe for now we should follow your approach, Zilong, and only use it for full-text search.

seilagonzalez commented 2 years ago

This one returns all matching entities, including those locatedIn Alabama. It works great as a full-text search.

SELECT ?entity ?label {
  ?search a inst:kwg_index_v2_updated ;
    :query "Alabama" ;
    :entities ?entity .
  ?entity rdfs:label ?label .
}

Restricting the query to label gives you only the entities with that particular label:

SELECT ?entity ?label {
  ?search a inst:kwg_index_v2_updated ;
    :query "label:Alabama" ;
    :entities ?entity .
  ?entity rdfs:label ?label .
}

This one gives me just one result.

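The difference between the two query strings can be illustrated with a toy Python model. It is only a sketch: it uses case-insensitive substring matching rather than real Elasticsearch analysis, and the documents and the matches helper are invented for the example.

```python
def matches(doc, query):
    """Toy matcher: "field:term" checks one field, a bare term checks every field."""
    field, sep, term = query.partition(":")
    if sep:
        return term.lower() in str(doc.get(field, "")).lower()
    term = query.lower()
    return any(term in str(value).lower() for value in doc.values())


# Invented index documents using the label and locatedIn fields from the thread.
docs = [
    {"label": "Alabama", "locatedIn": "United States"},
    {"label": "Earthquake near Birmingham", "locatedIn": "Alabama"},
    {"label": "Wildfire", "locatedIn": "California"},
]

full_text = [d for d in docs if matches(d, "Alabama")]
label_only = [d for d in docs if matches(d, "label:Alabama")]

print(len(full_text), len(label_only))
```

The bare keyword picks up the entity located in Alabama as well as the one labelled Alabama, while the field-scoped query returns only the latter.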
seilagonzalez commented 2 years ago

We can close this issue. I think we now understand how we are going to do it.