WolfgangFahl / pyCEURmake

CEUR make python implementation
Apache License 2.0

Named Query "Proceedings" too slow and incompatible with QLever #45

Open WolfgangFahl opened 1 year ago

WolfgangFahl commented 1 year ago
    PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
    PREFIX p: <http://www.wikidata.org/prop/>
    PREFIX schema: <http://schema.org/>
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX wikibase: <http://wikiba.se/ontology#>
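    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ps: <http://www.wikidata.org/prop/statement/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>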
    SELECT 
      ?item 
      ?itemLabel 
      ?itemDescription 
      ?ceurwspart 
      ?sVolume 
      ?Volume 
      ?short_name 
      ?dblpProceedingsId 
      ?ppnId 
      ?event 
      ?eventLabel 
      ?dblpEventId 
      ?eventSeries 
      ?eventSeriesLabel 
      ?eventSeriesOrdinal 
      ?title 
      ?language_of_work_or_name 
      ?language_of_work_or_nameLabel 
      ?URN_NBN 
      ?publication_date 
      ?fullWorkUrl 
      ?described_at_URL 
      ?homePage 
    WHERE {
      ?item wdt:P31 wd:Q1143604;
        wdt:P179 wd:Q27230297;
        rdfs:label ?itemLabel.
      FILTER((LANG(?itemLabel)) = "en")
      OPTIONAL {
        ?item schema:description ?itemDescription.
        FILTER((LANG(?itemDescription)) = "en")
      }
      OPTIONAL { ?item wdt:P478 ?Volume. }
      OPTIONAL {
        ?item (p:P179/pq:P478) ?_sVolume.
        BIND(xsd:integer(?_sVolume) AS ?sVolume)
      }
      OPTIONAL { ?item wdt:P1813 ?short_name. }
      OPTIONAL { ?item wdt:P8978 ?dblpProceedingsId. }
      OPTIONAL { ?item wdt:P6721 ?ppnId. }
      OPTIONAL { ?item wdt:P4109 ?URN_NBN. }
      OPTIONAL { ?item wdt:P1476 ?title. }
      OPTIONAL { ?item wdt:P577 ?publication_date. }
      OPTIONAL { ?item wdt:P953 ?fullWorkUrl. }
      OPTIONAL { ?item wdt:P973 ?described_at_URL. }
      OPTIONAL { ?item wdt:P856 ?homePage. }
      OPTIONAL {
        ?item wdt:P407 ?language_of_work_or_name.
        ?language_of_work_or_name rdfs:label ?language_of_work_or_nameLabel.
        FILTER((LANG(?language_of_work_or_nameLabel)) = "en")
      }
      {
        SELECT 
          ?item 
          (GROUP_CONCAT(?_event; SEPARATOR = "|") AS ?event) 
          (GROUP_CONCAT(?_eventLabel; SEPARATOR = "|") AS ?eventLabel) 
          (GROUP_CONCAT(?_eventSeries; SEPARATOR = "|") AS ?eventSeries) 
          (GROUP_CONCAT(?_eventSeriesLabel; SEPARATOR = "|") AS ?eventSeriesLabel) 
          (GROUP_CONCAT(?_eventSeriesOrdinal; SEPARATOR = "|") AS ?eventSeriesOrdinal)
          (GROUP_CONCAT(?_dblpEventId; SEPARATOR = "|") AS ?dblpEventId) 
        WHERE {
          ?item wdt:P31 wd:Q1143604;
            wdt:P179 wd:Q27230297;
            wdt:P4745 ?_event.
          ?_event rdfs:label ?_eventLabel.
          FILTER((LANG(?_eventLabel)) = "en")
          OPTIONAL { ?_event wdt:P10692 ?_dblpEventId. }
          OPTIONAL {
            ?_event p:P179 ?_partOfTheEventSeriesStmt.
            ?_partOfTheEventSeriesStmt ps:P179 ?_eventSeries;
              pq:P1545 ?_eventSeriesOrdinal.
            ?_eventSeries rdfs:label ?_eventSeriesLabel.
            FILTER((LANG(?_eventSeriesLabel)) = "en")
          }
        }
        GROUP BY ?item
      }
    }
    ORDER BY ?sVolume
WolfgangFahl commented 1 year ago

More often than not, the above query times out on the Wikidata Query Service. It also doesn't work in the QLever environment. See #42.

WolfgangFahl commented 1 year ago

Event details may be queried separately:

    SELECT 
      ?item 
      (GROUP_CONCAT(?_event; SEPARATOR = "|") AS ?event) 
      (GROUP_CONCAT(?_eventLabel; SEPARATOR = "|") AS ?eventLabel) 
      (GROUP_CONCAT(?_eventSeries; SEPARATOR = "|") AS ?eventSeries) 
      (GROUP_CONCAT(?_eventSeriesLabel; SEPARATOR = "|") AS ?eventSeriesLabel) 
      (GROUP_CONCAT(?_eventSeriesOrdinal; SEPARATOR = "|") AS ?eventSeriesOrdinal)
      (GROUP_CONCAT(?_dblpEventId; SEPARATOR = "|") AS ?dblpEventId) 
    WHERE {
      VALUES ?item {
        wd:Q107266045
      }
      ?item wdt:P4745 ?_event.
      ?_event rdfs:label ?_eventLabel.
      FILTER((LANG(?_eventLabel)) = "en")
      OPTIONAL { ?_event wdt:P10692 ?_dblpEventId. }
      OPTIONAL {
        ?_event p:P179 ?_partOfTheEventSeriesStmt.
        ?_partOfTheEventSeriesStmt ps:P179 ?_eventSeries;
          pq:P1545 ?_eventSeriesOrdinal.
        ?_eventSeries rdfs:label ?_eventSeriesLabel.
        FILTER((LANG(?_eventSeriesLabel)) = "en")
      }
    }
    GROUP BY ?item
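
If the event details are fetched separately like this, they have to be joined back onto the proceedings rows on the client. A minimal Python sketch of that join, assuming both results are already available as lists of dicts keyed by `item`; the function name `merge_event_details` and all sample values other than the item id are hypothetical placeholders, not taken from pyCEURmake:

```python
def merge_event_details(proceedings_rows, event_rows):
    """Attach event columns to proceedings rows that share the same item."""
    events_by_item = {row["item"]: row for row in event_rows}
    merged = []
    for row in proceedings_rows:
        combined = dict(row)  # copy, so the input rows stay untouched
        event = events_by_item.get(row["item"], {})
        for key, value in event.items():
            if key != "item":
                combined[key] = value
        merged.append(combined)
    return merged


# Placeholder rows: only the item id wd:Q107266045 appears in the query above.
proceedings_rows = [
    {"item": "wd:Q107266045", "title": "placeholder title"},
    {"item": "item-without-event", "title": "another placeholder"},
]
event_rows = [
    {"item": "wd:Q107266045", "event": "event-1|event-2"},
]

merged = merge_event_details(proceedings_rows, event_rows)
```

Proceedings without a matching event row are kept unchanged, which mirrors what an OPTIONAL join would do on the server side.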
tholzheim commented 1 year ago

The part that is incompatible with QLever is the cast to integer that is used for sorting the result:

xsd:integer(?sVolume) is not supported by QLever
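
Until the cast is available, one possible workaround is to drop the `BIND` and sort on the client. A minimal Python sketch, assuming the result rows arrive as dicts with the volume as a string under `sVolume`; the helper name `sort_by_volume` and the sample rows are hypothetical:

```python
def sort_by_volume(rows, key="sVolume"):
    """Sort rows numerically by volume; rows without a usable number go last."""

    def volume_number(row):
        value = row.get(key)
        try:
            return (0, int(value))
        except (TypeError, ValueError):
            # Missing or non-numeric volume: keep the row, but sort it to the end.
            return (1, 0)

    return sorted(rows, key=volume_number)


# Placeholder rows; a plain string sort would put "10" before "9".
rows = [
    {"item": "A", "sVolume": "10"},
    {"item": "B", "sVolume": "9"},
    {"item": "C"},
]
ordered = [row["item"] for row in sort_by_volume(rows)]
```

This only replaces the `ORDER BY ?sVolume` step; it does not address the timeout.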

tholzheim commented 1 year ago

Also, adding DISTINCT to the sub-query improves the execution time.
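
Independently of DISTINCT on the server, the `|`-separated values produced by GROUP_CONCAT can contain repeats once they reach the client. A small Python sketch for splitting and de-duplicating such a value, assuming `|` never occurs inside a value; the helper name `split_concat` is hypothetical and this is a client-side complement, not what DISTINCT in the sub-query does:

```python
def split_concat(value, separator="|"):
    """Split a GROUP_CONCAT string and drop repeated parts, keeping first-seen order."""
    if not value:
        return []
    unique_parts = []
    for part in value.split(separator):
        if part and part not in unique_parts:
            unique_parts.append(part)
    return unique_parts


parts = split_concat("event-1|event-1|event-2")
```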

WolfgangFahl commented 1 year ago

Please create an issue with QLever for

xsd:integer(?sVolume) is not supported by QLever

upstream

WolfgangFahl commented 1 year ago

We need a two-phase query implementation now.

tholzheim commented 1 year ago

The query is already two-phased see

which uses the queries

tholzheim commented 1 year ago

Please create an issue with QLever for

xsd:integer(?sVolume) is not supported by QLever

upstream

See https://github.com/ad-freiburg/qlever/issues/853

VladimirAlexiev commented 1 year ago

@WolfgangFahl I think the main problem with the query is that using OPTIONAL with multi-valued fields causes a Cartesian product (explosion). If the variables have N1, N2 and N3 values, then the result set contains N1 × N2 × N3 rows.

Use UNION instead of OPTIONAL.
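
The row counts can be illustrated without a SPARQL endpoint. A minimal Python sketch, assuming one item with three independent multi-valued properties; the values are made-up placeholders, and the lists only model how many rows each query shape returns:

```python
from itertools import product

# Hypothetical values of three multi-valued properties for a single item.
full_work_urls = ["url-1", "url-2"]          # N1 = 2
home_pages = ["home-1", "home-2", "home-3"]  # N2 = 3
languages = ["lang-1", "lang-2"]             # N3 = 2

# Chained OPTIONALs: every combination of values becomes its own row.
optional_rows = list(product(full_work_urls, home_pages, languages))

# UNION: each value appears in exactly one row, tagged with its property.
union_rows = (
    [("fullWorkUrl", value) for value in full_work_urls]
    + [("homePage", value) for value in home_pages]
    + [("language", value) for value in languages]
)

print(len(optional_rows))  # 12 = 2 * 3 * 2
print(len(union_rows))     # 7 = 2 + 3 + 2
```

With UNION the row count grows with the sum of the value counts rather than their product, at the cost of having to regroup the rows per item afterwards.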