Open WolfgangFahl opened 5 months ago
Potential test platform https://scholia.portal.mardi4nfdi.de/
Further example Queries:
see also #2063 and https://github.com/ad-freiburg/qlever/issues/859
I am trying to understand this. Do you propose the use of FROM against a special endpoint that distributes queries? Where does federation come in?
@fnielsen the intention is information hiding: the service does not reveal what the actual query looks like. Take author_events as an example: that query has the name "author_events" and a single QID parameter.
The specific query for your personal QID Q20980928 on QLever would be e.g. https://qlever.cs.uni-freiburg.de/wikidata/084HGc and does not run out of the box. The query on Wikidata does give 71 results, but the URL shortening fails, so I can't give a short link here. I purposely don't intend to show the details of the query: you would only be interested in the result.
Our pyLodStorage library already allows commands such as:
sparqlquery -qp wikidata.yaml -qn author_events_fan -f github
which will pick up the query specification from a YAML file containing the named query spec author_events_fan (see result below). The proposal here is to offer the same behavior as a SPARQL-endpoint-compatible web service that hides all technical details. That way, if a query needs to be rewritten as a federated query, we may do so "behind the scenes" in the black box we are providing. We might even check whether the result is the same as without the federation.
date | event | eventLabel | eventUrl | roles | locations |
---|---|---|---|---|---|
2023-09-13 | http://www.wikidata.org/entity/Q117314306 | First Wikibase Lexical Data Workshop | /event/Q117314306 | speaker | Centre for Translation Studies |
2023-05-28 | http://www.wikidata.org/entity/Q115781177 | ESWC 2023 | /event/Q115781177 | participant | Aldemar Knossos Royal |
2023-05-28 | http://www.wikidata.org/entity/Q115972632 | Semantic Technologies for Scientific, Technical and Legal Data | /event/Q115972632 | speaker, author | Aldemar Knossos Royal |
2023-05-28 | http://www.wikidata.org/entity/Q121334813 | ESWC 2023 Workshops and Tutorials | /event/Q121334813 | author | Chersonesos |
2023-05-22 | http://www.wikidata.org/entity/Q115497966 | The 24th Nordic Conference on Computational Linguistics | /event/Q115497966 | author | Tórshavn |
2023-05-11 | http://www.wikidata.org/entity/Q114794722 | Wiki Workshop 2023 | /event/Q114794722 | author | |
2022-11-30 | http://www.wikidata.org/entity/Q113956029 | Sprogteknologisk Konference 2022 | /event/Q113956029 | participant | Søndre Campus |
2022-11-07 | http://www.wikidata.org/entity/Q113954954 | Danish Data Science 2022 | /event/Q113954954 | participant | Hotel LEGOLAND |
2021-11-16 | http://www.wikidata.org/entity/Q108377974 | Sprogteknologisk Konference 2021 | /event/Q108377974 | participant | Søndre Campus |
2021-10-25 | http://www.wikidata.org/entity/Q106591764 | Deep Learning for Knowledge Graphs 2021 | /event/Q106591764 | program committee member | |
2021-10-24 | http://www.wikidata.org/entity/Q106429029 | The 2nd Wikidata Workshop | /event/Q106429029 | program committee member | |
2021-05-31 | http://www.wikidata.org/entity/Q102274071 | The 23rd Nordic Conference on Computational Linguistics | /event/Q102274071 | author | Reykjavík University |
2021-04-14 | http://www.wikidata.org/entity/Q104835330 | Wiki Workshop 2021 | /event/Q104835330 | participant | |
2020-11-02 | http://www.wikidata.org/entity/Q86530254 | The 1st Wikidata Workshop | /event/Q86530254 | program committee member | |
2020-10-26 | http://www.wikidata.org/entity/Q100741900 | WikiCite 2020 Virtual conference | /event/Q100741900 | speaker, participant | online |
2020-10-19 | http://www.wikidata.org/entity/Q98083516 | Combining Symbolic and Sub-symbolic methods and their Applications | /event/Q98083516 | program committee member | Galway |
2020-09-01 | http://www.wikidata.org/entity/Q102070516 | Digitally support Environment Assessment for Sustainable Development Goals | /event/Q102070516 | participant | |
2020-06-22 | http://www.wikidata.org/entity/Q79137947 | 7th Workshop on Linked Data in Linguistics | /event/Q79137947 | author | |
2020-06-01 | http://www.wikidata.org/entity/Q84430072 | 3rd Workshop on Quality of Open Data | /event/Q84430072 | program committee member | University of Colorado, at Colorado Springs |
2020-05-31 | http://www.wikidata.org/entity/Q83793571 | Deep Learning for Knowledge Graphs 2020 | /event/Q83793571 | program committee member | Chersonesos |
2020-05-26 | http://www.wikidata.org/entity/Q94759294 | WikiLunch | /event/Q94759294 | participant | German National Library of Science and Technology, World Wide Web, Wikiversity |
2020-05-26 | http://www.wikidata.org/entity/Q94495218 | #vBIB20 | /event/Q94495218 | speaker | German National Library of Science and Technology, World Wide Web |
2019-10-25 | http://www.wikidata.org/entity/Q42449814 | WikidataCon 2019 | /event/Q42449814 | speaker | Urania |
2019-10-09 | http://www.wikidata.org/entity/Q63686495 | Conference on Natural Language Processing 2019 | /event/Q63686495 | author | Kollegienhaus |
2019-09-09 | http://www.wikidata.org/entity/Q59917009 | SEMANTiCS 2019 | /event/Q59917009 | participant, author | Karlsruhe |
2019-08-01 | http://www.wikidata.org/entity/Q48010913 | Wikimania 2019 | /event/Q48010913 | speaker | Stockholm University |
2019-07-23 | http://www.wikidata.org/entity/Q61983755 | The 10th Global WordNet Conference | /event/Q61983755 | participant, author | Wrocław University of Science and Technology |
2019-06-26 | http://www.wikidata.org/entity/Q61141551 | 2nd Workshop on Quality of Open Data | /event/Q61141551 | program committee member | Seville |
2019-06-17 | http://www.wikidata.org/entity/Q59979937 | 5th International Conference on Computational Social Science | /event/Q59979937 | program committee member | University of Amsterdam |
2019-06-02 | http://www.wikidata.org/entity/Q60808888 | Workshop at ESWC 2019 on Deep Learning for Knowledge Graphs | /event/Q60808888 | program committee member | Grand Hotel Bernardin |
2019-06-02 | http://www.wikidata.org/entity/Q59620529 | ESWC 2019 | /event/Q59620529 | participant, author | Grand Hotel Bernardin |
2019-05-17 | http://www.wikidata.org/entity/Q44062313 | Wikimedia Hackathon 2019 | /event/Q44062313 | participant | National Library of Technology building |
2019-04-16 | http://www.wikidata.org/entity/Q63171054 | Women in Data Science Conference 2019 Copenhagen | /event/Q63171054 | participant | IT University of Copenhagen |
2019-03-29 | http://www.wikidata.org/entity/Q59848782 | Wikimedia Summit 2019 | /event/Q59848782 | participant | Mercure Hotel Berlin Tempelhof Airport |
2018-11-27 | http://www.wikidata.org/entity/Q55117737 | WikiCite 2018 | /event/Q55117737 | speaker, participant | David Brower Center |
2018-11-06 | http://www.wikidata.org/entity/Q55910942 | Second Linked Open Citation Database Workshop | /event/Q55910942 | speaker | Mannheim Palace |
2018-10-03 | http://www.wikidata.org/entity/Q56876300 | Research Output & Impact Analyzed and Visualized: Concluding Conference | /event/Q56876300 | speaker | DGI-byen |
2018-09-25 | http://www.wikidata.org/entity/Q48563023 | 10th International Conference on Social Informatics | /event/Q48563023 | program committee member | St. Petersburg |
2018-09-03 | http://www.wikidata.org/entity/Q51955163 | Workshop on Open Citations | /event/Q51955163 | speaker | University of Bologna |
2018-07-20 | http://www.wikidata.org/entity/Q48548111 | 1st Workshop on Quality of Open Data | /event/Q48548111 | program committee member | Berlin |
2018-07-12 | http://www.wikidata.org/entity/Q47482917 | 4th Annual International Conference on Computational Social Science | /event/Q47482917 | program committee member | Kellogg School of Management |
2018-06-04 | http://www.wikidata.org/entity/Q48621961 | 1st International Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies | /event/Q48621961 | participant, author | Aldemar Knossos Royal |
2018-06-03 | http://www.wikidata.org/entity/Q54496448 | 3rd International Workshop on Geospatial Linked Data | /event/Q54496448 | participant, author | Aldemar Knossos Royal |
2018-06-03 | http://www.wikidata.org/entity/Q50290385 | ESWC 2018 | /event/Q50290385 | participant | Aldemar Knossos Royal |
2018-05-27 | http://www.wikidata.org/entity/Q47501229 | 11th International Conference on Chemical Structures | /event/Q47501229 | author | Noordwijkerhout |
2018-05-01 | http://www.wikidata.org/entity/Q30087264 | Wikimedia Hackathon 2018 | /event/Q30087264 | participant | Bellaterra Campus |
2018-04-24 | http://www.wikidata.org/entity/Q47035167 | Wiki Workshop 2018 | /event/Q47035167 | participant, author | Palais des congrès de Lyon |
2018-04-23 | http://www.wikidata.org/entity/Q48910401 | The Web Conference 2018 | /event/Q48910401 | participant, author | Palais des congrès de Lyon |
2018-04-20 | http://www.wikidata.org/entity/Q50132215 | Wikimedia Conference 2018 | /event/Q50132215 | participant | Mercure Hotel Berlin Tempelhof Airport |
2018-01-09 | http://www.wikidata.org/entity/Q64864052 | Teaching platform for developing and automatically tracking early stage literacy skill | /event/Q64864052 | participant | |
2017-11-17 | http://www.wikidata.org/entity/Q43254255 | 8th Language & Technology Conference | /event/Q43254255 | speaker, participant, author | Poznań |
2017-10-28 | http://www.wikidata.org/entity/Q37807682 | WikidataCon 2017 | /event/Q37807682 | speaker, participant | Tagesspiegel building |
2017-09-13 | http://www.wikidata.org/entity/Q48612170 | 9th International Conference on Social Informatics | /event/Q48612170 | program committee member | Wolfson College |
2017-09-07 | http://www.wikidata.org/entity/Q28052808 | 2017 Conference on Empirical Methods in Natural Language Processing | /event/Q28052808 | participant | Øksnehallen, DGI-byen, Copenhagen |
2017-05-28 | http://www.wikidata.org/entity/Q30090453 | ESWC 2017 | /event/Q30090453 | participant, author | Portorož |
2017-05-28 | http://www.wikidata.org/entity/Q113625218 | 1st International Workshop on Scientometrics | /event/Q113625218 | author | Portorož |
2017-05-28 | http://www.wikidata.org/entity/Q113744888 | 1st International Workshop on Enabling Decentralised Scholarly Communication | /event/Q113744888 | author | Portorož |
2017-05-19 | http://www.wikidata.org/entity/Q28053831 | Wikimedia Hackathon 2017 | /event/Q28053831 | participant | JUFA Wien City |
2017-03-31 | http://www.wikidata.org/entity/Q29169189 | Wikimedia Conference 2017 | /event/Q29169189 | participant | |
2017-01-01 | http://www.wikidata.org/entity/Q54856362 | WikiCite 2017 | /event/Q54856362 | participant | Vienna |
2016-06-16 | http://www.wikidata.org/entity/Q24632656 | The People's Meeting 2016 | /event/Q24632656 | participant | Allinge |
2016-05-17 | http://www.wikidata.org/entity/Q75540679 | Wiki Workshop 2016, ICWSM 2016 | /event/Q75540679 | author | Cologne |
2014-01-01 | http://www.wikidata.org/entity/Q14506843 | Wikimania 2014 | /event/Q14506843 | participant | Barbican Centre |
2012-05-28 | http://www.wikidata.org/entity/Q113505637 | 2nd Workshop on Semantic Publishing | /event/Q113505637 | author | Chersonesos |
2012-05-27 | http://www.wikidata.org/entity/Q42431329 | ESWC 2012 | /event/Q42431329 | author | Aldemar Knossos Royal |
2011-05-30 | http://www.wikidata.org/entity/Q113659299 | ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages | /event/Q113659299 | author | Heraklion |
2010-01-01 | http://www.wikidata.org/entity/Q14507062 | Wikimania 2010 | /event/Q14507062 | participant | Gdańsk |
2008-01-01 | http://www.wikidata.org/entity/Q11756041 | Wikimania 2008 | /event/Q11756041 | participant | Alexandria |
2004-12-13 | http://www.wikidata.org/entity/Q73025763 | Neural Information Processing Systems 2004 | /event/Q73025763 | author | Whistler, Vancouver |
2000-06-05 | http://www.wikidata.org/entity/Q75936725 | ICASSP 2000 | /event/Q75936725 | author | Istanbul |
| | http://www.wikidata.org/entity/Q114647284 | Wikidata WikiProject COVID-19 | /event/Q114647284 | participant | |
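
The named-query idea with a single QID parameter can be sketched roughly as follows. This is only an illustration and not the pyLodStorage implementation: the `NamedQuery` class, the `{{ q }}` placeholder handling and the SPARQL text are assumptions made for this example (the real author_events query is intentionally not shown in this thread).

```python
import re
from dataclasses import dataclass

# A QID is the letter Q followed by a positive integer, e.g. Q20980928.
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


@dataclass
class NamedQuery:
    """Hypothetical named query: a name plus a SPARQL template."""

    name: str
    sparql: str  # template text containing {{ q }} placeholders

    def render(self, q: str) -> str:
        """Return the SPARQL text with the QID filled in."""
        if not QID_PATTERN.match(q):
            raise ValueError(f"not a valid QID: {q!r}")
        # Replace every {{ q }} placeholder with the validated QID.
        return re.sub(r"\{\{\s*q\s*\}\}", q, self.sparql)


# Placeholder query text, NOT the real author_events query.
AUTHOR_EVENTS = NamedQuery(
    name="author_events",
    sparql="SELECT ?event WHERE { ?event ?role wd:{{ q }} . }",
)

if __name__ == "__main__":
    print(AUTHOR_EVENTS.render("Q20980928"))
```

A caller only needs the query name and the QID; validating the parameter before substitution also keeps arbitrary text out of the generated SPARQL.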
We have been hard at work on our Graph Split experiment [1], and we now have a working graph split that is loaded onto 3 test servers. We are running tests on a selection of queries from our logs to help understand the impact of the split. We need your help to validate the impact of various use cases and workflows around Wikidata Query Service.
What is the WDQS Graph Split experiment?
We want to address the growing size of the Wikidata graph by splitting it into 2 subgraphs of roughly half the size of the full graph, which should support the growth of Wikidata for the next 5 years. This experiment is about splitting the full Wikidata graph into a scholarly articles subgraph and a “main” graph that contains everything else.
See our previous update for more details [2].
Who should care?
Anyone who uses WDQS through the UI or programmatically should check the impact on their use cases, scripts, bots, code, etc.
What are those test endpoints?
We expose 3 test endpoints, for the full, main and scholarly articles graphs. Those graphs are all created from the same dump and are not live updated. This allows us to compare queries between the different endpoints, with stable / non changing data (the data are from the middle of October 2023).
The endpoints are:
Each of the endpoints is backed by a single dedicated server of performance similar to the production WDQS servers. We don’t expect performance to be representative of production due to the different load and to the lack of updates on the test servers.
What kind of feedback is useful?
We expect queries that don’t require scholarly articles to work transparently on the “main” subgraph. We expect queries that require scholarly articles to need to be rewritten with SPARQL federation between the “main” and scholarly subgraphs (federation is supported for some external SPARQL servers already [3], this just happens to be for internal server-to-server communication). We are doing tests and analysis based on a sample of query logs.
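
As a rough illustration of the kind of rewrite described above, the sketch below builds a query in which the part that needs scholarly articles is wrapped in a SPARQL 1.1 `SERVICE` clause. The endpoint URL is a placeholder, the `federate` helper is hypothetical, and the assumption that authorship triples live only in the scholarly subgraph is made up for the example; the prefixes `wd:`, `wdt:` and `rdfs:` are assumed to be predefined by the endpoint.

```python
def federate(variables, local_pattern, remote_pattern, remote_endpoint):
    """Build a SELECT query whose remote part runs inside a SERVICE clause."""
    select = " ".join(f"?{name}" for name in variables)
    return "\n".join([
        f"SELECT {select} WHERE {{",
        f"  SERVICE <{remote_endpoint}> {{",
        f"    {remote_pattern}",
        "  }",
        f"  {local_pattern}",
        "}",
    ])


# Placeholder URL, not one of the real test endpoints.
SCHOLARLY_ENDPOINT = "https://scholarly.example.org/sparql"

query = federate(
    variables=["work", "venue", "venueLabel"],
    # Assumed to be answerable only by the scholarly articles subgraph.
    remote_pattern="?work wdt:P50 wd:Q80 ; wdt:P1433 ?venue .",
    # Assumed to be answerable by the main subgraph.
    local_pattern='?venue rdfs:label ?venueLabel . FILTER(LANG(?venueLabel) = "en")',
    remote_endpoint=SCHOLARLY_ENDPOINT,
)
print(query)
```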
We want to hear about:
- General use cases or classes of queries which break under federation
- Bots or applications that need significant rewrite of queries to work with federation
- And also about use cases that work just fine!
Examples of queries and pointers to code will be helpful in your feedback.
Where should feedback be sent?
You can reach out to us using the project’s talk page [1], the Phabricator ticket for community feedback [4] or by pinging directly Sannita (WMF) [5].
Will feedback be taken into account?
Yes! We will review feedback and it will influence our path forward. That being said, there are limits to what is possible. The size of the Wikidata graph is a threat to the stability of WDQS and thus a threat to the whole Wikidata project. Scholarly articles is the only split we know of that would reduce the graph size sufficiently. We can work together on providing support for a migration, on reviewing the rules used for the graph split, but we can’t just ignore the problem and continue with a WDQS that provides transparent access to the full Wikidata graph.
Have fun!
Guillaume
Guillaume Lederrey (he/him) Engineering Manager Wikimedia Foundation
There is now a Wikimedia Hackathon 2024 project task for this https://phabricator.wikimedia.org/T363894
Check out http://snapquery.bitplan.com/query/scholia/author_list-of-publications with Q80 - Tim Berners-Lee to get
http://snapquery.bitplan.com has the demo, and the project is at https://github.com/WolfgangFahl/snapquery with further links to the Hackathon results - thanks to Tim and Dennis for making this happen!
I get TimeoutError: No connection after 3.0 seconds
@fnielsen there is another server at https://snapquery.wikidata.dbis.rwth-aachen.de/query/scholia/author_list-of-publications which might work. A socket connection is created which might not work behind firewalls or on internet connections with high latency.
Version 0.0.8 of snapquery is ready. It offers e.g. http://snapquery.bitplan.com/api/meta_query/params_stats.github
SELECT count(*),
params
FROM "QueryDetails"
GROUP BY params
ORDER BY 1 desc
count(*) | params |
---|---|
374 | |
293 | q |
14 | q1,q2 |
9 | q,q |
3 | q,q,q |
3 | p |
1 | q,q2 |
1 | q,q,q,q,q |
1 | q,doi,q,doi,q,doi,q,doi,q,doi |
1 | lexeme |
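
For anyone who wants to try this kind of meta query locally, here is a minimal sketch using Python's built-in `sqlite3` module. The table name "QueryDetails" and the `params` column come from the query above; the use of SQLite, the `query_id` column and the sample rows are assumptions made for the example and do not reflect the actual snapquery data.

```python
import sqlite3


def params_stats(conn):
    """Count how many stored queries share each parameter signature."""
    return conn.execute(
        'SELECT count(*), params FROM "QueryDetails" '
        "GROUP BY params ORDER BY 1 DESC"
    ).fetchall()


# In-memory database with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "QueryDetails" (query_id TEXT, params TEXT)')
conn.executemany(
    'INSERT INTO "QueryDetails" (query_id, params) VALUES (?, ?)',
    [
        ("a", "q"), ("b", "q"), ("c", "q"),  # three queries taking one QID
        ("d", ""), ("e", ""),                # two queries without parameters
        ("f", "q1,q2"),                      # one query taking two QIDs
    ],
)

stats = params_stats(conn)
for count, params in stats:
    print(f"{count} | {params} |")
```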
Is your feature request related to a problem? Please describe.
Blazegraph is getting close to the 4 TB limit. The Wikimedia Foundation is testing a graph split in Q1/2024. This will eventually, and likely, force the use of:
Also, the official WDQS already has a limiting timeout of 1 minute.
Describe the solution you'd like
Describe alternatives you've considered
Get your own copy of Wikidata and use it; see the CEUR-WS Vol-3262 paper "Getting and hosting your own copy of Wikidata".
Additional context
Search Platform Office Hours 2023-12-06
Named Query handling:
Queries may be referenced these days with e.g. short URLs, which are supported by both the Wikidata Query Service and QLever. Personally, I think it would be good to go one step further and have "named queries". See e.g. https://cr.bitplan.com/index.php/List_of_Queries as an example for queries. Scholia also uses a similar idea internally; see https://github.com/WDscholia/scholia/tree/master/scholia/app/templates. Quite a few of these queries have no or only a few parameters. E.g. https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/author_topics.sparql only takes a single q identifier as input.
In my own pyLodStorage project https://pypi.org/project/pyLodStorage/ I am already offering named queries, but without parameters. https://github.com/WolfgangFahl/pyLoDStorage/issues/113 is the issue to parameterize the queries. The queries are described in YAML files in this solution. I imagine a RESTful service that takes a query name and a set of parameters and returns the result in a SPARQL-server-compatible way. This would mean that the details of the query (e.g. whether it is federated or on which endpoint it runs) are hidden. I believe that this approach would work well with the intended Wikidata split attempt in Q1/2024.
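
To make "SPARQL-server-compatible" a bit more concrete: such a service could return rows in the SPARQL 1.1 Query Results JSON layout (`head.vars` plus `results.bindings`). The sketch below is only an assumption about how that could look; the `to_sparql_json` helper is hypothetical, and deciding between `uri` and `literal` by URL prefix is a simplification (a real endpoint would also report datatypes and language tags).

```python
import json


def to_sparql_json(variables, rows):
    """Convert plain dict rows into the SPARQL 1.1 JSON results layout."""
    bindings = []
    for row in rows:
        binding = {}
        for var in variables:
            value = row.get(var)
            if not value:
                continue  # unbound variables are simply left out
            # Simplification: anything that looks like a URL is a URI.
            kind = "uri" if value.startswith(("http://", "https://")) else "literal"
            binding[var] = {"type": kind, "value": value}
        bindings.append(binding)
    return {"head": {"vars": list(variables)}, "results": {"bindings": bindings}}


# One row taken from the author_events result table in this thread.
variables = ["date", "event", "eventLabel", "locations"]
rows = [
    {
        "date": "2019-10-25",
        "event": "http://www.wikidata.org/entity/Q42449814",
        "eventLabel": "WikidataCon 2019",
        "locations": "",
    }
]

result = to_sparql_json(variables, rows)
print(json.dumps(result, indent=2))
```

Because the response shape matches what SPARQL clients already parse, a caller would not need to know whether the named query ran federated or on a single endpoint.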
Links:
Previous analysis of blazegraph alternatives:
Qlever federation
Scaling Wikidata Query Service - Split the Graph experiment