ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

Document the workflow for customizing a local QLever instance based on Wikidata dump #1569

Open Daniel-Mietchen opened 2 weeks ago

Daniel-Mietchen commented 2 weeks ago

At a hackathon this weekend, @WolfgangFahl set up a local QLever instance based on the current Wikidata dump. It works well for our main purpose (testing Wikidata-related queries), but we have not yet figured out how to customize our instance.

Landing page Freiburg: Screenshot from 2024-10-20 17-08-57

Landing page Aachen: Screenshot from 2024-10-20 17-09-20

No major differences except for the Format/Reset button. We would also like to customize the footer.

Index Information Freiburg: Screenshot from 2024-10-20 17-10-42

Index Information Aachen: Screenshot from 2024-10-20 17-11-33

Here, we are clearly lacking information but haven't figured out how to get the relevant information to display there.

Backend Information Freiburg: Screenshot from 2024-10-20 17-12-50

Backend Information Aachen: Screenshot from 2024-10-20 17-13-13

Again, not much of a difference there, except for the ask command.

Thanks for any pointers.

hannahbast commented 2 weeks ago

@Daniel-Mietchen and @WolfgangFahl: Thanks for the feedback. Some comments/questions:

  1. Did you use the qlever CLI (aka the qlever script) with the pre-configured Qleverfile for Wikidata? If yes, everything should have just worked out of the box. Please let us know if you encountered any problems. If you did, I would like to find out whether you did more work than necessary (people sometimes do that) or whether there are bugs on our end.

  2. The functionality for the "Format" button is in https://github.com/ad-freiburg/qlever-ui/pull/103, which is not merged yet because of some nitpicks. If it is important for you, you can just merge it yourself for your instance. We will also merge it soon.

  3. Exactly what is it that you want to customize? Are you aware that there is a configuration for each backend, which you can customize by clicking on "Backend information" and then "Edit this backend"? The layout of the page is currently not customizable, but it is very easy to modify the template files, which are in https://github.com/ad-freiburg/qlever-ui/tree/master/backend/templates

  4. You can set the description for the index and the text index in the Qleverfile. The respective variable names are DESCRIPTION and TEXT_DESCRIPTION.

  5. ASK queries are not yet implemented, but will be very soon. There is already a PR that works: https://github.com/ad-freiburg/qlever/pull/1562
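As an illustration of point 4, here is a minimal Python sketch that sets DESCRIPTION and TEXT_DESCRIPTION in the text of a Qleverfile while leaving all other lines and comments untouched. It assumes the Qleverfile is INI-style with `[section]` headers and that both variables live in a section called `data`; the helper name `set_ini_value` and the sample content are hypothetical, so check your own Qleverfile before relying on it.

```python
import re


def set_ini_value(text: str, section: str, key: str, value: str) -> str:
    """Set `key = value` inside `[section]`, keeping all other lines as they are."""
    out = []
    in_section = False  # True while we are inside the target section
    done = False        # True once the key has been replaced or appended
    for line in text.splitlines():
        header = re.match(r"^\s*\[(.+?)\]\s*$", line)
        if header:
            # Leaving the target section without having seen the key: append it.
            if in_section and not done:
                out.append(f"{key} = {value}")
                done = True
            in_section = header.group(1) == section
        elif in_section and not done and re.match(rf"^\s*{re.escape(key)}\s*=", line):
            line = f"{key} = {value}"
            done = True
        out.append(line)
    if not done:
        if not in_section:
            # The target section does not exist yet, so create it at the end.
            out.append(f"[{section}]")
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Hypothetical sample content; a real Qleverfile has more sections and keys.
sample = "[data]\nNAME = wikidata\nDESCRIPTION = old text\n\n[index]\nINPUT_FILES = x\n"
updated = set_ini_value(sample, "data", "DESCRIPTION", "Wikidata dump, local instance")
updated = set_ini_value(updated, "data", "TEXT_DESCRIPTION", "Literals, local instance")
print(updated)
```

The sketch edits the text line by line instead of round-tripping through `configparser`, because writing a file back with `configparser` drops comments and, by default, lowercases keys.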

WolfgangFahl commented 1 week ago

@hannahbast Thanks for the response. We indeed created a qlv script as a wrapper to run the qlever script in the background: since it uses a tty, we cannot use nohup for it and have to run it in a screen session instead, which makes error handling harder. Please note that the machine will have a running UI Docker process while it runs, since we intend to rotate between freshly indexed versions as often as possible, which is currently weekly. So the next test will probably start tomorrow on our alpha disk, while the Oct 16th dump was on delta.

> Did you use the qlever CLI aka qlever script with the pre-configured QLeverfile for Wikidata? if yes, everything should have just worked out of the box.

See https://github.com/ad-freiburg/qlever-control/issues/80 for why it does not. Even after a successful indexing run, we do not automatically get a new running instance in the style of the current qlv script: https://wiki.bitplan.com/index.php/Wikidata_Import_2024-10-17#Using_qlv_script

The qlv script allows us to rotate disks and also works around the tty problem: we cannot run the qlever control script in the background and have to use a screen session instead, which is much harder to control and debug.
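One possible alternative to a screen session is to give the command its own pseudo-terminal and copy everything it prints into a log file. The Python sketch below shows that general technique for POSIX systems; the helper name `run_with_pty` is hypothetical, and whether a pseudo-terminal is enough to satisfy the qlever script is an assumption that has not been tested here.

```python
import os
import pty
import subprocess


def run_with_pty(cmd: list[str], log_path: str) -> int:
    """Run `cmd` attached to a pseudo-terminal, append its output to `log_path`,
    and return the exit code of the command."""
    master_fd, slave_fd = pty.openpty()
    # The child sees a terminal on stdin, stdout and stderr.
    proc = subprocess.Popen(cmd, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the parent only needs the master side
    with open(log_path, "ab") as log:
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                # On Linux, reading raises EIO once the child has closed the pty.
                break
            if not chunk:
                # On other systems, end of output is signalled by an empty read.
                break
            log.write(chunk)
            log.flush()
    os.close(master_fd)
    return proc.wait()
```

A wrapper script that calls, for example, `run_with_pty(["qlever", "index"], "index.log")` could then itself be started with nohup, and the exit code and log file would be available for error handling.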

> You can set the description for the index and the text index in the Qleverfile. The respective variable names are DESCRIPTION and TEXT_DESCRIPTION.

IMHO, having this information in the log output would already help. The source timestamp, the download size, and the number of triples are worth keeping around anyway. For example, I still have a hard time comparing triple counts in https://wiki.bitplan.com/index.php/List_of_Imports
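To make that kind of comparison easier, the metadata of each import could be kept as one JSON line per run. The Python sketch below is only an illustration: the `ImportRecord` fields, the helper names, and the numbers are hypothetical placeholders, not values from any real import and not something the qlever script produces.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ImportRecord:
    dump_date: str       # timestamp of the source dump, e.g. an ISO date
    download_bytes: int  # size of the downloaded dump
    triple_count: int    # number of triples reported after indexing


def to_log_line(record: ImportRecord) -> str:
    """Serialize one import as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def from_log_line(line: str) -> ImportRecord:
    """Parse a line written by `to_log_line`."""
    return ImportRecord(**json.loads(line))


def triple_delta(old: ImportRecord, new: ImportRecord) -> int:
    """Difference in triple counts between two imports."""
    return new.triple_count - old.triple_count


# Placeholder values for illustration only.
first = ImportRecord(dump_date="2000-01-01", download_bytes=1000, triple_count=500)
second = ImportRecord(dump_date="2000-01-08", download_bytes=1100, triple_count=520)
print(to_log_line(second))
print(triple_delta(first, second))
```

Appending each line to a file next to the index would give a simple history that can be read back with `from_log_line` and compared across imports.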