ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0
376 stars 45 forks source link

Fix missing `geo:wktLiteral` datatype #1521

Closed joka921 closed 1 day ago

joka921 commented 1 day ago

After the previous commit, all WKT literals except POINTs were missing the geo:wktLiteral datatype. This is now fixed again.

codecov[bot] commented 1 day ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.12%. Comparing base (85793e3) to head (b174038). Report is 1 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1521 +/- ## ========================================== - Coverage 88.12% 88.12% -0.01% ========================================== Files 357 357 Lines 26764 26765 +1 Branches 3606 3606 ========================================== - Hits 23587 23586 -1 Misses 1941 1941 - Partials 1236 1238 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sonarcloud[bot] commented 1 day ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

hannahbast commented 1 day ago

@ullingerc Even after this fix, there is still an inconsistency. Namely, in the TSV export, the points appear as POINT(...) without quotes and without datatype, whereas all other geometry types appear in quotes and with datatype. The TSV format does not require a particular format, but we should of course make it consistent. Do you know how to fix this?

ullingerc commented 1 day ago

This may be caused by this review: https://github.com/ad-freiburg/qlever/pull/1506#discussion_r1770882688 I then removed the full string representation with quotes and datatype. Tomorrow I will look into it and check if this is actually the cause.

ullingerc commented 14 hours ago

@hannahbast I have checked it and the bug has nothing to do specifically with my implementation in #1506 . It also affects the date type. See for example: https://github.com/ad-freiburg/qlever/blob/f39907ccfe28096199c817b289e5c9131cd27cc3/test/ExportQueryExecutionTreesTest.cpp#L500-L504

hannahbast commented 5 hours ago

@ullingerc Thanks for the info, than this is work for a separate PR. But I just hit another problem, when trying to build the index for the latest version of Wikidata (which never had problems with coordinates in the past):

2024-09-28 15:26:12.867 - ERROR: Parse error at byte position 116541359260: /local/data-ssd/qlever/qlever-code/src/parser/GeoPoint.cpp, line 19: The given value 100 is out of range for latitude coordinates.

Any idea, what the problem is here? Anyway, it would be good to output the whole POINT... string here.

hannahbast commented 5 hours ago

@ullingerc PS: Shortly before the parse fails, I find the following in the dataset: "Point(0 99.999999)"^^geo:wktLiteral. I understand that a latitude should not be larger than 90. But it would be good to be more lenient here and either omit the triple or not "fold" it but leave it as an ordinary literal (maybe without datatype). We already have warning messages for that at other places, for example:

2024-09-20 22:17:08.122 - WARN: IRI ref not standard-compliant: <http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/C4B 16-02>

ullingerc commented 5 hours ago

@hannahbast It's very helpful that you found the problematic item, thanks. I think we should keep the verification for coordinates, otherwise the folding will no longer work in a meaningful way. However, I agree, the error should be handled more gracefully, similar to the example you have pointed out.