Wikidata / SQID

A tool to analyse, browse and query Wikidata
http://tools.wmflabs.org/sqid/
Apache License 2.0

Use DISTINCT in "Total" queries #31

Closed: mkroetzsch closed this issue 8 years ago

mkroetzsch commented 8 years ago

The query view issues path queries to navigate the class hierarchy, but it does not use DISTINCT to eliminate duplicates. These duplicates arise when multiple paths lead to the same result.
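To illustrate how such duplicates can arise, here is a minimal Python sketch. It assumes the query has roughly the shape `?item wdt:P31/wdt:P279* ?class` (instance of, followed by any number of subclass-of steps); the actual queries issued by SQID may differ. The items, classes, and function names below are hypothetical toy data, not real Wikidata content.

```python
# Hypothetical toy data: one item has two direct classes that share a superclass.
INSTANCE_OF = [
    ("cat_item", "Mammal"),
    ("cat_item", "Pet"),
    ("rock_item", "Mineral"),
]
SUBCLASS_OF = {
    "Mammal": ["Animal"],
    "Pet": ["Animal"],
    "Animal": [],
    "Mineral": [],
}


def superclasses(cls):
    """Return cls plus every class reachable via subclass-of steps, as a set."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def instances_of(root):
    """Emit one result row per (item, direct class) pair whose class reaches root."""
    return [item for item, cls in INSTANCE_OF if root in superclasses(cls)]


rows = instances_of("Animal")
total_without_distinct = len(rows)     # counts cat_item once per path
total_with_distinct = len(set(rows))   # counts each item once
```

In this toy hierarchy, `cat_item` reaches `Animal` through both `Mammal` and `Pet`, so a plain count reports one more result than there are distinct items.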

arsylum commented 8 years ago

Added in 518a878

I wanted to look into local duplicate elimination because I had a case where adding DISTINCT made an otherwise working query time out. However, I haven't been able to reproduce this since then.

Question: Does DISTINCT impact performance when added to a query that wouldn't produce duplicates anyway? Or is there any other reason why including DISTINCT by default would be a bad idea?

mkroetzsch commented 8 years ago

Yes, DISTINCT has a big impact on performance even if there are no duplicates, since the query engine cannot know that there are no duplicates until it has done all the extra computation required for DISTINCT. In some cases, the query engine could be smart enough to see from the query itself that there cannot be any duplicates and drop the DISTINCT again, but it is always better if we omit it wherever it is not needed.
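The extra work can be sketched in Python as follows. This is only an illustration under the assumption of a simple hash-based duplicate check; real query engines may use sorting, hashing, or other strategies, and the function names here are hypothetical.

```python
def count_all(rows):
    """Plain count: a single counter, nothing is remembered between rows."""
    total = 0
    for _ in rows:
        total += 1
    return total


def count_distinct(rows):
    """Distinct count: every row is checked against all rows seen so far.

    Returns (distinct_total, remembered_rows) so the bookkeeping cost is visible.
    """
    seen = set()
    for row in rows:
        seen.add(row)  # hashing and storing each row is the extra work
    return len(seen), len(seen)


# 1000 rows that happen to contain no duplicates at all.
plain_total = count_all(range(1000))
distinct_total, remembered_rows = count_distinct(range(1000))
```

Both counts agree here, but the distinct version still had to hash and remember every row before it could confirm that no duplicates existed.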

arsylum commented 8 years ago

887562b90fec1442e2f71d35323ebeee3f6acdf4