drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

Unexpected slowness for some simple global counts #614

Closed (tuukka closed this issue 3 years ago)

tuukka commented 3 years ago

I'm missing some documentation on how to reason about the slowness of some simple-looking queries that count entities globally.

Times out:

[out:json];
(
  node["amenity"="restaurant"];
);
out count;

Works (in 15 seconds):

[out:json];
(
  node["amenity"="restaurant"]["diet:vegan"="only"];
);
out count;

Takes around 180 seconds:

[date:"2012-09-12T06:55:00Z"]
[out:json];
(
  node["amenity"="restaurant"]["diet:vegan"="only"];
);
out count;

mmd-osm commented 3 years ago

It’s not realistic to ask for documentation that answers such generic performance questions. There are simply far too many combinations, and runtimes keep changing from version to version.

As an example, your first query takes 6s on another branch with the following result:

"elements": [

{ "type": "count", "id": 0, "tags": { "nodes": "925905", "ways": "0", "relations": "0", "total": "925905" } }
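For readers who want to consume such a result programmatically, here is a minimal Python sketch of reading the count element. Only the "elements" entry is taken from the output quoted above; the enclosing braces and closing bracket in SAMPLE_RESPONSE are an assumption about the rest of the response, and parse_count is a hypothetical helper name.

```python
import json
from typing import Dict

# Only the "elements" entry comes from the thread; the enclosing braces and
# the closing bracket are assumed so that the sample is valid JSON.
SAMPLE_RESPONSE = """
{
  "elements": [
    { "type": "count", "id": 0,
      "tags": { "nodes": "925905", "ways": "0", "relations": "0", "total": "925905" } }
  ]
}
"""


def parse_count(body: str) -> Dict[str, int]:
    """Return the tags of the first 'count' element, converted to integers."""
    data = json.loads(body)
    for element in data.get("elements", []):
        if element.get("type") == "count":
            # The counts arrive as JSON strings, so convert each one.
            return {name: int(value) for name, value in element["tags"].items()}
    raise ValueError("no count element in response")


counts = parse_count(SAMPLE_RESPONSE)
print(counts["total"])
```

Note that the counts are delivered as strings, so they need converting before any arithmetic.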

tuukka commented 3 years ago

Thank you for the reply, and good to hear there is no inherent limitation that makes my queries slow! I also understand that it may not be feasible to document a living system to such an extent. Could the solution be something similar to how SQL databases can show the query plan when requested to EXPLAIN a given query?

mmd-osm commented 3 years ago

I see very little benefit in providing such information to users of a public-facing API, as you wouldn't have any way to influence the processing of such very simple queries anyway. Overpass also doesn't have a query planner.

What kind of problem are you trying to solve in the first place? Maybe Overpass is the wrong tool for you.

drolbr commented 3 years ago

For a little bit of background: the query is in no way simple for the database. To answer it, the database reads all restaurants from all over the world from disk. The command out count exists to save network bandwidth; the previous solution had been to use out ids and count lines. Usually, this kind of question is well answered by Taginfo. Thus, it is unlikely that this kind of task will ever win a trade-off against other features.
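
The difference between the two approaches mentioned above can be sketched in Python. This is an illustration under assumptions, not the project's own tooling: the helper names are hypothetical, the CSV output setting is one possible way to get one ID per line, and the interpreter URL is the commonly used public endpoint. Both queries make the server read every matching node; they differ only in how much data is sent back.

```python
import urllib.parse
import urllib.request

# Commonly used public endpoint; assumed here for illustration.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def count_query(key: str, value: str) -> str:
    """Query that lets the server count and return one small count element."""
    return f'[out:json];\nnode["{key}"="{value}"];\nout count;'


def ids_query(key: str, value: str) -> str:
    """Older approach: return every ID, one per line, to be counted client-side.

    Assumes CSV output with the header line disabled.
    """
    return f'[out:csv(::id; false)];\nnode["{key}"="{value}"];\nout ids;'


def count_id_lines(csv_body: str) -> int:
    """Count the non-empty lines of an ID-per-line response."""
    return sum(1 for line in csv_body.splitlines() if line.strip())


def run(query: str) -> str:
    """Send a query to the server. Not called here, since it needs network access."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as response:
        return response.read().decode("utf-8")


print(count_query("amenity", "restaurant"))
print(count_id_lines("101\n102\n103\n"))
```

With out ids the response grows with the number of matches, while out count returns a single element regardless of how many nodes match.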