arangodb / arangodb

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
https://www.arangodb.com
Other
13.45k stars 833 forks source link

Geo-spatial index not used with GEO_CONTAINS when the indexed field is passed as first argument #11353

Open mtzonev opened 4 years ago

mtzonev commented 4 years ago

My Environment

Component, Query & Data

Affected feature:

Geo-spatial index of geoJson type.

AQL query:

FOR timezone IN timezones
FILTER GEO_CONTAINS(timezone.geojson.geometry, GEO_POINT(2.3367157, 48.8673073))
RETURN timezone

AQL explain:

Query String (127 chars, cacheable: true):
 FOR timezone IN timezones
   FILTER GEO_CONTAINS(timezone.geojson.geometry, GEO_POINT(2.3367157,48.8673073))
   RETURN timezone
Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      1     - FOR timezone IN timezones   /* full collection scan */   FILTER GEO_CONTAINS(timezone.`geojson`.`geometry`, { "type" : "Point", "coordinates" : [ 2.3367157, 48.8673073 ] })   /* early pruning */
  5   ReturnNode                   1       - RETURN timezone

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-filters-into-enumerate

Dataset:

Single collection in the database. For the sake of testing the query above a single vertex is enough:

https://gist.github.com/mtzonev/4bfc808684280595ed399c29b381c29a

Size of your Dataset on disk:

The full dump of the database is around 40MB.

Steps to reproduce

  1. Create collection called timezones
  2. Insert the vertex from the gist
  3. Create a geo-spatial index collection.ensureIndex({ type: "geo", fields: [ "geojson.geometry" ], geoJson:true })
  4. Execute the query

Problem:

The index on the geometry field is not used when the field is supplied as the first argument of the GEO_CONTAINS function.

Expected result:

The geospatial index to be used regardless of position on which the indexed field is passed to the GEO functions. The function does work like that - the point is correctly placed within the Polygon/MultiPolygon from the field but the index is never made use of which results in a very slow query with the full collection.

NOTE: The index is only considered when the indexed field is supplied as second argument (meaning I can only check whether my stored data is contained in another arbitrary geometry object but I need it the other way around). I believe my use-case is perfectly reasonable and viable and I can't see a valid reason why it wouldn't work.

maxkernbach commented 4 years ago

Hi @mtzonev,

This issue is reproducible as you described.

The index is only considered when the indexed field is supplied as second argument

However, this is also the behavior that our documentation is currently stating:

The first parameter of GEO_CONTAINS must be a polygon. Other types are not valid. The second parameter must contain the document field on which the index was created.

I agree that your use case is sensible and we will keep this issue open as a feature request. I can't provide an ETA for this feature though.

mtzonev commented 4 years ago

@maxkernbach I see. Thank you for that. I guess I'll have to search for alternatives until this hopefully becomes supported natively.