ConnectedPlacesCatapult / TomboloDigitalConnector

The Tombolo Digital Connector enables users to combine different sources of data in a transparent and reproducible way.
MIT License
58 stars 29 forks source link

Indexing on table with Geometry #559

Open arya-hemanshu opened 6 years ago

arya-hemanshu commented 6 years ago

Description

In subject table the column shape is not indexed, indexing the shape column would significantly improve the performance of the tool.

Have gone through the documentation of creating an index on spatial column and have tried creating it, however found no performance improvement.

Upon digging a little more in the documentation found out that to spatially index a table the table should only store a single type of geometry e.g polygons and not any other geometry. There should be different tables for storing different type of geometries, e.g points, lines and polygons, then only indexes can be effective.

However, currently DC stores all type of geometries in a single subject table. It requires reviewing and think of a different approach if required.

Error log

None

Anafi commented 6 years ago

What if you convert every geometry to a geometry collection consisting of a single geometry (ST_Collect)? Does is indexing work with geometry collections?

arya-hemanshu commented 6 years ago

@Anafi thank you for the suggestion, i will have a look at it, however i think spatial indexes only work with polygons, points and lines as while indexing postgis creates a bounding box around every geography present in the table. I will update this issue with my findings.

Anafi commented 6 years ago

@arya-hemanshu did the spatial index on Geometry Collection type work?

arya-hemanshu commented 6 years ago

@Anafi i am not sure, as the official documentation of postgis indexing, doesn't have any examples of indexing on GeometryCollection and moreover i couldn't find any mention in the documentation that it works on GeometryCollection. Only, multipolygons, polygons, line and point examples are available in their official documentation

Anafi commented 6 years ago

When testing spatial intersection between 1,000 features of a table, having a spatial index did not show any major speed performance (1.4 sec. without spatial index, 1.3 sec with spatial index). However when tested on a table of 1,000,000 features it does make a difference (see below).

-- without spatial index
-- (it did not finish after 15 min..)
SELECT a.geom, b.geom
FROM  p0000.mytest AS a,  p0000.mytest AS b 
WHERE ST_Intersects(ST_CollectionHomogenize(a.geom), ST_CollectionHomogenize(b.geom)) AND a.id<>b.id;

CREATE INDEX mytest_gix ON p0000.mytest USING GIST (geom);

-- with spatial index operator 
-- 7.5 sec 
SELECT a.geom, b.geom
FROM  p0000.mytest AS a,  p0000.mytest AS b 
WHERE ST_CollectionHomogenize(a.geom)&&ST_CollectionHomogenize(b.geom) AND ST_Intersects(ST_CollectionHomogenize(a.geom), ST_CollectionHomogenize(b.geom)) AND a.id<>b.id;
arya-hemanshu commented 6 years ago

@Anafi I will put an index in the code base, just had one query. These queries looks different to me, are the same? or one produces a different result then other?

Anafi commented 6 years ago

@arya-hemanshu The queries are different indeed. The difference is that the second one is not only using the ST_Intersects but also the && operator. That operator is used to first filter the features that their bounding boxes intersect and then ST_Intersects filters the features that their actual geometries intersect. See here

arya-hemanshu commented 6 years ago

@Anafi is it possible to use the same queries with and without the spatial index and then compare the results?

Anafi commented 6 years ago

@arya-hemanshu running the second query on the table without the index takes as long as the first query (more than 15 min.) as the && operator has not any effect without the spatial index.