Toblerity / zope.index.rtree

Defunct project
6 stars 2 forks source link

========================= zope.index.SpatialIndex

The package zope.index.SpatialIndex provides a 2d, 3d, n-d spatial index. It builds upon the python rtree package which has a fast underlying C library. Furthermore, this index integrates fully with the zodb, including transactions.

You can index bounding boxes and points currently.

.. contents::

Quick example

The index is compatible with zope.catalog and thus can be used like it.

This example shows how to index a few simple 2d objects.

Import boilerplate stuff ::

import persistent import transaction import zope.interface from anima.core.content.index.spatial.index import SpatialIndex import BTrees

Quickly setup a zodb ::

import ZODB.tests.util db = ZODB.tests.util.DB() dbconn = db.open() dbroot = dbconn.root()

Create a house. A house has a bounding box and thus is a spatial object which can be indexed ::

class House(persistent.Persistent): ... def init(self, name, boundingBox): ... persistent.Persistent.init( self ) ... self.name = name ... self.boundingBox = boundingBox

Finally create the index ::

settings = dict( dimension = 2, leaf_capacity = 20, pagesize = 4096, near_minimum_overlap_factor = 20, writethrough = False, buffering_capacity = 100, family = BTrees.family64 ) dbroot['index'] = index = SpatialIndex( field_name = 'boundingBox', interface = None, field_callable=False, settings = settings ) transaction.commit()

And add some sample content to the index, by default bounding box is like (minx, miny, maxx, maxy) ::

index.index_doc( 123, House('Mansion', (5,5,25,25)) ) index.documentCount() 1

Note that you can use the interleaved = False setting if you want to use bounding boxes like (minx, maxx, miny, maxy).

Now perform some queries on the index. Count all objects within the (0,0,100,100) bounding box ::

index.count( (0,0,100,100) ) 1L

Find all objects within the (0,0,100,100) bounding box ::

list( index.intersection( (0,0,100,100) ) ) [123L]

Find the single nearest object to (0,0) ::

list( index.nearest( (0,0), num_results = 1 ) ) [123L]

Use the apply() function ::

list( index.apply( 'intersection', (0,0,1,1) ) ) []

Get the bounds of the whole index ::

index.bounds [5.0, 5.0, 25.0, 25.0]

We're done, let's unindex the document for illustrative purposes ::

index.unindex_doc( 123 ) index.documentCount() 0

You can also clear the index ::

index.clear()

To make this really useful, use the IIntId util to map ids to actual objects. This is already done if you use the index within zope.catalog.

Old info from original announcement, some of the following info might be dated

Where to get it?

See attachment. It's still a very rough alpha version, don't use in
production. The attached test case works only in my own testing
environment, but you can see how the spatial index can be used.

What can it do?

You can insert your objects into the index along with a set of
coordinates. These coordinates can be 2d, 3d and even n-d. The
coordinates can be points or bounding boxes. The data is paged to the
database as needed, so you can create much bigger indices than would fit
in memory. Later you can then query the index for objects in a given area
or for the n-nearest objects to a certain point.

The underlying tree of the index is quite flexible, for example there are
special variants of the tree which are "very well suited suited for mobile
systems where queries on moving objects are common" [1]. The individual
variants can also be customized extensively, e.g. you can change the leaf
size, split factor, buffering and many more parameters. That's useful to
minimize ConflictErrors.

For detailed information, please see http://pypi.python.org/pypi/Rtree ,
http://trac.gispython.org/spatialindex,
http://anandmu.googlepages.com/comparison3finalreport.pdf and the
spatialindex sources.

How does it work?

I've gone and extended the rtree c api a bit so it supports custom
storages. This means you need the latest svn version if you want to test.
I have windows binaries ready for those interested.

Then I wrote the code you see in spatialIndex.py. When you add an object
to a SpatialIndex it generates a new entry with a random 64-bit integer
and stores the mapping id->object into an LOBTree. The individual tree
leaves are paged and stored via a custom rtree storage as binary data in
an LOBTree which maps pageId->pageData.

A custom data manager makes sure the tree's buffers are flushed at
transaction boundaries (commit/abort/savepoint/rollback). This ensures the
spatial index is fully transactional.

All the heavy lifting is done by the rtree and spatialindex libraries.
They are very well written in C++, so performance for the queries
themselves is quite high. The slow part is fetching pages from ZODB (ZEO),
but since objects which are located close to each other are likely to end
up in the same leafs and pages, you take the "fetch from db" hit only for
the first access and subsequent accesses will be cached (unless you are
also writing a lot).

For concrete examples, please the tests.

Known issues?

cleanness: The code is littered with ugly log statements. I've had a few
problems with the data manager (see below).

ConflictErrors: If you have a tree with very few leaves (e.g. huge leave
size or small number of objects) conflict errors will occur rather often,
because the objects are saved to the same page. You can fine tune this by
adjusting the rtree parameters (e.g. leaf_capacity, index_capacity,
pagesize and near_minimum_overlap_factor).

data manager: I had to hook the transaction's join() method in a very ugly
way, see the other post I made today for more about this.

entry ids: Entry ids are 64-bit. For very large trees this might be
insufficient to prevent collisions. This is checked however and an error
is raised when this happens.

performance: In some quick tests I did the index performed very well for
my needs. You might want to write your own benchmark if you plan to really
hammer the index.

packaging: I don't plan to create a package for this as I don't see much
point in adding yet another package to the clutter of packages surrounding
zodb.

References:

[1] http://anandmu.googlepages.com/comparison3finalreport.pdf