calrissian / accumulo-recipes

Recipes & cookbooks for Accumulo.
http://www.calrissian.org
Apache License 2.0

[Geospatial Store] Upgrade to use the QfdHelper #114

Open cjnolet opened 10 years ago

cjnolet commented 10 years ago

Along with the upgrade, we should probably introduce the notion of a GeoEvent or GeoEntity. For a GeoEvent, it's important that time becomes a dimension. It wouldn't be hard to encode that dimension in the shard (YYYYmmDDhh_bbox_partition), but realistically the global index is what matters here. If the global index can be scanned appropriately, the dataset can be reduced before the shards are scanned.
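To make the "reduce via the global index before touching the shards" idea concrete, here is a minimal in-memory sketch. It is not the accumulo-recipes implementation: the two dictionaries stand in for the index and shard tables, and the names `index_event`, `candidate_shards`, and `query` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two tables (not the real Accumulo schema).
# global_index: (key, value) -> set of shard ids that contain a match.
global_index = defaultdict(set)
# shard_table: shard id -> list of events stored in that shard.
shard_table = defaultdict(list)


def index_event(shard, event):
    """Store an event in its shard and record each key/value in the global index."""
    shard_table[shard].append(event)
    for key, value in event.items():
        global_index[(key, value)].add(shard)


def candidate_shards(criteria):
    """Phase 1: intersect the shard sets of every criterion (an AND query)."""
    sets = [global_index.get(kv, set()) for kv in criteria.items()]
    return set.intersection(*sets) if sets else set()


def query(criteria):
    """Phase 2: scan only the candidate shards and filter the events in them."""
    hits = []
    for shard in sorted(candidate_shards(criteria)):
        for event in shard_table[shard]:
            if all(event.get(k) == v for k, v in criteria.items()):
                hits.append((shard, event))
    return hits


# Example data; the shard ids here are made up for illustration.
index_event("2014060112_0", {"type": "sighting", "city": "austin"})
index_event("2014060113_1", {"type": "sighting", "city": "boston"})
index_event("2014060113_2", {"type": "report", "city": "austin"})

shards = candidate_shards({"type": "sighting", "city": "austin"})
results = query({"type": "sighting", "city": "austin"})
```

Only one of the three shards survives the index phase here, so the second phase never reads the other two.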

eawagner commented 10 years ago

+1. This would be a good case for showing how the behavior of the QFD store can be customized for different types of data.

cjnolet commented 10 years ago

Correction to the shard description above: it would actually look more like this:

partition_YYYYmmDDhh_bbox, so that, for each partition and each time, a range of bounding boxes can be queried. One problem I see right away is the way the time-based shards are currently stored in the event-based global index:

k_key alias shard0
k_key alias shard1
v_alias_value key shard0
v_alias_value key shard1

This structure allows scanning a range of shards for a specific value to find out which shards need to be added to the ranges for the shard table for intersections/unions. The problem is that the column qualifiers could easily slow down the scan through the index table. We'll need to figure that one out. Maybe add them to the column family instead, so they can be in the RFile index and be seeked to?
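A rough sketch of the value-to-shards lookup described above, using a sorted list of tuples as a stand-in for the index table. It assumes the three columns in the layout are row, column family, and column qualifier, in that order, and that shard ids sort lexicographically by time. The function name `shards_for_value` and the sample data are hypothetical, and the sketch says nothing about real scan performance.

```python
import bisect

# Stand-in for the index table: (row, column family, column qualifier),
# kept in sorted order the way a sorted key/value store would keep it.
index = sorted([
    ("k_city", "string", "2014060112_0"),
    ("k_city", "string", "2014060113_1"),
    ("v_string_austin", "city", "2014060112_0"),
    ("v_string_austin", "city", "2014060113_1"),
    ("v_string_austin", "city", "2014060209_0"),
    ("v_string_boston", "city", "2014060113_1"),
])


def shards_for_value(index, alias, value, key, start_shard, stop_shard):
    """Return the shards holding key=value, limited to [start_shard, stop_shard]."""
    row = f"v_{alias}_{value}"
    # Seek to the first entry at or after the start of the shard range...
    lo = bisect.bisect_left(index, (row, key, start_shard))
    # ...and stop after the last entry at or before the end of the range.
    hi = bisect.bisect_right(index, (row, key, stop_shard))
    return [qualifier for (_, _, qualifier) in index[lo:hi]]


# All shards for city=austin during 2014-06-01, hours 12 through 13.
# "~" sorts after "_" and digits, so it closes the range for hour 13.
austin_shards = shards_for_value(
    index, "string", "austin", "city", "2014060112", "2014060113~"
)
```

The shards returned by several such lookups can then be intersected or unioned to build the ranges for the shard table.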

Ultimately, this pattern would still translate to the geostore global index fairly well.
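As an illustration of the partition_YYYYmmDDhh_bbox shard layout proposed above, the following sketch builds shard ids and one shard range per partition per hour. Everything beyond the layout itself is an assumption: the zero-padded partition number, the hourly granularity, and the bounding-box token, which is treated here as an opaque fixed-width string that sorts sensibly (something like a quadkey). `shard_id` and `shard_ranges` are hypothetical names.

```python
from datetime import datetime, timedelta


def shard_id(partition, when, bbox_token):
    """Build a shard id of the form partition_YYYYmmDDhh_bbox."""
    return f"{partition:02d}_{when:%Y%m%d%H}_{bbox_token}"


def shard_ranges(num_partitions, start, stop, bbox_lo, bbox_hi):
    """Return one (start_shard, stop_shard) pair per partition per hour.

    Each pair spans the bounding-box tokens from bbox_lo to bbox_hi, so a
    scanner could cover the whole box range for that partition and hour.
    """
    ranges = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= stop:
        for partition in range(num_partitions):
            ranges.append((
                shard_id(partition, hour, bbox_lo),
                shard_id(partition, hour, bbox_hi),
            ))
        hour += timedelta(hours=1)
    return ranges


# Two partitions, a window touching hours 12 and 13, boxes "0230" to "0233".
example_ranges = shard_ranges(
    2,
    datetime(2014, 6, 1, 12, 30),
    datetime(2014, 6, 1, 13, 5),
    "0230",
    "0233",
)
```

Putting the partition first keeps each partition's shards contiguous, at the cost of needing one range per partition per hour rather than one range per hour.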