chrsmithdemos / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Multiple keys to the same value #222

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'd like to store objects that can be queried via multiple indexes. For 
example, I'd like to store serialized blog posts which can be queried by id, 
time and category + time keys. When querying over time or category, I want to 
use an iterator and retrieve an array of posts.

Currently, I can do this in two ways:

1. Store the object against the 'id' key, and store the id's against the other 
keys. As the iterator only returns ids, I have to make separate calls to Get 
for each id. This is slow.

2. I can store copies of the object against each key. This wastes space and is 
probably slower due to the cache filling up faster.

A better solution for this use case would be for Put to accept an array of 
secondary keys in addition to the (primary) key and value. Alternately (or 
additionally), there could also be a AddKey method which adds a secondary key 
to an existing key-value pair.

Possible implementations:

1. In the log as well as the SST files, secondary keys can be written like 
regular keys except that instead of a value it would have a 'link' 
(filename+byte offset?) to the on-disk location of the primary key-value pair. 
Readers accessing the secondary key can follow these links in O(1) time.

2. Instead of filename and byte offset, the link can store the primary key.

Reads will be faster in the first implementation, but keeping links updated 
through updates, deletes, merges and compaction will be complex. The second 
implementation will be simpler at the cost of slower reads. Reading via 
secondary keys would be slower than through the primary keys as they require 
two disk reads, but it would be preferable to both the options available today.

Note: An AddKey method will enable clients to model One-Many and Many-Many 
relationships efficiently.

Original issue reported on code.google.com by arav...@scrollback.io on 4 Jan 2014 at 5:39

GoogleCodeExporter commented 9 years ago
Option 2 you propose will have approximately the same performance whether it is 
implemented inside or outside leveldb.  For simplicity, we have been leaving 
such things out of leveldb.  We leave it to higher-level layers/databases that 
people build on top leveldb (think of leveldb as a low-level storage engine 
mainly targeted towards building higher-level storage systems like indexeddb, 
riak. etc.).

One plausible extension that might be worth adding to level is a MultiGet 
(equivalent to N Get() calls).  That could speed up the N secondary key lookups.

But direct index support is not in the plans.

Original comment by san...@google.com on 6 Jan 2014 at 10:26