Open ccollie opened 1 month ago
So you would be using a TreeMap<Timestamp, Chunk>
as your index structure? And you mentioned that the key would be chunk.start
?
I think it would possibly work if you did chunks.range(..=new_sample.timestamp).rev()
. The ..=new_sample.timestamp
will give you an iterator of chunks where the chunk.start <= new_sample.timestamp
. The .rev()
is so that the first element in the iterator will be the most likely match.
One underlying assumption I have is that the byte representation of Timestamp
is ordered in the same way that Timestamp
is
Somewhat off-topic questions:
For this i need to efficiently locate a chunk into which to insert the sample.
When you insert a new sample into a chunk, do you update the start
and end
bounds of the chunk? Or is it the case that you'd instead insert a new chunk if the new_sample.timestamp
is outside the bounds of any existing chunk?
So you would be using a TreeMap<Timestamp, Chunk> as your index structure? And you mentioned that the key would be chunk.start?
Yes.Timestamp
is essentially an alias for i64
When you insert a new sample into a chunk, do you update the start and end bounds of the chunk? Or is it the case that you'd instead insert a new chunk if the new_sample.timestamp is outside the bounds of any existing chunk?
In this case I'm specifically looking at upserts. Imagine a TimeSeries
struct that tracks its range..
pub struct TimeSeries {
pub start: Timestamp,
pub end: Timestamp,
pub chunks: TreeMap<Timestamp, Chunk>
}
Checking an incoming Sample
against the range we can determine if we need to upsert or add a new chunk. It is of course possible that a chunk may have to be re-indexed (we also want to handle the case when chunks are imbalanced w.r.t the number of samples)
Yes.Timestamp is essentially an alias for i64
One thing to note if you're using an i64
(or any unsigned integer) as a key, is that you'll need to apply a mapping so that the ordering of the byte representation matches the ordering of the type. https://docs.rs/blart/latest/blart/struct.ToIBE.html#impl-BytesMapping%3Ci32%3E-for-ToIBE is the impl you're looking for, and you would combine it with the Mapped<_, _>
struct like so:
use blart::{TreeMap, Mapped, ToIBE, BytesMapping};
// assumption
type Timestamp = i64;
type Map = TreeMap<Mapped<ToIBE, Timestamp>, Chunk>
This does make keys a little more difficult to use sadly, but not impossible. Let me know about any difficulties in this part or any proposals to make this translation more transparent. Its definitely a rough area of the crate
I have another use-case i'd like to use
blart
for. Assume I have a series of chunks bounded by timestamp and containing samples.I store multiple chunks per timeseries, and later I need to upsert a
Sample
in a chunk. For this i need to efficiently locate a chunk into which to insert the sample.It seems to me that
chunks.range(new_sample.timestamp..)
won't do the trick, but I can't see any means short of iteration from the start.Edit: chunks would be indexed by
chunk.start