TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License

How to read arrangements from disk and cache values inside Cursor / Storage? #520

Open oli-w opened 1 week ago

oli-w commented 1 week ago

I'm trying to implement disk-backed arrangements, where updates are stored as immutable batches on disk and chunks of the batch are loaded into memory as you seek around the cursor.

I have set up:

The problem I have run into is trying to add the caching layer - ideally an LRU cache of chunks. The key method of Cursor is this:

/// A reference to the current key. Asserts if invalid.
fn key<'a>(&self, storage: &'a Self::Storage) -> Self::Key<'a>;

Let's say for example that the Key<'a> associated type is hard-coded to &'a String. The lifetime on &'a Self::Storage and the return type means that the "cache" must belong to the storage. Since there isn't a mutable reference to storage, I believe the only option is to use a RefCell (maybe around a HashMap), but I can't return a &'a reference to content inside a RefCell - RefCell::borrow() returns a Ref with a lifetime that only lasts until the end of the current block.
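The constraint in that last sentence can be sketched minimally (this `Storage` is a hypothetical stand-in, not DD's actual trait):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical stand-in for storage that carries its own cache.
struct Storage {
    cache: RefCell<HashMap<u64, String>>,
}

fn main() {
    let storage = Storage { cache: RefCell::new(HashMap::new()) };
    storage.cache.borrow_mut().insert(0, "key-0".to_string());

    // Fine: the Ref guard is created and dropped within this block.
    let len = {
        let map = storage.cache.borrow();
        map.get(&0).map(|s| s.len()).unwrap_or(0)
    };
    assert_eq!(len, 5);

    // Not fine: a method like the one below cannot compile, because the
    // returned reference would have to outlive the Ref guard it came from.
    //
    // fn key<'a>(storage: &'a Storage) -> &'a String {
    //     storage.cache.borrow().get(&0).unwrap() // error[E0515]: returns
    // }                                           // a ref to a temporary
}
```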

I tried changing Key<'a> to an owned String or Ref<String>, but this is not possible because these don't implement Copy.

Ideas:

Please let me know if I've missed some other way to achieve this. This worked really nicely for Copy types like u32, but I've had no luck with other types.

frankmcsherry commented 6 days ago

Hello!

It's a good question, and I'm not certain there will be satisfying answers. I think the two ideas you've proposed run into other constraints, so let me talk those through.

First, the 'a lifetime exists to allow references into storage to outlive the cursor itself, which continually changes as you march around in it. The aim is to allow one to pull down many keys, vals, times, diffs, etc., or references to them, even while the cursor is changing. If you allow the 'a to depend on the cursor's &mut self lifetime, this all goes away (you can't re-borrow the cursor to mutate it as long as you hold a reference). I think this also confounds the approach of using &'a mut for storage references, as the mutable borrow likewise wouldn't be available again as long as the key reference exists.
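To illustrate why 'a borrowing storage rather than the cursor matters, here is a toy sketch (names hypothetical, a Vec<String> standing in for Storage):

```rust
// Toy cursor: key references borrow storage, so they survive cursor mutation.
struct Cursor {
    pos: usize,
}

impl Cursor {
    // The returned reference is tied to storage's lifetime 'a, not to self.
    fn key<'a>(&self, storage: &'a Vec<String>) -> &'a String {
        &storage[self.pos]
    }
    fn step(&mut self) {
        self.pos += 1;
    }
}

fn main() {
    let storage = vec!["a".to_string(), "b".to_string()];
    let mut cursor = Cursor { pos: 0 };
    let k0 = cursor.key(&storage);
    cursor.step(); // ok: k0 borrows storage, not the cursor
    let k1 = cursor.key(&storage);
    assert_eq!((k0.as_str(), k1.as_str()), ("a", "b"));
}
```

If `key` instead returned a reference borrowed from `&mut self`, the `cursor.step()` call above would be rejected while `k0` is alive.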

Minor, but I think the problem is probably not the key function (which just reads); it can probably be localized to seek_key and step_key, which are the moments one changes which key the cursor is looking at.

But the core problem I see is that an LRU cache, or something like it, will invalidate some data in response to requests for data it needs to bring in. How does this "lifetime" get communicated to Rust? The problem is that it is not a lexical lifetime, which means Rust is not obviously well equipped to express it. Hypothetically, if you have a one-element cache, then every new access invalidates the previous element, and none of the DD code will work anyhow.

An alternate approach, making this up ad lib, is that the Key<'_> type should behave more like an Rc<_>. It represents the ability to acquire the data, rather than the data itself. When you try to access it, that might cause some cache-replacement work to happen behind the scenes. But as long as you hold it, you are certain to be able to access the data through it.
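A minimal sketch of that Rc-like handle idea (ChunkCache and its methods are hypothetical, with a format! call standing in for a disk read):

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical cache that hands out Rc clones rather than references,
// so a handle stays valid even if the cache later evicts the entry.
struct ChunkCache {
    entries: RefCell<HashMap<u64, Rc<String>>>,
}

impl ChunkCache {
    fn get(&self, idx: u64) -> Rc<String> {
        self.entries
            .borrow_mut()
            .entry(idx)
            .or_insert_with(|| Rc::new(format!("loaded-{idx}"))) // stand-in for a disk read
            .clone()
    }
    fn evict(&self, idx: u64) {
        self.entries.borrow_mut().remove(&idx);
    }
}

fn main() {
    let cache = ChunkCache { entries: RefCell::new(HashMap::new()) };
    let key = cache.get(7); // "acquire" the key
    cache.evict(7); // cache replacement happens behind the scenes...
    assert_eq!(key.as_str(), "loaded-7"); // ...but the handle still works
}
```

The price is that accesses go through a handle rather than a plain `&'a` reference, but nothing is ever invalidated out from under a live handle.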

This is a bit like what we've done a few times with DD, which is "let virtual memory handle it". If you treat it as just data that Rust has, potentially much more than you have physical memory, there are a few controls that let you respond to "ack; I don't have the data mapped in". Virtual memory is one, and I think @antiguru may be able to speak to how lg_alloc ties in and acts as a more intentional paging layer.

I'll keep pondering, but it does seem tricky to both 1. maintain "references" that live longer than the borrow of the cursor, and 2. adapt resources in response to the cursor position. If you see other idioms out there where folks have succeeded with this, lmk!

frankmcsherry commented 6 days ago

An alternate approach, which is a pretty substantial (but not unmotivated) re-design, would be to have DD expose the ability to chunk data more explicitly. Right now you can partition data between workers, which is useful. But perhaps each worker would further like to chunk their data down to smaller key ranges, say. This would allow them to "page in" the data for a chunk of keys, do that work without modifying the backing store, and then release those resources as it moves on to other chunks of keys.

It doesn't address the question of what if the data for one key is too large to fit, but I guess that will always exist (as long as one is allowed to say "I only have 1KB; what now").

oli-w commented 6 days ago

Ahah! Thank you, this all makes sense.

An alternate approach, making this up ad lib, is that the Key<'_> type should behave more like an Rc<_>. It represents the ability to acquire the data, rather than the data itself.

This works for me 😃. I store data in files and can promise that it can be accessed at any time during program execution. I was originally thrown by the Copy constraint, but I realise that I can create a KeyRef<T> which just stores a few Copy-able properties: a file index, range index and key index. I implemented an LruCache and file metadata with a few thread-local statics. This makes joining with these arrangements a little bit trickier, since the other arrangement also needs to have the same KeyRef type, but I've got some ideas to play around with.
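A minimal sketch of that KeyRef<T> shape (the field names are my guesses at the indices described; the cache/file-resolution side is omitted):

```rust
use std::marker::PhantomData;

// Hypothetical KeyRef: a Copy-able locator (file, range, key indices)
// resolved against thread-local cache/file state only when the value
// is actually needed.
struct KeyRef<T> {
    file_index: u32,
    range_index: u32,
    key_index: u32,
    _marker: PhantomData<fn() -> T>,
}

// Manual impls so KeyRef<T> is Copy even when T (e.g. String) is not;
// #[derive(Copy)] would wrongly require T: Copy.
impl<T> Clone for KeyRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for KeyRef<T> {}

impl<T> KeyRef<T> {
    fn new(file_index: u32, range_index: u32, key_index: u32) -> Self {
        KeyRef { file_index, range_index, key_index, _marker: PhantomData }
    }
}

fn main() {
    let a: KeyRef<String> = KeyRef::new(0, 2, 5);
    let b = a; // Copy: `a` remains usable afterwards
    assert_eq!(
        (a.file_index, a.range_index, a.key_index),
        (b.file_index, b.range_index, b.key_index)
    );
}
```

Using `PhantomData<fn() -> T>` rather than `PhantomData<T>` keeps the struct covariant in T without implying ownership of a T.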

Keen to hear more about lg_alloc integration and experience using mmap. I read some of the code in Materialize for hooking up lg_alloc - it does seem really neat to just pretend like you've got tons of memory and let the OS page to disk.

frankmcsherry commented 5 days ago

Ah, the Copy constraint may be negotiable. I think it emerged from the generalization from &Key to Key<'_>, and the thought was "this will keep us on track" more than "this is a rule!". I suspect it could be relaxed to Clone without much concern, though with the understanding that cloning might happen not uncommonly as buffers of these things get slung around.