Spreads / Spreads.LMDB

Low-level, zero-overhead and the fastest LMDB .NET wrapper, with some additional native methods useful for Spreads
http://docs.dataspreads.io/spreads/libs/lmdb/api/README.html
Mozilla Public License 2.0

Usage of CursorPutOptions.MultipleData #5

Closed: jakoss closed this issue 6 years ago

jakoss commented 6 years ago

I cannot find a test that shows how to use the CursorPutOptions.MultipleData flag. I looked through your code, and it seems you do map a native method that takes a DirectBuffer[] array, but I cannot find a method on Cursor that actually uses it.

Can you provide some help with this one?

PS. Sorry for all the questions today; I just find your code very promising :)

EDIT: the same seems to apply to getting multiple data.

buybackoff commented 6 years ago

It's not currently implemented in any special way (e.g. with generics). You have to work with the raw MDB_val directly, which is represented by DirectBuffer. DirectBuffer has a Span<byte> property that you can cast to Span<T> with MemoryMarshal:

```csharp
var spanInt = MemoryMarshal.Cast<byte, int>(directBuffer.Span);
```
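
A minimal, self-contained sketch of that cast. It uses a plain byte array in place of directBuffer.Span so that it runs without Spreads.LMDB; the class and method names (SpanCastDemo, ReadInts, FromInts) are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper names; a plain byte[] stands in for DirectBuffer.Span.
public static class SpanCastDemo
{
    // Reinterprets raw bytes (e.g. the contents of an MDB_val) as int values without copying.
    public static int[] ReadInts(ReadOnlySpan<byte> raw)
    {
        // MemoryMarshal.Cast only reinterprets the memory; trailing bytes that
        // do not form a whole int are ignored.
        ReadOnlySpan<int> ints = MemoryMarshal.Cast<byte, int>(raw);
        return ints.ToArray(); // copy only so the demo can return the values
    }

    // Produces bytes in the machine's native byte order, matching what Cast expects.
    public static byte[] FromInts(params int[] values)
    {
        return MemoryMarshal.AsBytes(values.AsSpan()).ToArray();
    }
}
```

Because the cast is a reinterpretation, a span obtained from an MDB_val is only usable while LMDB keeps that memory valid; per the LMDB docs, returned values are valid until a subsequent update operation or the end of the transaction.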

buybackoff commented 6 years ago

Multiple values are always DUPFIXED, so you could create a blittable struct representing whatever you store in the DUPFIXED database, and it can be read from a Span<T> very efficiently.
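
To illustrate the blittable-struct idea, here is a sketch that views a contiguous block of fixed-size values as a span of structs without copying. The PriceTick struct and the DupFixedRead class are hypothetical; the byte block stands in for a contiguous run of DUPFIXED values, such as the block LMDB returns for MDB_GET_MULTIPLE.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical fixed-size record stored as a DUPFIXED value.
// Only unmanaged fields, sequential layout: 8 + 8 = 16 bytes per record.
[StructLayout(LayoutKind.Sequential)]
public struct PriceTick
{
    public long Timestamp;
    public double Price;
}

// Hypothetical helper for reading a block of DUPFIXED values.
public static class DupFixedRead
{
    // Views the raw block as PriceTick records; no data is copied.
    public static ReadOnlySpan<PriceTick> AsTicks(ReadOnlySpan<byte> block)
        => MemoryMarshal.Cast<byte, PriceTick>(block);
}
```

The struct must contain only unmanaged fields and have exactly the size of the stored value: MemoryMarshal.Cast throws for types that contain references, and a size mismatch silently misaligns the records.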

Actually, I have re-read the docs, and it's more complicated:

MDB_MULTIPLE - store multiple contiguous data elements in a single request. This flag may only be specified if the database was opened with MDB_DUPFIXED. The data argument must be an array of two MDB_vals. The mv_size of the first MDB_val must be the size of a single data element. The mv_data of the first MDB_val must point to the beginning of the array of contiguous data elements. The mv_size of the second MDB_val must be the count of the number of data elements to store. On return this field will be set to the count of the number of elements actually written. The mv_data of the second MDB_val is unused.

It's doable now in an unsafe way: you could construct a DirectBuffer pointing to two DirectBuffers, as described in the docs. But it may be somewhat complicated.
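
A sketch of the argument layout described in the quoted docs, assuming a managed mirror of LMDB's MDB_val struct. MdbVal and MultipleArgument are hypothetical names (in Spreads.LMDB the MDB_val role is played by DirectBuffer), and the native mdb_cursor_put call itself is not shown.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical mirror of LMDB's C struct: MDB_val { size_t mv_size; void *mv_data; }.
[StructLayout(LayoutKind.Sequential)]
public struct MdbVal
{
    public IntPtr Size; // mv_size (size_t has the same width as IntPtr)
    public IntPtr Data; // mv_data
}

// Hypothetical helper that builds the two-element array MDB_MULTIPLE expects.
public static class MultipleArgument
{
    // [0] = { size of ONE element, pointer to the first element }
    // [1] = { number of elements to store, unused pointer }
    public static MdbVal[] Build(IntPtr firstElement, int elementSize, int count)
    {
        return new[]
        {
            new MdbVal { Size = new IntPtr(elementSize), Data = firstElement },
            new MdbVal { Size = new IntPtr(count), Data = IntPtr.Zero },
        };
    }
}
```

The element memory must stay pinned (for example with GCHandle.Alloc(array, GCHandleType.Pinned)) for the whole native call. Per the docs, LMDB writes the number of elements actually stored back into the second element's size field.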

I will look at it later and create a helper method. We do not currently use multiple values, so this has not been implemented yet, and neither have nested transactions.

buybackoff commented 6 years ago

The method will accept a Span<T> to write multiple T values. It will return the count as described in the docs; then you just need to call span.Slice(writtenCount) and repeat.
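
The write-slice-repeat loop can be sketched independently of LMDB. The WriteBatch delegate and MultipleWriter.WriteAll are hypothetical stand-ins for the planned helper: the delegate represents one MDB_MULTIPLE put and returns the count LMDB reports as written.

```csharp
using System;

// Hypothetical stand-in for one MDB_MULTIPLE put: receives the remaining values
// and returns how many were actually written.
public delegate int WriteBatch<T>(ReadOnlySpan<T> batch) where T : unmanaged;

public static class MultipleWriter
{
    // Repeats the batch write until every value is stored, slicing off
    // the already-written prefix after each call.
    public static int WriteAll<T>(ReadOnlySpan<T> values, WriteBatch<T> write) where T : unmanaged
    {
        int total = 0;
        while (!values.IsEmpty)
        {
            int written = write(values);
            // Guard against a batch that makes no progress (would loop forever)
            // or reports more than it was given.
            if (written <= 0 || written > values.Length)
                throw new InvalidOperationException($"Unexpected written count: {written}");
            total += written;
            values = values.Slice(written);
        }
        return total;
    }
}
```

For example, a writer that accepts at most three values per call needs four calls to store ten values.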

An array of two MDB_vals could be a struct:

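```csharp
[StructLayout(LayoutKind.Sequential)]
struct MultiValue {
    DirectBuffer db1;  // first MDB_val: size of a single element + pointer to the first element
    IntPtr count;      // second MDB_val's mv_size: element count (LMDB sets it to the count written)
    IntPtr unused;     // second MDB_val's mv_data: unused by LMDB, keeps the MDB_val[2] layout
    // ... properties to set size, pointer to span, data
}
```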

Then we pass it to the cursor method instead of a DirectBuffer. This needs some work to be convenient.

jakoss commented 6 years ago

OK, that makes sense. Reading is far more important than writing for me, so as long as MemoryMarshal.Cast works, it's good enough for me.

jakoss commented 6 years ago

For now, I tried to simulate a multiple read with simple iteration, but this code

```csharp
using (var cursor = tableDatabase.OpenCursor(tx))
{
    if (cursor.TryGet(ref key, ref value, CursorGetOption.FirstDuplicate))
    {
        var counter = 0;
        buffer = new ulong[cursor.Count()];
        buffer[counter] = value.ReadUInt64(0);
        value = default;

        while (cursor.TryGet(ref key, ref value, CursorGetOption.NextDuplicate))
        {
            counter++;
            buffer[counter] = value.ReadUInt64(0);
            value = default;
        }

        if (counter != (buffer.Length - 1))
        {
            throw new Exception("Bad buffer length");
        }
    }
}
```

crashes on cursor.TryGet(ref key, ref value, CursorGetOption.FirstDuplicate) with the error

Spreads.LMDB.LMDBException: 'Invalid argument'

Am I using it wrong?

buybackoff commented 6 years ago

About value = default; in the while loop: how do you search for the next duplicate starting from a default value?

jakoss commented 6 years ago

OK, I had to call TryGet with CursorGetOption.First first, and then with FirstDuplicate. Cursors in LMDB are kind of confusing...

This works:

```csharp
using (var cursor = tableDatabase.OpenCursor(tx))
{
    if (cursor.TryGet(ref key, ref value, CursorGetOption.First) && cursor.TryGet(ref key, ref value, CursorGetOption.FirstDuplicate))
    {
        var counter = 0;
        buffer = new ulong[cursor.Count()];
        buffer[counter] = value.ReadUInt64(0);

        while (cursor.TryGet(ref key, ref value, CursorGetOption.NextDuplicate))
        {
            counter++;
            buffer[counter] = value.ReadUInt64(0);
            value = default;
        }

        if (counter != (buffer.Length - 1))
        {
            throw new Exception("Bad buffer length");
        }
    }
}
```

buybackoff commented 6 years ago

Yes, cursors are stateful, and some options require the cursor to already be at a valid position. This is not documented well enough.
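
The positioning rule can be illustrated with a toy model. In LMDB's C implementation, MDB_FIRST_DUP on a cursor that has not been positioned yet returns EINVAL, which is consistent with the "Invalid argument" error above. Everything below is hypothetical (ToyGetOp, ToyDupCursor and ReadFirstKeyDuplicates are not part of Spreads.LMDB); it is an in-memory sketch of the behavior, not the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of cursor operations, named after the ones used in this thread.
public enum ToyGetOp { First, FirstDuplicate, NextDuplicate }

// Toy in-memory model of a DUPSORT cursor (NOT Spreads.LMDB): it only shows that
// some operations depend on the cursor's current position.
public sealed class ToyDupCursor
{
    private readonly (string Key, ulong[] Values)[] _entries; // sorted keys, sorted duplicates
    private int _keyIndex = -1; // -1 means "not positioned yet"
    private int _dupIndex = -1;

    public ToyDupCursor(params (string Key, ulong[] Values)[] entries) => _entries = entries;

    public bool TryGet(ToyGetOp op, out string key, out ulong value)
    {
        key = string.Empty;
        value = 0;
        switch (op)
        {
            case ToyGetOp.First:
                if (_entries.Length == 0) return false;
                _keyIndex = 0;
                _dupIndex = 0;
                break;
            case ToyGetOp.FirstDuplicate:
                // Models LMDB returning EINVAL for MDB_FIRST_DUP on an unpositioned cursor.
                if (_keyIndex < 0) throw new InvalidOperationException("Invalid argument");
                _dupIndex = 0;
                break;
            case ToyGetOp.NextDuplicate:
                // Simplification: an unpositioned cursor just reports "no more duplicates".
                if (_keyIndex < 0 || _dupIndex + 1 >= _entries[_keyIndex].Values.Length) return false;
                _dupIndex++;
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(op));
        }
        key = _entries[_keyIndex].Key;
        value = _entries[_keyIndex].Values[_dupIndex];
        return true;
    }

    // Number of duplicates for the current key; also requires a positioned cursor.
    public int Count() =>
        _keyIndex < 0 ? throw new InvalidOperationException("Invalid argument")
                      : _entries[_keyIndex].Values.Length;

    // Same pattern as the working snippet: position with First, then walk the duplicates.
    public static ulong[] ReadFirstKeyDuplicates(ToyDupCursor cursor)
    {
        if (!cursor.TryGet(ToyGetOp.First, out _, out _) ||
            !cursor.TryGet(ToyGetOp.FirstDuplicate, out _, out ulong first))
            return Array.Empty<ulong>();
        var result = new List<ulong>(cursor.Count()) { first };
        while (cursor.TryGet(ToyGetOp.NextDuplicate, out _, out ulong next))
            result.Add(next);
        return result.ToArray();
    }
}
```

In the first snippet in this thread, FirstDuplicate is called before any positioning operation, which corresponds to the throwing branch of this model.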