Spreads / Spreads.LMDB

Low-level, zero-overhead and the fastest LMDB .NET wrapper, with some additional native methods useful for Spreads
http://docs.dataspreads.io/spreads/libs/lmdb/api/README.html
Mozilla Public License 2.0
80 stars, 9 forks

Duplicate cursor logic #32

Closed: jakoss closed this issue 5 years ago

jakoss commented 5 years ago

I have this code:

var codeSegments = new List<CodeSegment>(<some_initial_data>);
// Input sorted by Time, used only for the asserts below
var ordered = codeSegments.OrderBy(e => e.Time).ToArray();

foreach (var codeSegment in codeSegments)
{
    storeDatabase.Put(transaction, trackId, codeSegment, TransactionPutOptions.None);
}

using (var cursor = storeDatabase.OpenReadOnlyCursor(transaction))
{
    var value = default(CodeSegment);
    // Position on trackId, then on its first duplicate value
    if (cursor.TryGet(ref trackId, ref value, CursorGetOption.Set)
            && cursor.TryGet(ref trackId, ref value, CursorGetOption.FirstDuplicate))
    {
        var segments = new CodeSegment[cursor.Count()];
        var counter = 0;
        segments[counter] = value;
        // Walk the remaining duplicate values stored under trackId
        while (cursor.TryGet(ref trackId, ref value, CursorGetOption.NextDuplicate))
        {
            counter++;
            segments[counter] = value;
        }
        var orderedSegments = segments.OrderBy(e => e.Time).ToArray();

        // Check the lengths first so a short read fails the assert instead of throwing
        Debug.Assert(ordered.Length == orderedSegments.Length);
        for (int i = 0; i < ordered.Length; i++)
        {
            Debug.Assert(ordered[i].Time == orderedSegments[i].Time);
            Debug.Assert(ordered[i].Code == orderedSegments[i].Code);
        }
    }
}

But somehow the returned results never match the input (the asserts don't pass; every item is off).

CodeSegment struct looks like this:

[StructLayout(LayoutKind.Sequential, Size = 2 * sizeof(int))]
[BinarySerialization(2 * sizeof(int))]
public struct CodeSegment
{
    public CodeSegment(int code, int time)
    {
        Code = code;
        Time = time;
    }

    public int Code;
    public int Time;
}

Is my cursor logic bad?

buybackoff commented 5 years ago

Maybe cursor.Count has a side effect (unlikely, though).

What are your storeDatabase options? With the default options, CodeSegment values are sorted as byte strings, starting with Code.
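To make the byte-string point concrete, here is a small self-contained C# sketch with no LMDB involved. It orders CodeSegment values by their raw in-memory bytes, assuming the default comparator for duplicate values is a plain memcmp-style lexicographic comparison. ByteOrderDemo, ToBytes, CompareBytes and SortAsByteStrings are hypothetical helper names, not Spreads.LMDB API. On a little-endian machine the least significant byte of Code is compared first, so the result is neither Time order nor numeric Code order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Same field order as the CodeSegment in the issue: Code first, then Time.
[StructLayout(LayoutKind.Sequential)]
public struct CodeSegment
{
    public int Code;
    public int Time;

    public CodeSegment(int code, int time)
    {
        Code = code;
        Time = time;
    }

    public override string ToString() => $"(Code={Code}, Time={Time})";
}

public static class ByteOrderDemo
{
    // Raw in-memory bytes of one struct, i.e. what would be stored as the value.
    public static byte[] ToBytes(CodeSegment s) =>
        MemoryMarshal.AsBytes(new[] { s }.AsSpan()).ToArray();

    // Plain lexicographic (memcmp-style) comparison of two byte strings.
    public static int CompareBytes(byte[] x, byte[] y) =>
        new ReadOnlySpan<byte>(x).SequenceCompareTo(new ReadOnlySpan<byte>(y));

    // Sorts segments the way a byte-string comparator would see them.
    public static CodeSegment[] SortAsByteStrings(IEnumerable<CodeSegment> segments) =>
        segments.OrderBy(s => ToBytes(s), Comparer<byte[]>.Create(CompareBytes)).ToArray();

    public static void Main()
    {
        var input = new[]
        {
            new CodeSegment(1, 300),
            new CodeSegment(256, 100),
            new CodeSegment(2, 200),
        };

        // On little-endian hardware this prints (Code=256, Time=100),
        // then (Code=1, Time=300), then (Code=2, Time=200).
        foreach (var s in SortAsByteStrings(input))
            Console.WriteLine(s);
    }
}
```

Code = 256 is stored as the bytes 00 01 00 00, so it sorts before Code = 1 (01 00 00 00), even though it is numerically larger.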

buybackoff commented 5 years ago

You need to make Time the first field and use DupSortPrefix = 32 to sort only by the Time part.

buybackoff commented 5 years ago

The IntegerDuplicates option will probably also work, but again only if the Time field goes first, on little-endian machines (which are probably the only relevant ones, and you are most likely on one). See https://github.com/Spreads/Spreads.LMDB/issues/19#issuecomment-466375346

jakoss commented 5 years ago

I used

DbFlags.Create
    | DbFlags.IntegerKey
    | DbFlags.DuplicatesSort
    | DbFlags.DuplicatesFixed

But ordering is not important to me (I sorted the input and output only for the asserts). The issue is data correctness: I don't get back the data I put into the database.

I tried doing it like this instead:

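var array = segment.ToArray().AsMemory();
// View the CodeSegment array as raw bytes (Cast does not copy)
var bytes = MemoryMarshal.Cast<CodeSegment, byte>(array.Span);
// trackId as native-endian bytes, as used with DbFlags.IntegerKey
var key = BitConverter.GetBytes(trackId).AsMemory();
// Keep both buffers pinned while the DirectBuffers point into them
using (key.Pin())
using (array.Pin())
{
    var keyBuffer = new DirectBuffer(key.Span);
    var valueBuffer = new DirectBuffer(bytes);
    databaseHolder.StoreDatabase.Put(tx, ref keyBuffer, ref valueBuffer);
}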

and it works fine; I get back exactly the data I saved.
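The workaround above appears to write the whole CodeSegment array under trackId as one value, reinterpreted as bytes with MemoryMarshal.Cast. As a rough illustration, here is only that serialization step in isolation, with no LMDB involved; RoundTripDemo, ToBytes and FromBytes are hypothetical names. Within that single blob, the segments simply keep their array order, and casting back reproduces them exactly.

```csharp
using System;
using System.Linq;
using System.Runtime.InteropServices;

// Same field layout as the CodeSegment struct from the issue.
[StructLayout(LayoutKind.Sequential)]
public struct CodeSegment
{
    public int Code;
    public int Time;

    public CodeSegment(int code, int time)
    {
        Code = code;
        Time = time;
    }
}

public static class RoundTripDemo
{
    // Reinterpret the array as raw bytes (8 bytes per segment) and copy them out.
    public static byte[] ToBytes(CodeSegment[] segments) =>
        MemoryMarshal.Cast<CodeSegment, byte>(segments.AsSpan()).ToArray();

    // Reinterpret stored bytes back as segments; any trailing bytes that do not
    // fill a whole struct are dropped by the cast.
    public static CodeSegment[] FromBytes(byte[] bytes) =>
        MemoryMarshal.Cast<byte, CodeSegment>(bytes.AsSpan()).ToArray();

    public static void Main()
    {
        var input = new[] { new CodeSegment(1, 300), new CodeSegment(256, 100) };
        var bytes = ToBytes(input);
        var output = FromBytes(bytes);

        Console.WriteLine(bytes.Length);                // 16
        Console.WriteLine(input.SequenceEqual(output)); // True
    }
}
```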

buybackoff commented 5 years ago

So it's no longer an issue?

jakoss commented 5 years ago

Well, I managed to write my code another way that works for me, but I think the cursor code I tried earlier should have worked. I'll try to create a project that reproduces the issue in my spare time.

buybackoff commented 5 years ago

> I'll try to create some project that can reproduce the issue in my spare time

A PR with a failing test would be very helpful.

jakoss commented 5 years ago

It seems that this particular case is tightly bound to the production database (which weighs more than 50 GB), and I cannot reproduce it outside of it. I will close this for now and reopen it if I can reproduce it in the future.