NJLangley / csharptest-net

Automatically exported from code.google.com/p/csharptest-net

BPlus tree possible corruption #20

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run the attached file

When the stream is read, the data is the wrong length, so deserialization fails.  
This is using protobuf-net 2.0.0.470

Original issue reported on code.google.com by benson.m...@gmail.com on 20 Jan 2012 at 5:13

Attachments:

GoogleCodeExporter commented 8 years ago
Sorry this took so long for me to find; I was expecting an email if someone 
posted an issue, and I guess I missed it.  Anyway, there are a couple of issues 
here that make a good example of how to use the BPlusTree:

1. Using Linq on the BPlusTree (or its Keys/Values collections) is generally a 
bad idea, as you have little control over the underlying implementation.  For 
simple tests and small data it makes sense; however, when the data scales to 
millions of items it starts to become an issue.  Additionally, some Linq 
operations expect to know the count of items, so for example 
BPlusTree.Keys.ToArray() will throw unless you call BPlusTree.EnableCount().  
Even after calling this, the concurrent nature of the tree must be taken into 
consideration.  Linq was written with the standard .NET collections in mind, so 
it does not expect that Count can change during the enumeration and copy.  
Still, as a simple test this works well enough.  You should be ok with the 
simpler Linq methods like Where(), First(), etc.  Of course this is not 
specific to Linq; there are a number of cases where the .NET collections will 
obtain a Count prior to iteration.  This is very problematic since the 
BPlusTree can be enumerated while it is being modified.  If your use of the 
BPlusTree is strictly single-threaded, then just make sure to call 
EnableCount() after opening if you need to work with the collection's Count.  

2. The other issue here is more directly related to your inquiry.  You stated 
that you were using 'protobuf-net' for serialization.  Your sample code is 
actually using the BinaryFormatter from .NET and works fine.  However, if you 
replace this formatter with one from protobuf-net it no longer works.  Why? 
Simply put, the serializers behave very differently.  The BinaryFormatter will 
length-prefix the objects it serializes and stop reading from the stream once 
it has read the object.  This is required of any ISerializer<T> implementation 
used with the BPlusTree.  The protobuf-net serializer, on the other hand, will 
continue reading the stream until it reaches the end of the input.  With 
protocol buffers, this is often solved by writing a delimited message, meaning 
that the message data is length-prefixed.  I don't know how to accomplish this 
via protobuf-net; however, the other option is to first turn the object into a 
byte array and then serialize the array.  Here is a working example using 
protobuf-net:

{{{
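    // Requires: using System; using System.IO;
    //           using System.Runtime.Serialization;   // IFormatter
    //           using CSharpTest.Net.Serialization;   // ISerializer<T>, PrimitiveSerializer
    public class TestSerializer : ISerializer<Int32[]>
    {
        private readonly IFormatter _formatter = ProtoBuf.Serializer.CreateFormatter<int[]>();

        public Int32[] ReadFrom(System.IO.Stream stream)
        {
            // Read the length-prefixed byte array first, then let protobuf-net
            // consume only those bytes instead of the rest of the stream.
            using (var ms = new MemoryStream(PrimitiveSerializer.Bytes.ReadFrom(stream), false))
                return (Int32[])_formatter.Deserialize(ms);
        }

        public void WriteTo(Int32[] value, System.IO.Stream stream)
        {
            byte[] bytes;
            // Serialize to a temporary buffer so the total size is known.
            using (MemoryStream ms = new MemoryStream())
            {
                _formatter.Serialize(ms, value);
                bytes = ms.ToArray();
            }
            // PrimitiveSerializer.Bytes writes the array length-prefixed.
            PrimitiveSerializer.Bytes.WriteTo(bytes, stream);
        }
    }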
}}}
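
The length-prefix idea itself does not depend on any particular serializer. As a minimal sketch using only the .NET base class library (the class and method names below are hypothetical and are not part of CSharpTest.Net or protobuf-net), each value is written as a 4-byte length followed by its bytes, so a reader can consume exactly one value and leave the stream positioned at the next:

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of CSharpTest.Net or protobuf-net.
public static class LengthPrefixedFraming
{
    // Writes a 4-byte length (in the platform's byte order) followed by the payload.
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        if (payload == null) throw new ArgumentNullException("payload");
        byte[] header = BitConverter.GetBytes(payload.Length);
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads exactly one frame and leaves the stream positioned at the next one.
    public static byte[] ReadFrame(Stream stream)
    {
        int length = BitConverter.ToInt32(ReadExactly(stream, 4), 0);
        if (length < 0) throw new InvalidDataException("Negative frame length.");
        return ReadExactly(stream, length);
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}
```

Writing two frames back to back and then calling ReadFrame twice returns the two payloads separately, which is the behavior the BPlusTree needs from a serializer. protobuf-net also exposes Serializer.SerializeWithLengthPrefix and Serializer.DeserializeWithLengthPrefix, which may serve the same purpose; that has not been tested against the BPlusTree here.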

This brings us to the next topic, PrimitiveSerializer.  The PrimitiveSerializer 
class declared in the namespace CSharpTest.Net.Serialization implements many of 
the serializers you will need for primitives.  For example the GuidSerializer 
you defined can be replaced entirely by the PrimitiveSerializer.Guid instance.  
Using the PrimitiveSerializer a more efficient implementation of the preceding 
serializer would be the following:

{{{
    // Requires: using System; using System.IO;
    //           using CSharpTest.Net.Serialization;   // ISerializer<T>, PrimitiveSerializer
    public class Int32ArraySerializer : ISerializer<Int32[]>
    {
        public int[] ReadFrom(Stream stream)
        {
            // The element count is stored first, so only this array is consumed.
            int size = PrimitiveSerializer.Int32.ReadFrom(stream);
            if (size < 0)
                throw new System.IO.InvalidDataException("Length cannot be less than 0.");
            int[] value = new int[size];
            for (int i = 0; i < size; i++)
                value[i] = PrimitiveSerializer.Int32.ReadFrom(stream);
            return value;
        }

        public void WriteTo(int[] value, Stream stream)
        {
            if (value == null)
                throw new ArgumentNullException("value");
            // Write the element count, then each element in order.
            PrimitiveSerializer.Int32.WriteTo(value.Length, stream);
            foreach (int i in value)
                PrimitiveSerializer.Int32.WriteTo(i, stream);
        }
    }
}}}

If you need the Int32 values packed with the variable-length ("varint") 
encoding used by protocol buffers, you can simply use the 
VariantNumberSerializer defined in the same namespace.  This class reads and 
writes numbers in the same way protocol buffers serializes them.  Thus, if 
space is an issue, replace the uses of PrimitiveSerializer above with 
VariantNumberSerializer for a more compact storage format.
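
To show why this encoding is more compact, here is a minimal sketch of base-128 varint encoding for unsigned 32-bit values, using only the .NET base class library. It is an illustration of the technique, not the source of VariantNumberSerializer, and the class and method names are hypothetical:

```csharp
using System;
using System.IO;

// Illustrative only; this is not the implementation of VariantNumberSerializer.
public static class VarintSketch
{
    // Emits 7 bits per byte, low-order group first.
    // The high bit of each byte is set when more bytes follow.
    public static void WriteUInt32(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }

    // Reads groups of 7 bits until a byte without the high bit is found.
    public static uint ReadUInt32(Stream stream)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new InvalidDataException("Varint is longer than 5 bytes.");
    }
}
```

With this scheme a value below 128 takes one byte and 300 takes two, compared with four bytes for a fixed-width Int32. The largest 32-bit values take five bytes, so the saving depends on the data being mostly small numbers.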

Original comment by Grig...@gmail.com on 22 Aug 2012 at 5:57