Sorry this took me so long to find; I was expecting an email if someone
posted an issue, and I guess I missed it. Anyway, there are a couple of issues
here that make a good example of how to use the BPlusTree:
1. Using Linq on the BPlusTree (or its Keys/Values collections) is generally a
bad idea, as you have little control over the underlying implementation. For
simple tests and small data it makes sense; however, when that data scales to
millions of items it starts to become an issue. Additionally, some Linq
operations expect to know the count of items, so for example
BPlusTree.Keys.ToArray() will throw unless you call BPlusTree.EnableCount().
Even after calling this, the concurrent nature of the tree must be taken into
consideration. Linq was written with the standard .NET collections in mind, so
it does not expect that a Count can change during the enumeration and copy.
Still, as a simple test this works well enough. You should be OK with the
simpler Linq methods like Where(), First(), etc. Of course this is not specific
to Linq; there are a number of cases where the .NET collections will obtain a
Count prior to iteration. This is very problematic, since the BPlusTree can be
enumerated while other threads are modifying it. If your use of the BPlusTree
is strictly single-threaded, then just make sure to call EnableCount() after
opening if you need to work with the collection's Count.
2. The other issue here is more directly related to your inquiry. You stated
that you were using 'protobuf-net' for serialization. Your sample code is
actually using the BinaryFormatter from .NET and works fine. However, if you
replace this formatter with one from protobuf-net it no longer works. Why?
Simply put, the serializers behave very differently. The BinaryFormatter
writes self-delimiting data and stops reading from the stream once it has
read the object. This behavior is required of any ISerializer<T>
implementation used with the BPlusTree. The protobuf-net serializer, on the
other hand, will continue reading the stream until it reaches the end of the
input. With protocol buffers, this is often solved by writing a delimited
message, meaning that the message data is length-prefixed. I don't know how to
accomplish this via protobuf-net; however, the other option is to first turn
the object into a byte array and then serialize the array. Here is a working
example using protobuf-net:
{{{
using System;
using System.IO;
using System.Runtime.Serialization;   // IFormatter
using CSharpTest.Net.Serialization;   // ISerializer<T>, PrimitiveSerializer

public class TestSerializer : ISerializer<Int32[]>
{
    private readonly IFormatter _formatter = ProtoBuf.Serializer.CreateFormatter<int[]>();

    public Int32[] ReadFrom(Stream stream)
    {
        // Read the stored byte array first so protobuf-net only sees this one value.
        using (var ms = new MemoryStream(PrimitiveSerializer.Bytes.ReadFrom(stream), false))
            return (Int32[])_formatter.Deserialize(ms);
    }

    public void WriteTo(Int32[] value, Stream stream)
    {
        byte[] bytes;
        using (MemoryStream ms = new MemoryStream())
        {
            _formatter.Serialize(ms, value);
            bytes = ms.ToArray();
        }
        // Store the protobuf-net output as a single byte array value.
        PrimitiveSerializer.Bytes.WriteTo(bytes, stream);
    }
}
}}}
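To make the length-prefix point concrete, here is a minimal sketch that is independent of both BPlusTree and protobuf-net. The LengthPrefixDemo class and its WriteFramed/ReadFramed helpers are hypothetical names made up for this illustration; they only show the general technique of storing a length before each payload, so that a reader can take exactly one value and leave the stream positioned at the next one.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of BPlusTree or protobuf-net.
public static class LengthPrefixDemo
{
    // Writes a 4-byte length followed by the payload bytes.
    public static void WriteFramed(Stream stream, byte[] payload)
    {
        // The writer is intentionally not disposed so the stream stays open.
        var writer = new BinaryWriter(stream);
        writer.Write(payload.Length);
        writer.Write(payload);
        writer.Flush();
    }

    // Reads exactly one framed payload and stops at the start of the next one.
    public static byte[] ReadFramed(Stream stream)
    {
        var reader = new BinaryReader(stream);
        int length = reader.ReadInt32();
        byte[] payload = reader.ReadBytes(length);
        if (payload.Length != length)
            throw new EndOfStreamException("Stream ended inside a framed payload.");
        return payload;
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Two values stored back to back, as they would be inside one tree node.
            WriteFramed(ms, Encoding.UTF8.GetBytes("first"));
            WriteFramed(ms, Encoding.UTF8.GetBytes("second"));

            ms.Position = 0;
            Console.WriteLine(Encoding.UTF8.GetString(ReadFramed(ms))); // first
            Console.WriteLine(Encoding.UTF8.GetString(ReadFramed(ms))); // second
        }
    }
}
```

A reader that consumed everything up to the end of the stream would swallow both payloads on the first call, which is the failure described above for the protobuf-net formatter.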
This brings us to the next topic, PrimitiveSerializer. The PrimitiveSerializer
class, declared in the CSharpTest.Net.Serialization namespace, implements many
of the serializers you will need for primitives. For example, the
GuidSerializer you defined can be replaced entirely by the
PrimitiveSerializer.Guid instance. Using the PrimitiveSerializer, a more
efficient implementation of the TestSerializer above would be the following:
{{{
using System;
using System.IO;
using CSharpTest.Net.Serialization;   // ISerializer<T>, PrimitiveSerializer

public class Int32ArraySerializer : ISerializer<Int32[]>
{
    public int[] ReadFrom(Stream stream)
    {
        // The element count is stored first, followed by each element.
        int size = PrimitiveSerializer.Int32.ReadFrom(stream);
        if (size < 0)
            throw new InvalidDataException("Length cannot be less than 0.");
        int[] value = new int[size];
        for (int i = 0; i < size; i++)
            value[i] = PrimitiveSerializer.Int32.ReadFrom(stream);
        return value;
    }

    public void WriteTo(int[] value, Stream stream)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        PrimitiveSerializer.Int32.WriteTo(value.Length, stream);
        foreach (int i in value)
            PrimitiveSerializer.Int32.WriteTo(i, stream);
    }
}
}}}
If you need the Int32 values packed with the variable-length ("varint")
encoding used by protocol buffers, you can simply use the
VariantNumberSerializer defined in the same namespace. This class reads and
writes numbers the same way protocol buffers does. Thus, if space is an issue,
replace the uses of PrimitiveSerializer.Int32 in Int32ArraySerializer with
VariantNumberSerializer for a more compact storage format.
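For a rough idea of where the space saving comes from, the following is a small sketch of the base-128 varint scheme that protocol buffers uses for unsigned integers. It is not the VariantNumberSerializer source; VarintSketch, WriteVarint and ReadVarint are hypothetical names for this illustration, and it only handles unsigned 32-bit values to sidestep the question of how negative numbers are treated.

```csharp
using System;
using System.IO;

// Illustration of base-128 varint encoding; not the library's implementation.
public static class VarintSketch
{
    // Emits 7 bits per byte, low-order group first; the high bit means "more bytes follow".
    public static void WriteVarint(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }

    // Reads one varint and stops at the first byte whose high bit is clear.
    public static uint ReadVarint(Stream stream)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("Stream ended inside a varint.");
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new InvalidDataException("Varint is longer than 5 bytes.");
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            WriteVarint(ms, 300);
            Console.WriteLine(BitConverter.ToString(ms.ToArray())); // AC-02

            ms.Position = 0;
            Console.WriteLine(ReadVarint(ms)); // 300
        }
    }
}
```

Values below 128 take a single byte instead of four, so arrays of small numbers shrink considerably; the trade-off is that values close to the top of the 32-bit range take five bytes.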
Original comment by Grig...@gmail.com
on 22 Aug 2012 at 5:57
Original issue reported on code.google.com by
benson.m...@gmail.com
on 20 Jan 2012 at 5:13