CMertens / protobuf-net

Automatically exported from code.google.com/p/protobuf-net
Other
0 stars 0 forks source link

Protobuf message order seems to matter for inheritance #305

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Please include an e-mail address if this might need a dialogue!
==============
franchescamullin at gmail.com

What steps will reproduce the problem?
1. This is my trivial example:

namespace MyLib
{
    [DataContract]
    public abstract class Animal
    {
        [DataMember(Order=1)]
        public int NumberOfLegs = 4;
    }

    [DataContract]
    public class Cat : Animal
    {
        [DataMember(Order = 1)]
        public List<Animal> AnimalsHunted;
    }
}

2.
I am then programmatically adding Cat as a subtype of Animal with a field 
number of 2, and Dog as a subtype of Animal with a field number of 3.

3.
Call GetProto() on Cat.

4. Generate Python classes from .proto file

5. Send a cat object from Python to .NET

6. Deserialize in .NET as type Cat

What is the expected output? What do you see instead?

I expect to see a cat, I get an exception while protobuf-net tries to 
instantiate the abstract "Animal" object.

What version of the product are you using? On what operating system?
I am using the revision 552 from trunk

Please provide any additional information below.

I notice that I can deserialize the python serialized cat object (i.e. the 
animal message that contains the cat message using the code generated from the 
.proto definition using protogen 2.4.1) just fine if I send an empty cat 
message followed by the animal message which contains the "real" cat to be 
deserialized. When I look at the bytes sent by Python as compared to from .NET 
to .NET, protobuf-net always sends the bytes for the cat message first, then 
the rest of the message. Is this an artifact of the way you have to use 
protobuf to replicate inheritance, or is it something that can be changed? Is 
there a better work-around than sending an empty message (this won't work for 
all scenarios)?

Original issue reported on code.google.com by Franches...@gmail.com on 26 Jul 2012 at 4:14

GoogleCodeExporter commented 9 years ago
Hi, do you have any update for this issue? :)

Original comment by Franches...@gmail.com on 14 Aug 2012 at 9:35

GoogleCodeExporter commented 9 years ago
I have attached examples of the classes, the .proto file generated by 
GetProto(), and serialized messages from protobuf-net and python (using 
protogen with the .proto file).

The message being sent is a Cat of breed "Manx" with 3 legs (poor little guy).

The message in csmessage.bin can be deserialized by both .net and python.
The message in pythonmessage.bin can be deserialized only by python (.net gives 
an error)

Here is the code I am using on the cs side:

// create and serialize cat object to file
            MemoryStream streamMessage = new MemoryStream();
            Serializer.Serialize<Cat>(streamMessage, new Cat() { Breed = "Manx", NumberOfLegs = 3 });
            File.WriteAllBytes("./csmessage.bin", streamMessage.ToArray());

            // test cs generated message deserialization
            byte[] csMessage = File.ReadAllBytes("./csmessage.bin");
            MemoryStream streamCsMessage = new MemoryStream(csMessage);
            Cat csCat = Serializer.Deserialize<Cat>(streamCsMessage);

            // test python generated message deserialization
            byte[] pythonMessage = File.ReadAllBytes("./pythonmessage.bin");
            MemoryStream streamPythonMessage = new MemoryStream(pythonMessage);
            Cat pythonCat = Serializer.Deserialize<Cat>(streamPythonMessage);

Here is my python code:

import Animals_pb2

if __name__ == '__main__':

    in_buf = open('csmessage.bin','rb').read(10)

    resurectedCat = Animals_pb2.Animal()

    Animals_pb2.Animal.ParseFromString(resurectedCat, in_buf)

    print resurectedCat.Cat.Breed + ' has ' + str(resurectedCat.NumberOfLegs) + ' legs!'

    open('pythonmessage.bin','wb').write(resurectedCat.SerializeToString())

    pass

Here is the strangest thing, I tried changing the protoinclude field number for 
cat so that it is lower than the field number for the NumberOfLegs property in 
the Animal class, and the issue went away because the python and cs generated 
messages generated were identical.

Original comment by Franches...@gmail.com on 17 Aug 2012 at 9:05

Attachments:

GoogleCodeExporter commented 9 years ago
The tests will be handy for validating this, but what you have seen is 
consistent with how I expect the (slightly in need of repair) code to work. 
Basically, the way it currently works is:

when serializing
- it **always** writes inheritance first (this is legal, but slightly atypical)
- it then writes members in ascending field order

when deserializing:
- it lazily creates the object when it finds the first member
- it really really hopes to find inheritance first
- because: if it finds a member first, and *then* finds a member, it doesn't 
currently compensate by rebuilding the object as a higher type

what I need to do is one of:
- increase the member buffering (lots of existing code for this exists from the 
auto-tuple stuff, but it isn't desirable)
- detect this scenario and do a serialize/deserialize-to-a-higher-type (this is 
what v1 does)
- detect this scenario and perform a memberwise-clone directly

Repeating the middle option (as per v1) is probably the most pragmatic option

Original comment by marc.gravell on 17 Aug 2012 at 9:44

GoogleCodeExporter commented 9 years ago
Do you think my workaround (keeping known type field numbers lower than member) 
will be sufficient for all cases?

Original comment by Franches...@gmail.com on 17 Aug 2012 at 11:53

GoogleCodeExporter commented 9 years ago
Yes, that will work consistently and reliably, as long as python respects the 
"ascending fields" guideline (which it sounds like it does).

Original comment by marc.gravell on 17 Aug 2012 at 1:11

GoogleCodeExporter commented 9 years ago
Great, thanks!!! My real system is using DataContract Atributes, which make the 
work around more complex. I had to increment the field numbers by n known 
types, and then insert subtypes into the first n fields. I got it working 
nicely, but please promise never EVER to change the name of the private 
readonly ValueMember.fieldNumber ;)

(I saw a DataContract related offset somewhere, if you could expose that in a 
"per class" way for me to programatically set that might be a safer longterm 
solution?)

Original comment by Franches...@gmail.com on 17 Aug 2012 at 1:51