dbolin / Apex.Serialization

High performance contract-less binary serializer for .NET
MIT License
86 stars · 13 forks

Is this serializer meant to be deterministic? #182

Closed Zoxive closed 2 years ago

Zoxive commented 2 years ago

This is way longer than I wanted. (I thought it would be useful to be a little verbose, as I'm still digging into this.) The main question I have is: how suspicious is it that the serialized byte count is different with the same input?

Library Version Used & .NET Runtime Version:

.NET 6 w/ 4.0.3

So, I'm looking into another issue we are having.

The problem with this one is that it is truly random. We have in-depth integration tests, and on my machine, after running the tests over and over, we randomly get exceptions (sometimes it takes 5 attempts, sometimes 20) from the serializer saying that it can't cast object A to B.

(In case you are curious, the exception looks like the following, but it's all inside the generated expressions and specific to our types, so it's not really helpful.)

System.InvalidCastException: Unable to cast object of type 'Engine.DataStructures.Values.SimpleValue' to type 'Engine.DataStructures.Values.Value'.
    at Apex.Serialization.Read_System.Collections.Immutable.ImmutableSortedDictionary`2+Node[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Engine.DataStructures.Values.Value, Engine.DataStructures, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]](Closure , BufferedStream& , Binary`2 )
    at Apex.Serialization.Binary`2.ReadSealedInternal[T](Boolean useSerializedVersionId)
    at Apex.Serialization.Read_System.Collections.Immutable.ImmutableSortedDictionary`2+Node[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Engine.DataStructures.Values.Value, Engine.DataStructures, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]](Closure , BufferedStream& , Binary`2 )
    at Apex.Serialization.Binary`2.ReadSealedInternal[T](Boolean useSerializedVersionId)
    at Apex.Serialization.Read_Engine.DataStructures.Values.Collection(Closure , BufferedStream& , Binary`2 )
    at Apex.Serialization.Binary`2.ReadInternal()
    at Apex.Serialization.Read_Engine.Stateless.State.ExecutionStateRoot(Closure , BufferedStream& , Binary`2 )
    at Apex.Serialization.Binary`2.ReadSealedInternal[T](Boolean useSerializedVersionId)
    at Apex.Serialization.Binary`2.ReadObjectEntry[T]()
    at Apex.Serialization.Binary`2.Read[T](Stream 

One of the first things I noticed is that the number of bytes the serializer produces differs between runs with the same input. (I need to verify it's 100% the same input; it's at least the same built-up objects, since it all comes from test builders.) So my question is: is this serializer deterministic? What would make it not be?

I'm still digging into my actual problem, but I thought it was weird that the size of the bytes produced is sometimes different, and when the size differs from the others, it usually explodes while deserializing.

Since this is random, it's difficult for me to create a small reproducible case. I'm still trying, though.

What I've determined thus far:

The exception originates from reading the data in the stream while trying to pull out an already loaded reference via https://github.com/dbolin/Apex.Serialization/blob/3a934752b6f196d3bd1811c138ad5f034713fc8d/Apex.Serialization/Internal/DynamicCode.cs#L645, but it's the wrong type. (The refIndex used to read LoadedObjectRefs is off by one.)

So far I think the problem is on the writing side, not the reading side, based on the fact that the number of bytes written differs.

Zoxive commented 2 years ago

Here's a picture of my output sizes, which is what gives me pause.

These are all from the same integration test with the same inputs. [screenshot of serialized byte sizes]

All of the runs that are size 20777 work. The one that is 20770 explodes, and so do both that are 20901, with the same error message.

Zoxive commented 2 years ago

Is this related to:

Objects that use randomized hashing or other runtime specific data to determine their behavior (including HashSet<>, Dictionary<,> and their immutable counterparts) unless you specifically use a comparer/objects that don't have that randomization

dbolin commented 2 years ago

is this serializer deterministic? What would make it not be?

It should be, in the sense that objects which have the same binary data, other than object references, should produce an identical byte array as output (ignoring custom serialization here, and assuming the same code, the same .NET runtime version, and the same chip architecture).

There are some ways you can break this, for example if you mutate the object while it is being serialized. However, I don't think that would cause this particular error - this should only be able to happen when the type definitions used by writer and reader don't match (which should be caught by serialized version id check).

Normally I would say this is very likely a bug in the serializer. But even if, say, the generated writing code were different (writing field A before field B while the reader expects B then A), I'd expect the final byte count to be the same. In addition, the serialized version id check is implemented as an order-dependent hash of the write expression tree, so it should also fail if the generated writer code were ever different.

So at the moment I don't have a good guess as to what is happening here. I still think it's likely to be a bug, but I can't think of a way for this to happen. I'm also out for about 2 weeks, so I probably won't be able to do any testing on my side.
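The determinism property described above is easy to check mechanically: serialize the same value twice with the same serializer instance and compare the outputs byte for byte. A minimal sketch; the `serialize` delegate here is a hypothetical stand-in for the real writer (e.g. an Apex `Binary` instance writing into a `MemoryStream`), so this runs without the library:

```csharp
using System;
using System.Linq;

static class DeterminismCheck
{
    // Serialize the same value twice and verify the outputs are
    // byte-identical. 'serialize' stands in for whatever produces the
    // bytes; any divergence here means the writer is non-deterministic.
    public static bool IsDeterministic<T>(T value, Func<T, byte[]> serialize)
        => serialize(value).SequenceEqual(serialize(value));
}
```

Running this in a loop (as Zoxive does below) is how an intermittent divergence like the 20777 vs 20770 byte counts would surface.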

Is this related to: Objects that use randomized hashing, etc.

It shouldn't be; ISD (ImmutableSortedDictionary) doesn't have anything like that, and even if it did, the object should just misbehave at runtime, not cause this type of error.

I think the problem is on the writing side and not reading side so far just based on the # of bytes written is different.

Agreed. Were you able to try running against the debug version of the serializer to rule out buffer overflow (although I don't know how some cases would end up with more bytes written if that was happening)?

dbolin commented 2 years ago

Do you know if the different byte size outputs were created by the same or different instances of the application? I.e., did the same running application create two outputs with different sizes, or does this only happen if the app is restarted, running on a different machine, etc.?

dbolin commented 2 years ago

There's also a possibility that a bug or misuse of RecyclableMemoryStream is corrupting the output stream. If it's fairly easy to run the integration tests against custom code, you could try a plain MemoryStream instead of the pooled ones to rule this possibility out.

Zoxive commented 2 years ago

Thanks for your input; I'll keep digging.

Yeah, I removed the conditional attributes, so I don't think it's a buffer overflow.

What's weird is that my current best reproduction is a for loop serializing the same thing up to 1,000 times; it eventually fails. It's gotten up to ~900 iterations, but it does fail reading on the same machine that wrote it. (I'm reproducing on my desktop.)

The next thing I want to do is inspect the bytes it creates when the size isn't the same.

Zoxive commented 2 years ago

I already removed compression; I can try removing the memory pool.

dbolin commented 2 years ago

Another thing I'd try is adding a <TieredCompilation>false</TieredCompilation> property to the csproj, just to make sure there isn't some strange interaction between tiered compilation and any of the generated code. https://docs.microsoft.com/en-us/dotnet/core/runtime-config/compilation
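For reference, the property goes inside a PropertyGroup of the test project's csproj, which disables tiered compilation for that project:

```xml
<PropertyGroup>
  <TieredCompilation>false</TieredCompilation>
</PropertyGroup>
```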

dbolin commented 2 years ago

Yeah, finding the point of divergence in the output bytes could be really helpful for narrowing down where the problem is.
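Finding that divergence point can be automated with a few lines of plain C#, independent of the serializer; feed it two captured outputs, e.g. a 20777-byte run and a 20770-byte run:

```csharp
using System;

static class ByteDiff
{
    // Returns the index of the first byte where the two outputs differ,
    // or -1 if they are identical. If one output is a strict prefix of
    // the other, the divergence point is the shorter length.
    public static int FirstDivergence(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
            if (a[i] != b[i])
                return i;
        return a.Length == b.Length ? -1 : n;
    }
}
```

Dumping a hex window around the returned index in both outputs usually shows whether a reference id, a length prefix, or a field value diverged first.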

dbolin commented 2 years ago

Any update on your side?

Zoxive commented 2 years ago

Yeah, sorry, I've been on vacation the last week and return tomorrow. I'll try to write up what I've learned tomorrow. I still believe there's an issue, but it continues to be complicated.

Zoxive commented 2 years ago

I made one more effort to reproduce the problem outside of our project.

In doing so, I think I discovered the issue: a bug inside GetHashCode() in one of the objects being serialized.

Fixing that seems to have fixed my problem.
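For anyone hitting the same symptom: a GetHashCode() that is not stable across calls (or is inconsistent with Equals) silently corrupts hash-based collections, which in turn changes what gets enumerated and serialized. A contrived illustration of the contract violation, using a hypothetical type (not from the Engine codebase):

```csharp
using System;
using System.Collections.Generic;

class BrokenValue : IEquatable<BrokenValue>
{
    private static int _counter;
    public int Id { get; }
    public BrokenValue(int id) => Id = id;

    public bool Equals(BrokenValue other) => other is not null && other.Id == Id;
    public override bool Equals(object obj) => Equals(obj as BrokenValue);

    // BUG: a different hash on every call. The value lands in a
    // different bucket each time, so equal instances are never
    // deduplicated and lookups miss entries that are actually present.
    public override int GetHashCode() => _counter++;
}

class Demo
{
    static void Main()
    {
        var set = new HashSet<BrokenValue> { new BrokenValue(1), new BrokenValue(1) };
        Console.WriteLine(set.Count);                        // 2, not 1
        Console.WriteLine(set.Contains(new BrokenValue(1))); // False
    }
}
```

A collection corrupted this way can hold a shape the reader's generated code never expects, which is consistent with the intermittent InvalidCastException above.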