dbolin / Apex.Serialization

High performance contract-less binary serializer for .NET
MIT License
86 stars 13 forks source link

Incorrect serialization with Nullable<> + Generics + ValueType (maybe?) #178

Closed Zoxive closed 2 years ago

Zoxive commented 2 years ago

Hey Dominic. I'm still looking into this, but I'm opening this (before I have a full fix PR) incase you know something off the top of your head that would help me diagnose this.

Basically our application has crashed a few times now due to memory corruption. I've traced it down to Apex.Serialization while deserializing some state. It looks to me that it incorrectly serialized that state. (Its pretty hard to debug compiled expressions.. more on that later...)

I think I got it worked down to a small reproducible sample. What is funny is that while serializing in debug it actually catches that its about to write some bad stuff and explodes.

Example exception while in DEBUG mode of the library System.InvalidOperationException Operation is not valid due to the current state of the object. at Apex.Serialization.Internal.BufferedStream.CheckReserved(Int32 size) in C:\Github\Apex.Serialization\Apex.Serialization\Internal\BufferedStream.cs:line 152 at Apex.Serialization.Write_Apex.Serialization.Tests.Option(Closure , Option , BufferedStream& , Binary`2 ) at Apex.Serialization.Binary`2.WriteSealedInternal[T](T value, Boolean useSerializedVersionId) in C:\Github\Apex.Serialization\Apex.Serialization\Binary.Internal.cs:line 628

Current thoughts:

  1. Maybe we could add a Apex.Serialization.Settings property that turns these sanity checks on even in release mode? (I personally would eat the loss perf to check this to better know that the seriation was accurate and wont crash my app again lol)
  2. In my commit the Settings.IsTypeSerializable() has some shadiness to it.. im not sure this actually needed. The version of the library we are using is before the whitelisting.. so im prob doing someting wrong.
  3. My current target area for the problem/fix is around DynamicCode.HandleNullableWrite It is writing a byte 0 when hasValueMethod is actually null is what is setting off the InvalidOperationException
  4. With my work's continuing to use this type of library.. I'm wondering the feasibility of creating a SourceGenerator version of this library. The main point being debuggability for any future problems we may have... with side a side bonus of performance. (I may take a pass at this later this week.. it will help me get into this codebase more as well)

I'll provide updates to this issue as I continue to work on this.

Library Version Used & .NET Runtime Version: Reproduced in 1.3.3 (version we were on at the time) and latest in master. (4.0.2) .NET 3.1 & .NET 6

Steps to Reproduce:

In my branch I added a test showing the problem. My branch/commit: https://github.com/Zoxive/Apex.Serialization/commit/c254a0ae020e217d31da8fad45a29dbc889d3407

Test: https://github.com/Zoxive/Apex.Serialization/blob/c254a0ae020e217d31da8fad45a29dbc889d3407/Tests/Apex.Serialization.Tests/NullableGenericValueTypeTests.cs

I commented out some [Conditional("DEV"]) statements so if you ran it in RELEASE mode you see the problem as well.

Expected Behavior: Dont create serialized state which when deserializing cause memory corruption and crash the CLR

Actual Behavior: Work without crashing : )

dbolin commented 2 years ago

Thanks for the report and investigation. I've pushed the fix to master (4.0.3) - do you need a patch on 1.3.x to support .NET 3.1?

I'll followup on (1) - not all code is generated, but anywhere that is should be able to have a setting at runtime rather than compile time so you can enable the check without losing performance in the case where the setting is off.

For (4), I believe there's a lot of stuff SourceGenerators won't be able to support, such as setting readonly fields without a constructor or reflection, and serializing the shared structure of immutable data types (which can greatly reduce the output size if you keep references to older versions of the data structures).

For debugging, I always just look at the text of the generated delegate, there's currently no good way to step through them.

Zoxive commented 2 years ago

Well yep that fixes it. Thanks for that.

I agree on (4) that reflection will still need to be used.. but at least it would get the main chunk out in the open.

For my debugging i was using https://github.com/AgileObjects/ReadableExpressions nuget package to generate more human readable expression text.