Closed JasonBock closed 9 months ago
Yep, this seems to make sense. I suppose there are some additional nice to have features - either now, or as separate issues at a later date:
It feels like deciding on what the required serialization behaviours are is the most tricky part. I'm not saying that the code will be simple, but it's exactly the sort of thing that source generators are made for, so I imagine this to be perfectly feasible.
Question: in the serialization strategy suggested, is there anything to be considered about endianness across different .NET platforms? I'm not really good on multi-platform scenarios, but is Linux little-endian whereas WIndows is big-endian, for example? Is that relevant in .NET, or does .NET define that for us so that we can safely ignore it?
I did also wonder whether differences in code pages used on different platforms were relevant, but I think the answer is no - .NET uses Unicode for strings.
I've spotted an interesting GitHub repository that might offer a useful reference for the source generator, once we've determined what it needs to do:
https://github.com/RudolfKurka/StructPacker
Found from a list of source generators being compiled at https://github.com/amis92/csharp-source-generators
I don't know the provenance of the repos to which I have provided links, so don't assume that they are the ideal reference materials. There might be better examples out there.
Source generators cookbook from the Roslyn repo: https://github.com/dotnet/roslyn/blob/main/docs/features/source-generators.cookbook.md
The endianness question would affect CSLA right now. It wouldn't be a source generator-specific thing.
FWIW I have three repos in flight that use source generators. They have nothing to do with serialization but maybe they'll help pick up a couple of things related to generators.
The endianness question would affect CSLA right now.
When using the binary reader/writer. Today, at least, the easy workaround would be to switch back to a text-based reader/writer.
My point is, this serialization idea really isn't doing anything different from what MobileFormatter
does right now. So MobileFormatter
would have the same issues if a binary reader/writer is being used.
I understand, though the design for this new model should probably rely on a configurable reader/writer concept as well so it is possible to address the issue if it arises for someone.
I would assume that's the case currently in CSLA?
Yes, MobileFormatter relies on a reader/writer that can be swapped out. The default is a binary reader/writer, but there's also an XML (?) reader/writer that wouldn't have endian issues b/c numbers transfer as text.
For reference, here is the announcement about BinaryFormatter being obsoleted in .NET 5. It was done because of the security risks associated with the deserialization implementation. These security issues cannot be resolved, because they are baked into the design of what the deserialization does.
https://docs.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide
We need to be mindful of the risk in implementing the new serialization strategy, although I think the intended solution probably sidesteps a lot of the issues.
My understanding is a bit limited at the moment, but I think one of the main problems will be that BinaryFormatter will deserialize any type using the type information passed as part of the serialization stream. This is done in order to support polymorphic deserialization, but means that clients can effectively inject new types into a host application.
One of my suggestions was that we support polymorphic deserialization, so we need to be careful about how such a thing might be implemented. The issue is not that it cannot be supported, but instead that it should only be supported for types declared in advance within the code performing deserialization. A good example of this is DataContractSerializer, which requires the definition of known types for polymorphic deserialization scenarios, so that there is a list that can be used to validate the types that are to be deserialized prior to an attempt to do so.
Having thought about it some more, I think the serialisation strategy would be most useful if it fulfilled the following rules:
I've done a bit of proof of concept work on this feature this afternoon.
As this is only PoC, it's in my own repo. Have a look at the following location and see what you think of it so far.
https://github.com/TheCakeMonster/Examples/tree/master/SourceGenerators/CslaSerialization
There are 4 projects:
The implementation is very limited. This is not intended to be ready to ship. It's something to kick the tyres of what might be required, intended to prompt further discussion.
The approach I've taken is to automatically implement IMobileObject. However, it only works for properties of primitive types, and therefore does not cope with children/complex graphs.
I've done a couple of improvements to the code this morning to exclude properties that are not exposed outside of the class, and then be able to opt in properties of those lower scopes. However, I then started getting CodeLens throwing NullReferenceExceptions in VS2019 16.10 Preview 1 when viewing the target class (i.e. CslaSerialization.Objects.Person) in a code window - always 3 exceptions, one after another, when you switched to the Person code window.
I'm not sure if the exceptions were caused by the changed I made, or an update to something I got on my machine at the end of the day yesterday - but the changes in the code might be the culprit.
I tried to track down the issue by slowly commenting out code in the generator, but even when it was all commented out it still happened. Only removal of the [AutoSerializable] attribute fixed it. As soon as I added the attribute back in, it started happening again.
In the end, I removed the reference from the Objects project to the Generators project and it stopped happening. Then when I added that reference back, it didn't start happening again, and now it seems OK with all of the code back in circuit. I hate it when that sort of thing happens! There was something slightly messed up and no matter what I cleaned or rebuilt, how many restarts of VS I did and so on, it didn't clear it.
If you have the same problem let me know and I'll see if I can make it happen again on my machine - but it isn't happening at the moment.
The CodeLens issue is a known one. Seems to happen when you have SGs in play (look here for details).
Here are some observations on the code...
Is there a reason you're introducing a new attribute for serialization? Source generators can key off of any type, and CSLA developers are used to having serialization "just work" via the inclusion of [Serializable]
.
If there isn't a need to extend a type, I'd seal it by default. Note that with attributes this can provide a small perf benefit, though with source generators this is probably not applicable.
I'd use a receiver to find candidates, especially context receivers (check out my receiver for InlineMapping). They can make it easier to look for the nodes that match the condition you want.
If possible, don't do a string name search to confirm that the attribute is the type you want (line 55 in SerializationPartialsGenerator
). You can use a Compilation
object from the SemanticModel
to load a type symbol based on the type's name (again, look at my MapReceiver
for details on this). That ensures that the match is exactly what you're looking for.
Consider using an IndentedTextWriter
to write out the code. It makes it a bit easier to handle indentation than trying to do it via a StringBuilder
.
In AppendSetStateMethod()
, you're including a type name (line 154). The namespace of that type will (probably) need to be included as a using
in your gen'd code.
Be careful of types not defined in a namespace. They'll end up in a "global" namespace, and it may break assumptions you're doing with namespaces (this has bit me with SGs I've created).
@JasonBock Thanks for the thorough and comprehensive review. That will be very useful when I do the actual implementation.
Yes, I specifically chose not to use the [Serializable] attribute as a selector for target classes. There's a lot of existing CSLA code that could be broken if we were too aggressive with our targeting. I chose to use a new attribute that allows this to be a feature that a developer can opt into. It's not intended to be a comprehensive solution to all serialization at this stage - it will not do anything other than primitive types in the first version. Therefore, a developer should make a choice of whether to use it based on whether the limitations are appropriate to the scenario.
I'm not all that happy with the names of the attributes - particularly those used to opt public properties out and private properties (and, in future, fields) in - but I think the need for them is sound. The job of AutoSerializationExcluded could be done instead with the NonSerialized attribute, but I don't know of an equivalent for opting things in, so I chose to create two new ones that were closely related by name.
Ooh, your .NET Oxford talk on source generators just appeared in my YouTube feed. What are the chances of that!
@TheCakeMonster I understand how this is "opt-in", but right now with serialization in CSLA....it just "works". For the most part, you don't have to do a lot to change what serialization is doing (at least I haven't, others may have different experiences). For a long time, the pattern has been, "put [Serializable]
on the class, and you're done". Would you be requiring users to attribute the class and all properties to state whether they are part of the serialization process or not? Just something to keep in mind.
@JasonBock I was trying to allow for customisation of the behaviour in situations where it is required. By default, all public and internal properties are included in serialization. However, under certain circumstances you may want to exclude some of those properties, or possibly include private properties or fields that are not included by default.
@rockfordlhotka @JasonBock Where do you think we should put the new source generator in the codebase? Csla.Analyzers isn't entirely unreasonable, but maybe a new assembly is better? Source generators do not need to be deployed; they are only build time dependencies.
I don't think they should go in the Analyzers project, as this is something different. And also because we don't want to mix dependencies and make our lives more complex.
So probably something else, and I'm open to ideas. Probably should try and follow patterns others are using (if any yet).
Is there any need for version tolerance in the serialized data? If a user adds a property to a previously serialized object, could that cause an issue?
For what it's worth, this serializer I made uses a source generator: https://github.com/ReubenBond/Hagar (and I am integrating it into Orleans directly here, with many improvements: https://github.com/dotnet/orleans/pull/7070)
I would be wary of simply serializing fields one after another with no type information or information which can be useful for version tolerance. Type information is useful for supporting polymorphism. Another issue is the possibility of cyclic references in a serialized object graph.
Polymorphism can be implemented in a relatively safe manner by only allowing creation of Type
instances which are explicitly allowed
@ReubenBond the current serialization models (BinaryFormatter, NDCS, MobileFormatter) are all full-fidelity, and that is a requirement. 100% exact clone of the original object graph, including children with multiple references to the same child.
Right now they are all version sensitive, and of course BinaryFormatter and NDCS are controlled by Microsoft, so that's not going to change. MobileFormatter is part of CSLA, and so could relax the requirement for a version match, with the caveat that it opens up a whole new avenue of runtime bugs for people to troubleshoot. So if we did go down that road, I'd want to make sure there's extensive logging in place so people could easily discover that they created a problem for themselves with mismatched versions.
Such a change - relaxing version parity - is not in the scope of this issue.
I think there are a number of decisions we should make before I undertake the main development, based mainly on unresolved discussion on this thread. Here is a list of the outstanding queries I have about what we want this to do.
In summary, what the generator will do is to generate a partial class that implements the IMobileObject interface, so that the consumer class does not need to.
General Questions:
CSLA-specific questions:
Regarding the CSLA-specific questions:
IMobileObject
so they participate in the process.General questions:
Csla.GenerateSerializationAttribute
or something like that? [GenerateSerialization]
IMobileObject
, because that allows MobileFormatter to do the heavy lifting - and it already does it correctly and with some nice optimizations[NonSerialized]
- but applied to the property instead of the fieldNonSerialized
- kind of what's proposed via reflection in #2148@rockfordlhotka Thanks for the responses, that is very helpful.
The answers to the CSLA-specific questions give me everything I need, so we can focus all of our attention on the general questions from this point on.
Here are my thoughts on your responses to those general questions:
My intention with this task was to provide serialization for classes that do not inherit from the base classes. A POCO that needed to be passed via a Command object is the specific scenario I encountered myself, as I was writing a list of log entries to a database in a batch.
I'd be happy with another attribute name, although including the word Generate does perhaps limit its scope to this usage scenario. I was wondering whether a naming convention that would work in more, future, scenarios, was what we wanted. If we come up with a new serialization scheme in the future, might it be nice that people not need to change the attributes on their classes? Having said that, maybe any future scheme would be opt in as well, so maybe this consideration is ill-conceived?
With my question I was trying to tease out all of the detail of the rules for what should be included and what should be excluded. It seemed easier and more intuitive for a developer to know it was the same as some other, industry-standard scenario - and less for us to type in defining what we want. However, we don't have to mimic anything else if we think we know what the rules are. A default behaviour of including public members and excluding private ones makes sense to me, but this itself perhaps conveys insufficient detail. What about properties with a scope of internal? Are there restrictions on whether getters and setters are both of the same scope? There will need to be a setter for the generator to create valid code, but a setter of any scope would work I think, for a partial class.
I will be generating the implementation of IMobileObject and users will not be able to change that implementation - the output from a source generator cannot be edited. For that reason, we need to give developers clear guidelines on what they must do for their objects to be compatible with the new scheme. Being able to refer to the documentation of another scheme seemed like a useful additional benefit.
I'd be comfortable using (indeed I would very much like to use) [NonSerialized] if there were an equivalent for the opposite - for requesting inclusion of private fields and/or properties. I think the logical opposite (covered by item 5) is both appropriate and necessary, but I'll cover that next.
This is about customisation of the serialization behaviour. I can see (and have personally encountered) scenarios where people will need to customise serialization. We could require them to manually implement IMobileObject entirely if they encounter any such scenario, thereby not concerning ourselves with them. However, since it seems to be relatively simple to do, I thought allowing them to customise things using attributes would be a nice feature.
Here are some scenarios I have encountered where manually including private members into serialization could be necessary:
i. An object exposes a public property that only has a getter, but whose underlying value changes based on the calling of one or more methods. Take for example the state of an object whose lifecycle is to be very carefully controlled, and implemented using a private backing field. In this scenario you would have to mark the public property as excluded and the private backing field as included in serialization instead. ii. An object that exposes a read-only property of the time and date it was created, implemented using a private, read-only backing field. iii. If an object must carry with it a timestamp, row version or other value for managing concurrency; this must not be editable and should probably not even be publicly exposed at all, but must be serialized and carried as part of the object. This one might make less sense, in that it suggests an editable object, and editable objects would probably always benefit from inheriting from a base class, and making use of all of the other functionality that FieldDataManager offers. iv. An object which uses a private list to manage deleted items may need customisation. However, this is different in the sense that this should affect the behaviour of the OnGetChildren method rather than OnGet. As I haven't implemented that method yet, this might somehow be less relevant than I am currently imagining, as I have no code to refer to. The scenario also implies it is likely to be an editable object, so this might also make less sense than the first two scenarios.
In short, I am imagining the need to manually request inclusion of one or more private fields or properties in the generated serialization code. The code created by a source generator cannot be edited, so if customisation is to be possible, we must provide a mechanism to achieve it in what we build. I intended to do this straight away; whilst we could delay it for later, it fitted in my head as part of the same development task for better efficiency.
I've run a test this morning, and confirmed a suspicion I'd had - that the backing fields of auto-implemented properties are not available in a source generator. Source generators use the code as it exists as their source of information, and auto-implemented properties are a compiler trick (which is likely to happen after source generators have run) that therefore makes the backing field unavailable to us in a source generator.
Serializing a class that has auto-implemented properties can't be done by serializing only fields.
BinaryFormatter's serialization (which we are trying to mimic) happens at runtime, by which point the backing fields have been created by the compiler.
At the moment, it's not entirely clear to me how you would identify a property as being auto-implemented, so treating these differently from other properties might be hard. I'm not saying that it's not possible to identify auto-implemented properties, just that I don't know how to do so just yet.
Ah, OK. It looks like identifying auto-implemented properties might not be too difficult; PropertyDeclarationSyntax exposes a list of accessors for a property, and auto-implemented properties have accessors whose bodies are null. Properties whose get accessor is null is one way to identify them, although maybe there being 2 accessors and the bodies of both being null might be the most accurate mechanism of all.
The underlying writers and whether they support typed writing is relevant to the discussion of avoiding boxing and unboxing of values types. Here is a list of the writers, and what support they have for typed writing:
CslaLegacyBinaryWriter and CslaBinaryWriter use BinaryWriter: ushort, uint, ulong, sbyte, bool, byte, short, int, long, decimal, float, double, char, char[], byte[] (in summary: fairly comprehensive) CslaXmlBinaryWriter uses XmlDictionaryWriter: bool[], int[], long[], float[], double[], decimal[], Guid[], TimeSpan[], DateTime[] (in summary: a bit patchy, and all for arrays; does this negate the benefit?) CslaXmlWriter uses XmlWriter: bool, int, long, decimal, float, double, DateTime, DataTimeOffset (in summary: quite limited)
There is some support in the underlying writers for typed serialization, but it's somewhat incomplete. There are advantages to be had for some of the most common value types in most circumstances.
Does anybody faced issue with "losing" Kind
of DateTime
object after serialization?? Before serialization had Utc
kind, but after Unspecified
...
Info about Microsoft doing something with generation and JSON serialization.
https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-source-generator/
@rockfordlhotka could you tell me please, does MobileFormatter
supposed to work with DateTime
objects without Kind
property, during serialization/deserialization flow or is just forgotten? Cause, based on code, CslaBinaryWriter
writes only ticks:
case TypeCode.DateTime:
Write(CslaKnownTypes.DateTime, writer);
var value = ((DateTime)target).Ticks;
writer.Write(value);
break;
And CslaBinaryReader
re-creates DateTime
object based on provided ticks, without Kind
:
case CslaKnownTypes.DateTime:
return new DateTime(reader.ReadInt64());
I'm re-working our dates manipulations and wanted to send from API
dates in Utc
timezone, have to know it is a bug or works as planned and will have to add additional logic to convert dates from Unspecified
to Utc
, cause i know that they are stored in Utc
, in database :\
Not planned.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Is your feature request related to a problem? Please describe. Not a problem.
Describe the solution you'd like
Overview
The intent is to start incorporating source generators into CSLA. The first target is serialization - specifically, generating the serialization path to (potentially) improve performance and reduce memory allocations. Note that this would only work for .NET 5 and beyond, so care should be taken as to how "integrated" this is with CSLA. I'm guessing some kind of "opt-in" strategy would need to be in place such that users on older TPMs can use serialization as it currently works in CSLA.
Proposal
The idea is to update how serialization works in CSLA such that the serialization path is baked into code via source generation. The generator would look at the contents of a business object, figure out what properties need to be serialized, and create the necessary calls to the stream. This should provide speed and memory allocation gains for the following reasons:
There are a couple of disadvantages:
How Serialization Currently Works
It's kind of confusing (well, at least to me it is :) ).
MobileFormatter
implementsISerializationFormatter
, but it also has these helper DTO methods that create other instances ofMobileFormatter
. These DTO methods get a list ofSerializationInfo
. At some point,CslaReaderWriterFactory
is used to get either a reader or writer.Source Generator Strategy
The overall goal is to greatly simplify and improve how serialization works in CSLA. Basically start from scratch and build a new serializer based on
ISerializationFormatter
, tentatively calledSourceSerializer
(but call it whatever you want). The difference would be that any business object (or, more formally, anything derived fromIMobileObject
) would have a partial class created that would implement a new interface,ISourceSerializable
. This would have two methods,Serialize
andDeserialize
(note that this may cause a name collision, so we may want to implement this interface explicitly). Here's a brief example. Let's say I defined this object:Let's say it has properties like an
Id
as aGuid
, aName
as astring
, aBuffer
to hold arbitrary byte data (for some bizarre reason), and some child objects.Arguably, we wouldn't need
SerializableAttribute
there, but that's a debate for another time. Note that the class must bepartial
as we are going to expand on the definition ofCustomer
in another class. The source generator would find the definition written by the developer, and create another class definition that will look something like this:The names of the methods I have on the
Stream
might be wrong, but I hope the core of the idea comes through. If you look at the current serialization implementation in CSLA, there's a bunch of Reflection code, "well known types", and other junk that just disappears with this. We know what's being serialized and in what order, so deserialization can assume that happy-path.Next Steps
My first suggestion would be to mock up this idea without doing source generators first. Meaning, handwrite the code that the generator would produce, and do extensive testing (preferably with Benchmark.NET) to determine if this will be a "better" option overall. There's no need to actually do the source generator work if this ends up being a bad idea in the first place :).
If this proof-of-concept work proves to be feasible, then start looking at how CSLA would work with making serialization strategies configurable. I really don't know if this is something that is easy and doable today or not. I'm guessing this is already built-in, but ensure that it is. Meaning, you want to make sure you can allow the source generation serialization to be opt-in and allow older versions of .NET to work with what's there (or let people stick with the current serialization implementations).
Once that's addressed, then start looking at what it would take to do it as a source generator. If you haven't worked with source generators in C# 9 yet, it's a powerful feature, but you have to be comfortable with the Compiler APIs. If you're not, there is a learning curve to address, and don't underestimate it :).
Make sure your implementation does not break existing expectations. There's probably a bunch of things I'm missing with serialization in CSLA. Just looking at the tests (e.g.
TestWithoutSerializableHandler()
, it looks like theGlobalContext
is updated, which is odd to me. I think CSLA also tracks duplicated object references within a graph so that the references are deserialized correctly as one object referenced in multiple places. I don't have all the answers here, just make sure you don't break anything here.Describe alternatives you've considered No alternatives considered.