Support naming union fields from type names where field names are missing

cmeeren commented 2 years ago

First of all, thank you for an excellent library. It has become a staple in all my F# work.

Description

I propose adding an option to name union fields according to the name of the type in that field (specifically, the System.Type.Name property) if the field does not have a name. In other words, where you currently use Item (including those with numbers at the end, as described further below), fall back to System.Type.Name. Hopefully this is easy to implement.

API-wise, this can take the form of a new value JsonUnionEncoding.NamedFieldsTypeFallback (feel free to find a better name) to be used together with NamedFields.

Current and new behavior

Take this DU:

type SendData =
  | Sms of MobileNumber * OneTimeCode * customMessage: string

With the current NamedFields, the fields will serialize as

{
  "Item1": "...",
  "Item2": "...",
  "Item3": "..."
}

With the new option, it should be serialized as:

{
  "MobileNumber": "...",
  "OneTimeCode": "...",
  "customMessage": "..."
}

Current workarounds

To work around the lack of this feature, one has to specify field names for all fields, which IMHO is just unnecessary noise since it duplicates information:

type SendData =
  | Sms of mobileNumber: MobileNumber * oneTimeCode: OneTimeCode * customMessage: string

Alternatively, one can use JsonUnionEncoding.UnwrapRecordCases and use a record type (either a separately defined nominal record or an anonymous record). That is even more verbose (both in the DU/record definition and when constructing).

Edge cases

If multiple fields have the same type (or types with identical names), append numbers (as is currently done with Item1, Item2, etc.). For example:

{
  "MobileNumber1": "..."
  "MobileNumber2": "...",
  "customMessage": "..."
  "MobileNumber3": "...",
}

Motivating use case

This request makes most sense in light of using single-case DU "primitives" as the field values, where the names are likely to be descriptive. I think that is a common enough pattern in F# that this feature will be useful.

I personally want this feature due to my use of JSON serialization in logging. When I log (and serialize) a domain object, they are currently full of DUs with Item1, Item2, etc. Without this feature, I will either have to live with that (possibly having to open the code and see which types the values actually correspond to), or add a lot of duplication by specifying DU field names that are, for all intents and purposes, identical to the type names and thus add unnecessary noise. (Also there's the real danger of forgetting to add field names for new DUs/cases.)

bartelink commented 2 years ago

Random devil's advocate questions that do not require a response, but can be addressed by editing the OP...

Is the fact that the mapped type name is PascalCase an intentional hint for humans, or would it be better for it to get camelCased (or would you look to have a way to parameterise the mapping rule) ?

If this is about strongly typed ids, I personally would use FSharp.UMX, which would have a horribly useless type name (string). Do you have a plan/suggestion where fields named Item* that are typed as primitives have some better handling (maybe warn, throw, don't apply the rule?)

cmeeren commented 2 years ago

All good questions. I will address them in the light of my use-case, which as mentioned is more readable JSON serialization in logs. In other words, only developers will see the JSON, and the exact formatting isn't critical.

The TL;DR is: Keep it simple – it's still a very helpful feature.

Is the fact that the mapped type name is PascalCase an intentional hint for humans, or would it be better for it to get camelCased (or would you look to have a way to parameterise the mapping rule) ?

While in most cases, I'd ideally like camelCase for consistency with the idiomatic style for manual DU field names, I don't think it's a good idea. The feature then suddenly becomes more complicated. You can't just transform the first letter to lower-case, because then you could end up with weird casings like iPAddress (for System.Net.IPAddress). I therefore suggest just using the type name directly (typically PascalCase), which is what I demonstrated above. That's what you'd normally see in records elsewhere in domain objects (including if records are used as the DU field and UnwrapRecordCases is used). If someone really cares about this, they can use manual field names with PascalCase instead of camelCase where applicable, or use a record with UnwrapRecordCases.

If this is about strongly typed ids, I personally would use FSharp.UMX, which would have a horribly useless type name (string).

IDs is one use-case for single-case DUs, but I use single-case DUs for more or less all primitives. I'd say that only 10% of my single-case DUs are IDs. The rest are other domain "primitives" like mobile numbers, email addresses, and much else that all require some form of validation and often guarantee some invariants. I don't use FSharp.UMX, but I'm OK with fields typed as string, int, etc. (with/without unit of measure) ending up as String, Int32, etc. After all, this is an optional feature, and the name can easily be overridden by specifying a field name. I would have to do that several places in my own code, too, e.g. places where I use bool still need manual field names. That's fine. But this feature would take care of 90% of my fields, where I'd just use the type name as the field name (and therefore add a lot of noise/duplication, which is why I currently live with Item* and either remember/guess or have to look up what the data actually is).

Do you have a plan/suggestion where fields named Item* that are typed as primitives have some better handling (maybe warn, throw, don't apply the rule?)

I'm not sure what you're asking here. If it's important for you that I respond to this, could you re-phrase the question? In any case, I believe that my fairly simple suggestion is helpful in light of my use-case. It's just intended to reduce the amount of noise needed for helpful DU field names. It doesn't have to be perfect or cater to every edge case. Also, there's always the possibility of specifying the field name manually, or using a record if you really care about naming and consistency/stability of the output.

bartelink commented 2 years ago

I'm not sure what you're asking here. If it's important for you that I respond to this, could you re-phrase the question?

If you are rendering Range of int * int, then your basic rules would suggest that could become Int321 or Item1 and Int3222 - was just seeking to get you to expand on your rules with the aim of either making them more general or making you question them.

In any case, I believe that my fairly simple suggestion is helpful in light of my use-case. It's just intended to reduce the amount of noise needed for helpful DU field names. It doesn't have to be perfect or cater to every edge case. Also, there's always the possibility of specifying the field name manually, or using a record if you really care about naming and consistency/stability of the output.

I think you've done a great job in the above of conveying what your goals are - I guess the lib is not far off being able to deliver on them, even though it's probably some distance from it's core use cases, which I'll be interested to see the maintainer's views on (IMO, based on working with FsCodec and event rendering, having dynamic rendering that can jump around based on small source changes are the last thing you want in a serialization lib, but absolutely what you're after when trying to render stuff for human dev/ops perusal).

(Am also selfishly secretly wishing there'll be a revival of Destructurama.FSharp, which I believe is where this sort of pretty/sensible rendering of literals for logging purposes could live if only someone had the time and energy to invest! Easier said than done but surely some TypeShape-fu could yield a nice general lib without too much code ;) )

cmeeren commented 2 years ago

If you are rendering Range of int * int, then your basic rules would suggest that could become Int321 or Item1 and Int3222 - was just seeking to get you to expand on your rules with the aim of either making them more general or making you question them.

With my rules, they would be Int321 and Int322. Alternatively we can append an underscore: Int32_1 and Int32_2. They aren't great names in any case, but that's fine by me, because if you want better names, you just specify the field names. This option is just a very simple one (simple both to implement and understand) that would remove the need for 90% of manual field names in my use-case, so it's worth it IMHO.

IMO, based on working with FsCodec and event rendering, having dynamic rendering that can jump around based on small source changes are the last thing you want in a serialization lib, but absolutely what you're after when trying to render stuff for human dev/ops perusal.

It's the last thing you want when you want stable output, but serialization libraries cater to other use-cases, too. Many of the JsonUnionEncoding already make the output fairly unstable under refactoring if you serialize domain types directly. If you want stable output under refactoring, there's no way around having to map to separately defined stable DTOs and serialize them. And chances are that if you do, you use simple records and primitives and don't need most of FSharp.SystemTextJson's features in the first place.

(Am also selfishly secretly wishing there'll be a revival of Destructurama.FSharp, which I believe is where this sort of pretty/sensible rendering of literals for logging purposes could live if only someone had the time and energy to invest! Easier said than done but surely some TypeShape-fu could yield a nice general lib without too much code ;) )

Yep, I wanted that, too. Currently it doesn't work right and uses uncached reflection. My journey to this issue started with me first trying to come up with a new kind of Destructurama.FSharp-type library but butting my head against the myriad of ways unions can be encoded and not wanting to duplicate everything in FSharp.SystemTextJson. Then I figured I could piggyback on top of FSharp.SystemTextJson's capabilities with this IDestructuringPolicy, but limitations on the output format for destructured objects (specifically, the lack of indentation) pointed to manual serialization/formatting (no destructuring) being the only viable option for me for the time being (as mentioned by the maintainer in the subsequent comment).

Tarmil commented 2 years ago

Yeah, I think this is a reasonable feature to add. The output for built-in types would indeed be a bit odd, but since this is opt-in, I'm fine with leaving the user responsible for using meaningful type names.

For the camelCase/PascalCase issue, luckily there's a built-in solution in STJ. The option PropertyNamingPolicy defines how to transform a property name, and we already use it for record and union fields. The default is a no-op but there's a built-in camelCase implementation that is smart enough:

> JsonNamingPolicy.CamelCase.ConvertName("IPAddress");;
val it: string = "ipAddress"

So you just have to use JsonSerializerOptions(PropertyNamingPolicy = JsonNamingPolicy.CamelCase) and all your properties, including those based on union field types, will be camelCase.

cmeeren commented 2 years ago

So you just have to use JsonSerializerOptions(PropertyNamingPolicy = JsonNamingPolicy.CamelCase) and all your properties, including those based on union field types, will be camelCase.

That's interesting! But I guess there is no simple way to use this only for union fields?

Tarmil commented 2 years ago

@cmeeren That could be another new feature. JsonFSharpOptions already takes a naming policy for union tag names; we could add one for union field names, and fall back to PropertyNamingPolicy if none is provided.

cmeeren commented 2 years ago

That sounds like a good way to go about it. 👍

Tarmil commented 2 years ago

This has been added in v0.18.

Tarmil / FSharp.SystemTextJson