golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: encoding,encoding/json: common struct tag for field names #60791

Open sparr opened 1 year ago

sparr commented 1 year ago

Currently if a package wants to define a struct that can be saved to and loaded from files in different formats, with field names different from the struct field names (e.g. changing "FooBar" to "foo_bar" to match conventions, "Miscellaneous" to "misc" for brevity, etc), the package must add separate struct tags for json, toml, yaml, etc. Any encoding not specifically enumerated in the tags will either fall back to using the struct field names directly, or have to implement parsing of another encoding's tag. Any tag options supported by multiple encoders must be specified multiple times.

While these different encoding packages offer some unique functionality, such as go-yaml's inline, encoding/json's string, and go-toml's multiline, they all share common functionality of specifying the key name and the omitempty option. Since go-toml v2, they also all use the same structure for the contents of the tag, i.e. "name,option,option...". For use cases where that subset of functionality is sufficient, it would be convenient if most or all of the markup/encoder/serializer/marshaler/etc packages supported a common tag name.
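To illustrate the shared "name,option,option..." structure, here is a minimal sketch of splitting such a tag value. The parseTag helper below is hypothetical and written only for this example; it is not the unexported code in encoding/json or any of the third-party packages, and the Platform type is trimmed down from the example later in this proposal.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// parseTag splits a tag value of the form "name,option,option..." into
// its name and its options. Hypothetical helper for illustration only.
func parseTag(tag string) (name string, opts []string) {
	parts := strings.Split(tag, ",")
	return parts[0], parts[1:]
}

// Platform repeats the same name and option once per encoding.
type Platform struct {
	Variant string `json:"var,omitempty" yaml:"var,omitempty" toml:"var,omitempty"`
}

func main() {
	field, _ := reflect.TypeOf(Platform{}).FieldByName("Variant")
	// The same parsing logic works for all three tag keys.
	for _, key := range []string{"json", "yaml", "toml"} {
		name, opts := parseTag(field.Tag.Get(key))
		fmt.Printf("%s: name=%q opts=%q\n", key, name, opts)
	}
}
```

Because the three values parse identically, the name and the omitempty option are the parts a shared tag could carry.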

My proposal is for a standard tag that looks and works like the existing tag syntax for toml, json, and yaml, but with a new name. Something like "markup", "marshal", "encoded", "serialized", etc. Preferably relatively short.

With this proposal, and support by the relevant packages, the following code:

type Platform struct {
    ArchitectureType string `toml:"arch_type,multiline,omitempty" json:"arch_type,string,omitempty" yaml:"arch_type,inline,omitempty"`
    Variant string `toml:"var,omitempty" json:"var,omitempty" yaml:"var,omitempty"`
    // ...
}

might be replaced with this:

type Platform struct {
    ArchitectureType string `marshal:"arch_type,omitempty" toml:",multiline" json:",string" yaml:",inline"`
    Variant string `marshal:"var,omitempty"`
    // ...
}

This new tag would specify the expected behavior of some options, possibly currently only omitempty, which I believe has consistent behavior across all three of the packages mentioned above, and at least most of the other yaml packages.

Each of the packages could still read its own tag, for both unique and common functionality, with the following proposed conflict resolution behavior:

Alternately, packages could read arbitrary options from the standard tag, which would simplify the struct definition even further but risks future collisions between options understood with different meanings by different packages.

The implementation of the functionality to decode this tag could be left to the individual packages, or go in a new part of the standard library possibly somewhere near reflect.StructTag.Get or elsewhere in encoding (possibly the same place that #60770 ends up if we move tagOptions and parseTag out of encoding/json), or may end up in a third party package like https://pkg.go.dev/github.com/fatih/structtag. Wherever it ends up, the conflict resolution described above could also be implemented generically and made available to all consuming packages.

earthboundkid commented 1 year ago

Why not just have the other packages fall back to json: if toml: isn’t set?

sparr commented 1 year ago

@Nasfame Regarding collisions, I have used github code search to search for path:*.go StructTag AND "Get(\"json\")" and equivalent for other tag names. The "code" category results are as follows:

json: 1.2k
yaml: 148
toml: 79
markup: 0
marshal: 0
encoded: 0
serialized: 1 (https://github.com/inklabs/rangedb and forks)

sparr commented 1 year ago

@carlmjohnson I did also suggest that on one of those projects. I am taking a multi-pronged approach to this situation. https://github.com/pelletier/go-toml/issues/880

seankhliao commented 1 year ago

cc @mvdan @dsnet

sparr commented 1 year ago

@carlmjohnson The developer of go-toml has said he will use this proposal if it succeeds, but will not use the json struct tag in the main release of his package.

https://github.com/pelletier/go-toml/issues/880#issuecomment-1638654623

seankhliao commented 2 months ago

#68361 drew parallels to encoding.TextMarshaler, and suggested text as the name for the tag.

dsnet commented 2 months ago

One advantage of #68361 is that it simplified the problem to just the textual name, while this proposal covers the name and other common-ish attributes like omitempty.

A name tag alone is easier to reason through, while attributes like omitempty are more challenging if they have different semantics across serialization libraries. For example, the v2 "json" package redefines omitempty so that a field is omitted if it encodes as an empty JSON value, but adds omitzero, which omits a field if it is the zero Go value.

dsnet commented 2 months ago

While textual names are more common, should there also be support for numeric field IDs? This is useful for formats like CBOR or protobuf that represent fields with numeric integers, rather than textual names.

seankhliao commented 2 months ago

#68361 still included omitempty:

The only allowed option is omitempty (although that should be a vet check). Values with - will be skipped as it works currently.

I'd agree it makes more sense to support only the field name and no options. Protobuf seems to require additional info; if numeric IDs are always going to be used in formats that require additional metadata, then it may not make sense to try to create additional indirection.

dsnet commented 2 months ago

@adonovan and I once (many years back) tendered the idea of a package that can serialize Go structs as protobuf using just Go reflection (side-stepping the protobuf compiler). All you need is the numeric field ID as the other attributes of protobuf (e.g., whether a field is optional) can be inferred from the type of the field.