proposal: spec: introduce structured tags

urandom commented 6 years ago

This proposal is for a new syntax for struct tags, one that is formally defined in the grammar and can be validated by the compiler.

Problem

The current struct tag format is defined in the spec as a string literal. It doesn't go into any detail of what the format of that string might look like. If the user somehow stumbles upon the reflect package, a simple space-separated, key:"value" convention is mentioned. It doesn't go into detail about what the value might be, since that format is at the discretion of the package that uses said tag. There will never be a tool that will help the user write the value of a tag, similarly to what gocode does with regular code. The format itself might be poorly documented, or hard to find, leading one to guess what can be put as a value. The reflect package itself is probably not the biggest user-facing package in the standard library as well, leading to a plethora of stackoverflow questions about how multiple tags can be specified. I myself have made the error a few times of using a comma to delimit the different tags.

Proposal

EDIT: the original proposal introduced a new type. After the initial discussion, it was decided that there is no need for a new type, as a struct type or custom types whose underlying types can be constant (string/numeric/bool/...) will do just as well.

A tag value can be either a struct, whose field types can be constant, or custom types, whose underlying types are constant. According to the go spec, that means a field/custom type can be either a string, a boolean, a rune, an integer, a floating-point, or a complex number. Example definition and usage:

package json

type Rules struct {
    Name string
    OmitEmpty bool
    Ignore bool
}

func processTags(f reflect.StructField) {
    // reflect.StructField.Tags []interface{}   
    for _ ,t := range f.Tags {
        if jt, ok := t.(Rules); ok {
              ...
              break
        }
    }
}

package sqlx

type Name string

Users can instantiate values of such types within struct definitions, surrounded by [ and ] and delimited by ,. The type cannot be omitted when the value is instantiated.

package mypackage

import json
import sqlx

type MyStruct struct {
      Value      string [json.Rules{Name: "value"}, sqlx.Name("value")]
      PrivateKey []byte [json.Rules{Ignore: true}]
}

Benefits

Tags are just types, they are clearly defined and are part of a package's types. Tools (such as gocode) may now be made for assisting in using such tags, reducing the cognitive burden on users. Package authors will not need to create "value" parsers for their supported tags. As a type, a tag is now a first-class citizen in godoc. Even if a tag lacks any kind of documentation, a user still has a fighting chance of using it, since they can now easily go to do definition of a tag and just look up its fields, or see the definition in godoc. Finally, if the user has misspelled something, the compiler will now inform them of an error, instead of it occurring either at runtime, or being silently ignored as is the case right now.

Backwards compatibility

To preserve backwards compatibility, string-based tags will not be removed, but merely deprecated. To ensure a unified behavior across libraries, their authors should ignore any string-based tags if any of their recognized structured tags have been included for a field. For example:

type Foo struct {
    Bar int `json:"bar" yaml:"bar,omitempty"` [json.OmitEmpty]
}

A hypothetical json library, upon recognizing the presence of the json.OmitEmpty tag, should not bother looking for any string-based tags. Whereas, the yaml library in this example, will still use the defined string-based tag, since no structured yaml tags it recognizes have been included by the struct author.

Side note

This proposal is strictly for replacing the current stuct tags. While the tag grammar can be extended to be applied to a lot more things that struct tags, this proposal is not suggesting that it should, and such a discussion should be done in a different proposal.

ghost commented 4 years ago

If libraries om SQL, yaml and other field tag heavy domains can each agree to have a common, well maintained package of tags, I would be happy (though not sold on it, for dependency is still dependency, well, I known I am stubborn). But from what I see on the net, this is hard.

If it is just type check, I think linter and unit tests can do its job (with the help of package author). But there's need for writing them more easily (I can't think of a situation that I would require a tool to help writing field tags, but probably it's just me), I do not know enough if gopls can dip into this (again, with help of package author).

My uneducated guess is that if auto completion can be cross packages, surely can it be done in field tags, though it still requires library author to impose some kind of rules to address desired tags (and type of it), perhaps in the same way as proposed in this proposal, with some environment variable pointing to the library in use (without importing it).

On the other hand, if the tag is so complicated to write by hand (using tools, not just need type checking), perhaps the logic is sophisticated enough for its own code instead of hiding in tag fields?

On Sun, Jun 21, 2020, 4:19 AM Antonenko Artem notifications@github.com wrote:

How about actually not allowing packages with tag definitions to contain anything else? That not only encourages but forces people to define tags in very small separate packages that can be imported everywhere. Compiler doesn't even need to compile those. It only needs to parse them and use the information for syntax checks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/23637#issuecomment-647041270, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIAYXPSNXX347LG5UIUIRADRXUKTTANCNFSM4EOQ7OTA .

creker commented 4 years ago

@leafbebop why do they need to write common package? SQL, yaml, json, they all need completely different tags. They should be in separate packages. And if a package is not allowed to contain anything else apart from tag definitions you don't pay the usual price for the dependency. This is eliminating the main problem - transient dependencies. You can treat this tag package as a tag protocol like you described earlier.

if the tag is so complicated to write by hand

Even with simple tags like json you can make mistakes. Gopls recently started giving warnings about json tags and caught a few typos in my code. With libraries like gorm things get even worse.

ghost commented 4 years ago

@creker Sorry if I did not explain clearly.

@leafbebop https://github.com/leafbebop why do they need to write common package? SQL, yaml, json, they all need completely different tags.

I meant a common sql package and a common yaml package and so on. Especially in sql case, it is not uncommon to have multiple version of sql library using a same data model.

I am aware of the current way of writing tags is prone to errors, and thus I do propose solve these problems by linters (just like how gopls did it for you) and by unit tests (asking the package author to provide a utility to validate field tags).

I was exploring what would be still too complicated with these build time (slightly different than compile time) type check and error message, and claimed that they might deserve their own logic of code.

if the tag is so complicated to write by hand

Even with simple tags like json you can make mistakes. Gopls recently started giving warnings about json tags and caught a few typos in my code. With libraries like gorm things get even worse.

edit: grammar and format.

urandom commented 4 years ago

@leafbebop This is a bit anecdotal, but I've noticed that we never add tags to a struct, unless the primary user plans to use the package that requires these tags in the first place. In terms of dependencies, if the tags were structured, no extra dependencies would be added because of them, since the dependent package is already used (i.e. the struct is already decoded via yaml, or a db, in the primary user / same package). For us, we would also not be adding extra dependencies due to tags due to secondary users (users of the struct in a completely different package/module), since we don't like adding tags that aren't used by the primary user itself. That is to say, if the primary user of the struct isn't using a package that defines some tags, those tags aren't being added in the first place. We always copy the structs and add different tags, since we don't want to couple structs to such packages unnecessarily, even when the tag is just a string.

So, at least for our shop, we would never see the problem you are describing.

On the subject of linters, such a solution would not be able to provide completion of non-structured tags. One would have to hard-code the tags into gopls, which is unfeasible for non-stdlib tags.

ghost commented 4 years ago

@urandom I don't know about your code base, as well as many other's so I really can't comment on that. In my experience, however, it is pretty common to see data, and basic domain-specific data behaviour separated from decoding/encoding/storing logic. The data defined often describe a few requirements in tag fields, but not specific to any implementation of them.

I am not certain how much of the user base me or your shop can count for, but that's some concern.

As for linter, I can't see how that is impossible (maybe I should read into gopls soon). I am not saying that this is feasible for now, but saying that if a package defined typed tags they supported, with magic comments or other proposed or conventionalized protocol, a linter can read about them, and provide completion of non-structured tags according to that info.

On the other hand, I am curious to see a use case that type-check [1] would not suffice, and does not deserve its own logic, but auto-completion and other gopls features would help largely.

[1] which can easily be done by unit tests, with the package provides a validate facility, probably helped by some "x/tools/" utility - again this requires change of the package, but so does the proposal.

ghost commented 4 years ago

And another note: while this proposed spec change can be done in a backward compatible way, I would see many tools that built on purely unstructured tags (of other packages) would be broken, for people will rarely keep both unstructured tags and structured tags (and the current proposal does not support keeping them both).

I am not saying there a significant amount of tools reading on unstructured tags out there (or not, I am not sure about that), but that's another concern, if we are doing this.

urandom commented 4 years ago

@leafbebop The proposal only wants to deprecate the string-based tags, not remove them, as that would be backwards incompatible.

ghost commented 4 years ago

What I said is 1) the proposal does not mention how to keep both string based and structured field tags and even though there is a way of keeping both, people would rarely do that, and 2) there are some library/tools depending on reading on string based tags that are not defined by themselves (e.g., all linters that reads field tag now), and they are going to be broken, even though the language change is backward compatible.

I hope this times it is explained clearly.

On Mon, Jun 22, 2020, 5:04 PM Viktor Kojouharov notifications@github.com wrote:

@leafbebop https://github.com/leafbebop The proposal only wants to deprecate the string-based tags, not remove them, as that would be backwards incompatible.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/23637#issuecomment-647386445, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIAYXPXQDQHICQ23GVB2GELRX4NA7ANCNFSM4EOQ7OTA .

Merovius commented 4 years ago

FWIW, my rough design would make it relatively simple to keep both (at least in an intermediary period). StructField.Tag could refer to the first string-typed tag, formatted in the old way. A new field, Tags []Value could be added, that lists all tags. New packages defining tags would only use the latter and select those of the appropriate type (a helper method func (StructField) TypedTag(Type) Value could be added for convenience). Packages like json, that want to stay backwards-compatible, would look in both to decide what to do.

That would enable a transition from

type Foo struct {
    Foo           int     `json:"foo,omitempty" bson:"foo"`
    Bar           float64 `json:"bar"`
    EmbeddedThing         `json:"-"`
}

to

type Foo struct {
    Foo           int     json.Name("foo"), json.OmitEmpty, `bson:"foo"`
    Bar           float64 json.Name("bar")
    EmbeddedThing         json.Omit
}

This would still work as expected with the updated json package and the (presumed pre-structured) bson package for both versions. Of course, the second version doesn't work with a pre-structured json-package - the assumption is, that an appropriate go version would be specified in go.mod, when making this change.

(Not commenting on the import-issue, because as I said, I tend to agree it's a problem to be solved and IMO that's everything I can say about it for now)

ghost commented 4 years ago

@Merovius

By meaning keeping both, I am not referring json.Name and bson:"foo", but rather json.Name("foo") and json:"foo". And yes, it seems like in your pseudo proposal it can be kept but it is awkward (as it should be), and I think most people would not really do it. Even if, bug of inconsistency between the tag field redundancy is easy to make and hard to catch.

This will cause code that reads on string field of json:"foo", but not part of the json library (at least the imported json library) would break.

This might be very rare and impractical case, but I think unofficial implementation of json package would suffer that. Not it is hard to fix, or I'm sure that is going to have an impact, but another concern.

bminer commented 4 years ago

Since tags are essentially a string and structured tags are essentially []interface{}, I feel like it would be relatively easy for a package to support both, especially if string-based tags are already supported and actively being parsed by the package. Deprecating the unstructured tags seems like the right approach since structured tags provide many more benefits, and this is probably the road most packages will follow.

One unanswered question is... what should happen if a struct field tries to use structured and unstructured tags at the same time? Surely this will happen as libraries slowly migrate to structured tags. Should conflicting options in the structured tag override the string-based tags? Should the library make this decision? Should there be some sort of convention? If so, what?

urandom commented 4 years ago

@bminer Since string tags would be deprecated, the ideal solution would be for libraries to consider the structured tags first, falling back to string tags when the former aren't available. Of course, this isn't enforceable, so its really up to the library itself.

bminer commented 4 years ago

@urandom - I agree, and it's probably not enforceable. Just pointing out the corner case that might warrant a new guideline / convention -- perhaps even something that is checked by a linter. Should a library completely ignore the string / unstructured tags when both structured and unstructured tags are present?

Example:

type Person struct {
  FirstName string `json:"firstName" bson:"first"` [json.Name("fName")]
}

EDIT: Fixed invalid string tag. :)

Does the JSON encoder encode the FirstName field as "firstName" or "fName" above?
What does the BSON encoder do since there are structured tags present?
Suppose a library only supports parsing unstructured / string tags. Can structured tags be "stringified" to address this? Can this be done at compile time?

I think it would help to address these questions in the proposal -- even if it's just a convention that isn't enforceable. Thoughts?

Merovius commented 4 years ago

I think building this idea of stringification in would mostly defeat the point of the proposal. If you stringify the typed tags, that mapping has to be injective, so you don't get collisions. But if you have an injective mapping, why not just use that mapping for your string-tags in the first place? Or to put another way: To be well-defined, the stringification would likely need to include the import path of the type defining the tag - and if the package using it knows that it needs to specify the import path, why not move it over to the structured tagging API while you're at it?

Do any of y'all actually know a case where you'd need to support both simultaneously? ISTM that usually whoever adds the struct-tags will have a certain amount of control (and/or expectations) over what packages that type is used with - and at the end of the day, the syntax is still bound to the package defining the tags, so if a third-party library wants to drop-in replace that package (say, a third-party json encoder) it seems reasonable to expect it to conform to that new API in a timely manner (after all, support for structured tags in json could be known at least 3 months in advance to any third-party replacement, who could then start adding that support themselves guarded by build tags). There'll probably still be some percentage of cases where it makes sense to use both - but IMO it's fine to just expect those cases to put both kinds of tags into their structs and keep them in sync.

At the end of the day, I don't really think the goal here should be to prescribe how packages actually end up performing that move in all eventualities. It should really just be about making a reasonable path possible.

urandom commented 4 years ago

@bminer only the fName structure tag would be used, regardless, since your stringified tag is invalid anyway :D (hence the proposal itself) but if it were valid, the theoretical json library would ideally discard any stringified tags when structured tags exist. E.g.: if reflect.StructField.Tags contains a value that matches any of its types, it would not bother looking at the old reflect.StructField.Tag

bminer commented 4 years ago

@urandom - Ha! Nice catch. I didn't realize the tag was invalid. :) Anyway, I think that I agree with your convention: ignore string-based tags entirely if structured tags exist. This can lead to strange situations, though...

Suppose the BSON library does not support structured tags yet. So we write:

type Person struct {
  FirstName string `bson:"firstName"` [json.Name("firstName")]
}

Then, suddenly the author adds support for structured tags and pushes a minor version release. The code above might silently break because the string-based tag will be ignored (by convention) by the BSON library.

Anyway, maybe this is worth discussing further? Sadly, since Go does not support structured tags and unstructured tags won't be going anywhere, I think it's really important to understand how we migrate between the two.

@Merovius - you are right that stringification can be weird, but current libraries have their own opinions about the namespace already. For example, for the sake of brevity, one writes json:"fieldName", not encoding/json:"fieldName". So, certainly namespace collisions are possible with unstructured tags. Again, I don't want to dive into the weeds too much either here, but it's worth considering exactly how the Go community would move forward (end users and library creators alike) if structured tags became a thing.

bminer commented 4 years ago

As a follow-up to my last comment, perhaps the convention should be:

If a pertinent structured tag is used, all unstructured tags should be ignored.

In the example above, there are no BSON-specific structured tags, so unstructured tags would get processed. If there were 1 or more BSON-specific structured tags, the BSON encoder library would ignore all unstructured tags. I also like this approach because linters can warn users when both tag types are being used (at least for commonly-used libraries like encoding/json).

Still, using the word "pertinent" makes for rather loose language, and as a computer scientist, I don't like it much.

urandom commented 4 years ago

@bminer, with your bson example, your code would have to contain a single bson.Tag() for the bson library to start ignoring string tags. It's not enough that it contains any tags at all. So you, as a writer, will have to make the change yourself as well.

Merovius commented 4 years ago

@bminer my point was exactly that you are diving too far into the weeds. What to do if there are both is exactly relevant, if a) there have to be both and b) if they have differences. IMO a) won't be the case in >99% of use-cases and thus for b) it's fine to rely on "don't do that then".

IMO you are complicating the design enormously for very little benefit - trying to address something that I believe to be entirely uncommon. And not just that, IMO the case you worry about even gets worse. If there are some rules about the compiler sometimes rewriting tags or them sometimes being ignored, then if someone sees that their encoding-process (or whatever) doesn't work as expected, they need to worry about and try to understand those rules too. If there are simply two APIs, one for typed and one for string tags, then thinking through how to move from one to the other is a pretty simple matrix - does the declaration contain typed/string tags yes/no and does the consumer site use the typed tag API yes/no. If a package wants to move to typed packages, it just has to look at this matrix and decide which cases are relevant to its particular use-case and decide if it's happy with what will happen. If, however, there is some magical system in place to translate from one system to the other, that will have to be understood and taken into account, which will make the decision much harder, what to do. And if some user wonders why their encoding (or whatever) is broken, they also have to understand this system - and users already have problems understanding the current string-tag syntax on its own.

IMO, if you just treat them as separate systems, it is pretty clear what happens when you add typed tags to a struct - code will simply continue to work as before, until it explicitly adopts the typed tag API. And it's pretty clear what happens when you remove string tags from a struct - code that is still relying on the string tag API will break. That's clear, it's simple and it gives a good migration path.

Note, that without any magic there is a pretty clear and obvious strategy for tag-consumers to adopt typed tags that makes the problem you identified vanish: Look for typed tags you are interested in, if you can't find any, use the string-tag. When bson in your example above adopts typed tags, it doesn't matter that there are some typed tags and a string-tag - it will ignore the json tags and not find one of the type it's interested in and thus use the string-tag. Even the case mentioned where you might have independent tag-consumers works fine under this strategy - if there is a struct-declaration that cares about interoperability with multiple such packages, it can just leave both the typed and the string-tag there. Any tag-consumer that knows about typed tags will ignore the string-tag altogether and a tag-consumer that doesn't will ignore the typed ones (obviously).

bminer commented 4 years ago

I think we have all reached the same conclusion: a library should ignore all string tags if a relevant structured tag exists.

@urandom Perhaps this should be included in the original proposal for clarity?

leaxoy commented 1 year ago

Any progress? This issue has been silent for three years.

deefdragon commented 1 year ago

One thing that I don't think I have seen addressed in this so far is that of multiple tags in the same package. The current string tags allow for using different tags in the same way. The specific example I know of is the go-playground/validator package. I use this feature because I have some cases where I need to validate parts of a struct in one way, but other times need to validate it differently. (specifically the creation path vs the update path have different requirements.)

urandom commented 1 year ago

@deefdragon Could you give a specific example with such usage?

deefdragon commented 1 year ago

type User struct {
    //Other assorted variables above and below
    PaymentStatus string `validateCreate:"oneof=GoodGraces" validateUpdate:"oneof=GoodGraces LatePayment ChargeBack"`
}

This is just a one field example, but in this example, the User struct can only be created if the PaymentStatus is set to GoodGraces (preventing forgetting it or using the incorrect value for a new user), while it can be updated to one of the three items listed.

This is what a create might look like.

    //create the validator somewhere (likely in program init, but also possible here
    vc := validator.New()
    //set the tag to the expected tag
    vc.SetTagName("validateCreate")

    //... other setup etc.

    //... create user object, set defaults, update data based on provided inputs etc.
    u := User{}

    //check for errors in user creation somewhere down the line
    err := vc.StructCtx(ctx, u)
    if err != nil {
        //handle error
    }

Giving it some thought, I suspect that a Name/Key/Tag field in the structured tag would be fine to differentiate, and could be set by any library that needs it, but I thought it was important to acknowledge the current lack of discussion around custom tag names.

DeedleFake commented 1 year ago

You could probably do something like

type User struct {
  PaymentStatus string validator.Create("oneof=GoodGraces"), validator.Update("oneof=GoodGraces LatePayment ChargeBack")
}

Then, instead of setting the tag name to check, you could do something like

vc := validator.New()
vc.SetMode(validator.ModeCreate) // ModeCreate could be a function that gets the correct tag.
// ...

Merovius commented 1 year ago

We could also allow multiple typed tags per type. Allowing you to do

type User struct {
  PaymentStatus string validator.Create("GoodGraces"), validator.Update("GoodGraces"), validator.Update("LatePayment"), validator.Update("ChargeBack")
}

The tags would be exposed by reflect as a slice, so the validator package could iterate over them and interpret multiple tags of the same type however it wants.

More complex structures are not possible to express (so you couldn't also have a concept of "and", only of "or", for example). But I also don't think we'd want that anyways. TBQH in my opinion what is already done via struct tags is too "magic" anyways.

matt1484 commented 1 year ago

not sure what the correct final solution should be, but my library uses structs to validate struct tags, so something to consider or at least a workaround for now. https://github.com/matt1484/spectagular/

BenjamenMeyer commented 1 year ago

I've recently gotten into doing some custom marshaling/unmarshaling using struct tags in Go1, and noticed that there is a lot of commonality between the two sides when it comes to parsing the structure. Perhaps the solution here is to define a format for the string and then provide additional capabilities in Go itself so folks don't need to do all the heavy lifting to interact with struct tags. An API like the following could work:

// field - the type data and field information
// value - the instance data
// parameters - a list of the strings attached to the struct tag, interpretation of each value is up to the handler
type FieldHandler(field *reflect.StructField, value *reflect.Value, parameters []string) error

// v - any value
// tagName - the name of the struct tag to parser (json, bson, custom name, etc)
// handler - instance of `FieldHandler` that will perform the desired action (marshal, unmarshal, validation rules, etc)
func StructTagParser(v any, tagName string, handler FieldHandler) error

StructTagParser would then parse the type data to find the struct tags, decode the tags against the specified format, and call the handler for the implementing the final functionality.

The compiler would then be able to enforce the formatting of the Struct Tag string data, however it would not be able to validate the internal meanings. So json:"fieldName,omitempty" bson:"fieldName" would be able to checked such that json and bson and then fieldName,omitempty and fieldName portions are valid formats; however josn:"fieldName" wouldn't be able to be detected that json was mispelled as josn.

This would maintain backward compatibility since the struct tag data itself wouldn't be changed; while adding some additional functionality that makes using struct tags easier to work with and putting some additional guidelines on the format. The only part that might break is if something utilized the string data differently than expected; however, that should likely be pretty low impact that could be negated with documentation and notices.

trim21 commented 4 months ago

this can be included in go 1 actually, this won't break existing go code.

golang / go