proposal: spec: remove struct tags

bradclawsie commented 7 years ago

This is a Go2 Proposal

Struct tags are one of Go's great hacks, but they are being exploited more and more to embed DSLs into Go that cannot be checked by the compiler.

Struct tags create implicit constructor-style functionality. Go does not require constructors for structs but we could use interfaces to mandate this functionality in specific scenarios (serialization/deserialization, etc).

The alternative is end-runs around Go's simplicity like the validator library, which injects a private DSL that has no compatibility guarantees into the running state of Go programs.

Also since struct tags effectively create the equivalent of eval in Go code, they also represent an attack surface.

bradfitz commented 7 years ago

Obviously we can't remove struct tags from Go 1.x, so I'll throw this into the Go 2 bin for consideration later.

bradfitz commented 7 years ago

Ah, and you retitled it at the same time as my label change.

ianlancetaylor commented 7 years ago

It would help to suggest some alternatives to struct tags for the things they are currently used for, such as the uses in encoding/json.

bradclawsie commented 7 years ago

The functionality exists already with interfaces if we choose to use them more broadly. Removing struct tags does not require other language changes to Go in order to meet the functionality.

Any type that can be serialized or deserialized (not just json, but also yaml, toml, csv, etc) could be required to satisfy an interface available in the standard library. For example:

type Serializable interface {
   Construct...
}

Hopefully the intent is clear. The standard library has a general lack of useful interfaces and this is but one example.

When it becomes very common to inject runtime DSL functionality into a program through a string, it seems fairly obvious that this is alerting us to a deficiency in the language design. We need to figure out how to bring this back into the core language.

cespare commented 7 years ago

@bradclawsie I don't follow how using an interface is a replacement for the specific functionality that struct tags provide for JSON, XML, etc. (specifying field names, omitempty, ...)

bradclawsie commented 7 years ago

@cespare Everything about serialization and deserialization can be moved to real Go functions, including any decisions about fields to omit. Struct tags are just an out-of-language shortcut that permits ad-hoc DSLs to be inserted into Go code.

urandom commented 7 years ago

currently, struct tags are of great help when you want to quickly do things like (un)marshal stuff, among other things. while there are negatives as you listed, relying on interface implementations would introduce quite a bit of strain to developers. imho, a better solution would be to replace them with proper annotations that the compiler can actually check (and maybe such annotations can be expanded to more things than merely struct tags).

E.g.:

type foo struct {
     Bar string `json:",omitempty"`
}

would become:

type foo struct {
     Bar string @json.OmitEmpty
}

At the very least, the compiler can at least see whether an annotation exists or not, and it can come with better documentation as well.

bradclawsie commented 7 years ago

@urandom I really like your suggestion. The usability of the current approach is retained, but because the annotations you propose would actually be Go syntax, the compiler can inspect them (they are no longer ad-hoc DSL vectors with an attack surface).

There is also the potential for developers to be able to implement user-defined annotation support.

There would have to be a broader proposal to integrate annotations in general in to the language.

Thanks!

faiface commented 7 years ago

@bradclawsie I feel like what you're suggesting is that reflection should be removed from the language, since you want to implement reflection stuff through interfaces. Am I correct?

bradclawsie commented 7 years ago

@faiface I am not asking for reflection to be removed from Go. I am proposing removing struct tags (whose contents are opaque and "not Go") and moving their functionality into the language itself.

faiface commented 7 years ago

@bradclawsie Ok, I see that you liked the idea of "tags with syntax". I'm not against that too. Getting back to your original proposal, I still can't see how you'd possibly replace struct tags with interfaces.

bradclawsie commented 7 years ago

@faiface Tags create implicit functionality when serializing or deserializing. There's no reason why you couldn't just require the user to encode these in Go. You would enforce that with an interface. Go has this half-done in certain cases with Marshal and Unmarshal but doesn't formalize the requirement with an interface.

That said, @urandom has a better suggestion...it provides a similar functionality while preserving the conciseness of tags.

faiface commented 7 years ago

@bradclawsie The thing about moving this to interfaces is that it would remove majority of the convenience of reflection. Reflection is so cool in Go, because users don't have to do almost anything to use functionality which uses reflection. All I have to do to encode a struct is to create it (and maybe type a few tags). With this, I'd have to implement a few methods, well, a lot more code, a lot less convenient.

bradclawsie commented 7 years ago

@faiface I agree my suggestion is less convenient, but Go already is focused on explicitness over cleverness everywhere else in the language. A more convenient alternative is @urandom 's annotations suggestion.

faiface commented 7 years ago

@bradclawsie Well, we have reflection in Go. I think it's one of it's greatest strengths. Removing tags altogether would make reflection a lot weaker. However I agree, that making them more integrated to the language makes sense.

ianlancetaylor commented 7 years ago

@urandom 's suggestion is interesting but is incomplete. We must not require the compiler to actually know what @json.OmitEmpty means. That does not scale, and would mean that new packages would not be able to take advantage of this technique. So json.OmitEmpty must be defined in some way in the encoding/json package, so that when that package is imported the annotation is permitted in the importing package. How does that work?

as commented 7 years ago

Also since struct tags effectively create the equivalent of eval in Go code, they also represent an attack surface.

Can we see a practical example of how a struct tag increases attack surface?

bradclawsie commented 7 years ago

@ianlancetaylor ; @urandom may have a better idea, but presumably there would have to be a new mechanism for user defined attributes. Otherwise it could be done with interfaces, but that carries with it the cost of boilerplate.

bradclawsie commented 7 years ago

@as struct tag functionality is invoked automatically (on serialization/deserializing) and is hidden behind an opaque interface (the DSL of the tag text, which can be anything). The Go compiler will make no attempt to protect the developer.

To begin to address the potential for mayhem, you would need to specify the language of tag strings at a minimum.

as commented 7 years ago

Yes, but is there any basis to expect vulnerabilities in struct tags? They are not input fields from clients of the process, only the developer has control over their contents. Am I missing something?

bradclawsie commented 7 years ago

We should expect and defend against malicious code constantly. The expectation is no different than that for regular Go libraries, but the ability to defend is absent: the format for struct tags is ad hoc

as commented 7 years ago

We should consider formalizing the definition of security before changing the language to achieve it. The proposal does not mention tangible security threats. Security is frequently a trade off for efficiency, simplicity, and ease-of-use. I am simply asking how an adversary can exploit struct tags to achieve something considered a security breach by some common criteria.

bradclawsie commented 7 years ago

We're getting off track. I mentioned attack surface as an implication of tags, but the primary thrust of this proposal is to change the Go language so that the functionality embedded inside tags can be made part of the language itself. I would prefer we focus on that.

as commented 7 years ago

It is not my intention to hijack the proposal, however requirements preceeed design. What are the other advantages of removing tags if not security?

bradclawsie commented 7 years ago

The main advantage is to make Go expressive enough to meet the needs of users within the specified language. Tags now provide a "back door" to expressing desired functionality outside of the specified language. In short, Go should not need to embed ad-hoc DSLs.

rasky commented 7 years ago

You seem to have a strong hate for DSLs, but that sounds more like a religious battle, I don't see compelling problems for programmers in your proposal.

An actual problem with struct tags is that they're parsed at runtime, so typos/bugs can go undetected or are harder to diagnose. I think a possible proposal for Go 2 would be to have a way for struct tags to be parsed at compile time. For instance:

Define a fixed format for struct tags. Something like pkg:key=value,key2=value2,key3,key4 (I personally don't feel like a new syntax like @pkg... is necessary, but I wouldn't oppose it).
Have a way for a package like json to define the list of possible supported keys. I'm not sure what the syntax would be, for instance const OmitEmtpy structtag = iota // Documentation here
Parse struct tags at compile time into a map[string][structtag]string where the outer map is indexed by the package name, and the inner map by the tag name. This map can be made available for reflection (just like the string is in Go 1).
During struct tag parsing, generate an error if pkg doesn't map to a real package imported in the file scope.
During struct tag parsing, generate an error if a tag is used but it is not defined in the package.

bradclawsie commented 7 years ago

I don't have a particular problem with DSLs, but in the case of struct tags, they are unspecified. If Go really does need a macro/embedded DSL feature, it should be specified.

cznic commented 7 years ago

I don't have a particular problem with DSLs, but in the case of struct tags, they are unspecified. If Go really does need a macro/embedded DSL feature, it should be specified.

Meaning no one can ever use them for anything out of those "specs". Thanks, but no thanks.

bradclawsie commented 7 years ago

@cznic if you don't think programming languages should have specs, why are you using Go at all? And yes, struct tags effectively hide programming languages

cznic commented 7 years ago

What make's you think a struct tag is a programming language?

bradclawsie commented 7 years ago

The spec says struct tags are just strings, so they can contain ad-hoc instructions to be interpreted at runtime. There are no limits or size constraints. I could inject JS, Lua, or a language I make up myself.

Its one thing to provide this capability in a language...its another thing entirely to offer this as the only expedient way to "get things done" for serialization/deserialization. Go2 should try to fix this.

as commented 7 years ago

There are no limits or size constraints. I could inject JS, Lua, or a language I make up myself.

What does injecting mean? If it refers to typing a long string in a text editor, anyone can do that with their program because they have complete control over its execution as a developer. If you are referring to changing a tag's content at runtime, I don't think this is the appropriate use case for tags.

bradclawsie commented 7 years ago

The things people are encoding in struct tag text are essential functions for the development of everyday Go code, so this functionality should be written in Go, not encoded in some private language.

Let me turn this around...why should something as common and simple as omitting empty JSON tags from serialization have to be done outside of Go as it is specified?

randall77 commented 7 years ago

I understand (and somewhat agree with) the underlying motivation behind this proposal. But it isn't enough to just propose to get rid of struct tags without also proposing a way to replace their use cases.

The language doesn't know anything about JSON. So how do I tell the JSON library that field x of my struct shouldn't be encoded? How would you encode that in an interface, as you suggest? Make the type of x "int32ButNotJSONEncoded", where that type has a special method? That seems pretty unscaleable. And any uses (sets & gets) of that field outside the JSON library need a bunch of casts that they otherwise don't care about.

Or do I write a custom JSON encoding method for every struct that I want to JSON enocde, for any struct for which any field needs special treatment?

I just don't see a better alternative that solves the problems that struct tags solve. If this proposal is to move forward, you'll need to propose something concrete.

bradclawsie commented 7 years ago

@randall77 I agree that the proposal is incomplete and I hope to amend it with more constructive suggestions soon. I wanted to test the waters first to see what kind of reception I would get.

creker commented 7 years ago

Just as an example of how it could be solved, look at C# attributes (I know Java has something similar but I'm not that familiar with it). Attributes allow you to attach annotations to module, class, method, property or variable. After that you can query them at runtime and act accordingly. For example, the following class using XML serializer

public class Car
{
    [XmlElement("StockNumber")]
    public string StockNumber { get; set; }

    [XmlElement("Make")]
    public string Make { get; set; }

    [XmlElement("Model")]
    public string Model { get; set; }
}

will be serialized into this

<Car>
  <StockNumber>1020</StockNumber>
  <Make>Nissan</Make>
  <Model>Sentra</Model>
</Car>

You don't actually have to specify those attributes in C# because serializer will do the same by default. But you can if you want to tweak its behavior. Attributes are used everywhere - serialization, ORM, logging, compiler tweaking etc.

Attributes themselves are regular classes inherited from System.Attribute class. You can define any attribute you want. For example

    public class UserAttribute : Attribute
    {
        public UserAttribute(string requiredParam)
        {
            RequiredParam = requiredParam
        }

        public string RequiredParam { get; }

        public bool OptionalParam { get; set; }
    }

will translate into the following attribute syntax

[UserAttribute("required param value", OptionalParam = true)]

The advantages are clear:

attributes can be attached to pretty much anything
you can define custom attributes using familiar syntax
attributes are an actual language feature. They have a clear specification and checked for correctness at compile-time.
attributes are strictly typed and can store pretty much any information you want. It's just a class with a bunch of

The downside is you have to import modules that define the attribute you want to use.

That's just an example of how the problem is solved in other language. I understand that C# is very different language and don't propose to copy the same feature to Go. Personally, I don't see any advantage to Go tags compared to C# attributes or something along those lines. The fact that you have to import modules was never a problem for me. I didn't even think about that until this very moment. But maybe I'm missing something which would make that a no go.

spiritedsnowcat commented 7 years ago

Attributes are an entirely new level of complexity and should not be compared to struct tags in Go. Struct tags are a convenience measure for marshaling data into wire protocol and file formats. Each field is allowed multiple tags for this purpose, which is why tags are not permitted in the struct declaration or any other data type other than a field. They are not attributes and should not be abused as such. If you are considering attributes as a solution to a problem in Go, then you are going about the problem the wrong way.

creker commented 7 years ago

@garethwarry, Go tags are already used for everything I mentioned apart from compiler tweaking - database schema, serialization, logging. It's not a convenience but an essential feature for many libraries. You can also specify multiple attributes for each entity.

And I'm not proposing anything. This is just an example of solving the same problem in different language. Yes, the problem is the same and saying Go tags are something different downplays it's value for Golang. Tags are, essentially, attributes without any specification and verification. And that's the problem the proposal aims to solve. At least, that's how I see it and very much like to see it solved in Go 2.0, whatever it might be.

bradclawsie commented 7 years ago

@garethwarry There is nothing in the Go spec that limits tags to the common uses you describe...that just happens to be where most people see them.

spiritedsnowcat commented 7 years ago

@creker, It doesn't need verification... a tag is simply a key-value pair of strings. It's not some class with a bunch of enumerations and overloaded methods for validating values. If you need such functionality, you can easily program that into your application in a way that doesn't mask complexity. I don't think it's Go's responsibility to spec something like that, either. It's up to the programmer.

@bradclawsie, You miss the point I was making about complexity.

bradclawsie commented 7 years ago

@garethwarry the spec says tags are strings. There is nothing in the spec that says tags contain key-value pairs:

https//golang.org/ref/spec#Tag

spiritedsnowcat commented 7 years ago

@bradclawsie ...

"By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax."

creker commented 7 years ago

@garethwarry

a tag is simply a key-value pair of strings

Yes and that's exactly the problem. Any typo or simply wrong tag will be detected only at runtime when your code doesn't work. And look again at C# attributes. They resemble the same key-value structure but compiler can actually catch errors. Values can be of any type and that provides safety. Complexity here actually solves a problem.

overloaded methods for validating values

Where did you get that? C# probably allows that (never done or saw that myself) but, again, that was just an example.

By convention

Tags are just happens to be key-value pairs. The spec doesn't limit them to it. Even that is not specified. Not to mention everything else. The spec actually permits any string in tags any string is permitted as a tag

spiritedsnowcat commented 7 years ago

@creker, Do your own error validation. Adding language complexity and reducing flexibility just for the sake of daft programmers needing validation done for them implicitly is a horrible idea. We don't need cookie-cutter solutions from Microsoft for C# programmers who don't know how to error check without try-catches and type validation everywhere. We need a language that allows real programmers to be expressive and get work done. Key-value string pairs is exactly what the languages needs for convenience. Again, if your application requires error validation for tags, then implement it. Otherwise, adding such an unnecessary feature just breaks compatibility and adds complexity. This is a "NO" vote from me.

mdempsky commented 7 years ago

Yes and that's exactly the problem. Any typo or simply wrong tag will be detected only at runtime when your code doesn't work.

If that's the only problem, then we could more easily address it by extending cmd/vet or some other linter to warn when tags don't follow convention.

spiritedsnowcat commented 7 years ago

@mdempsky, golint already does this. It recommends exactly what I quoted above. Any literal string that doesn't follow the key-value pair format is rejected.

bradclawsie commented 7 years ago

Golint is a style recommender for Go1 that has nothing to do with a discussion on changing the Go specification for Go2. You keep bringing up "convention"...the point of this proposal is to change the spec, not merely a recommendation. Go's simplicity starts with the basic fact that you can write a Go compiler by reading the spec.

spiritedsnowcat commented 7 years ago

@bradclawsie, practice is just as important as syntax. When you fail to consider how good practice will come from syntax, then you fail to design a functional language. Look at Duff's device, for example. It compiles, but it's probably the worst use of language I've ever seen from C. Golint recommends good practice from Go, and should not be blown off as off-topic (as you've blown off multiple very good points being made). It recommends how tags are used - which is key-value just as the reflect package also documents.

bradclawsie commented 7 years ago

I use golint and think it has some value, but it has nothing to do with this discussion. This is a Go2 proposal and has nothing to do with third-party tools for Go1.

spiritedsnowcat commented 7 years ago

@bradclawsie,Your bloody topic is about struct tags, and you're saying we shouldn't look at best practices and golang documented uses for struct tags and should instead look at C# as a comparison? Are you high?

golang / go

proposal: spec: remove struct tags #20165