golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.79k stars 1.58k forks source link

protoc-gen-go: disable recording the versions in generated file #1185

Open bmhatfield opened 4 years ago

bmhatfield commented 4 years ago

In the v2 version of protoc-gen-go, genGeneratedHeader now outputs a versions section.

We check in our generated protobuf files. This versions output is very problematic for us, as any time there's a minor version bump in protoc (which upgrades automatically via homebrew) all of our generated protobuf files are now different. In order to keep un-necessary changes down, and to avoid diff "flapping", everyone must be on the same minor release of protoc-gen-go and protoc, even if it has no effect on functionality or the actual generated code.

Can we remove that feature or at least offer an option to disable version output?

dsnet commented 4 years ago

This use-case for wanting to disable the version markers is suspect as it's bad practices to have a number of developers working on the same module, but using different versions.

bmhatfield commented 4 years ago

I see where you're coming from, but so far, my experience with protoc minor/patch versions specifically is that they do not affect the output of generated code in any way.

So IMO it's hard to argue that version differences matter there? Perhaps this issue could be reframed as something more like "don't thrash git diffs of generated code when a version change doesn't otherwise affect the code output"?

Given that reframing, it's possible that just hiding sub-versions of protoc might resolve the base problem I'm having.

dsnet commented 4 years ago

Given that reframing, it's possible that just hiding sub-versions of protoc might resolve the base problem I'm having.

By sub-version, do you mean the "PATCH" version per semver terminology? That's something we could consider, but PATCH versions themselves can theoretically introduce changes to the generated .pb.go output and will require more thought.

In general, I recommend that the project enforce some means of ensuring everyone is using the same version of tools for checked in code. This applies to protoc and protoc-gen-go and even go fmt.

ydnar commented 4 years ago

This has bitten us a few times as protoc on macOS developer machines is typically installed via Homebrew, which lags by a few days.

We have dirty tree checks in CI to ensure generated code matches what’s checked into the repo, and tests that verify the behavior of the generated code. If the code changes, CI will catch it. Because of this, the explicit version of the tools used to generate the code is less relevant.

Would you consider adding a flag to disable this?

scudette commented 2 years ago

IMHO it is fine to have a version output in a generated file but it should not update the file if this is the only difference. The code should check if regenerating the file updates it (i.e. updates any of the real lines in the code) and only then change the file.

As a workaround we need to write a hacky python script to clean up the generated files and remove these

puellanivis commented 2 years ago

🤔 We would need to read in the existing file (if it exists) and then do a semantically aware comparison of the file.

Unfortunately, if this is something one truly wants, I think it is a problem that is best solved with a pre-commit check if the only difference is the version like and if so, revert it.

scudette commented 2 years ago

Or it could be an option to not emit those in the first place, which is the most convenient option for some users.

scudette commented 2 years ago

Just for reference to anyone coming to this thread we ended up using sed to clean up the generated files

https://github.com/Velocidex/velociraptor/blob/5e5509145244ebffcef433d957e4500cb4b94a97/make_proto.sh#L64

sed -i -e '1h;2,$H;$!d;g' -re 's|// versions.+// source:|// source:|' $i/*.pb.go
puellanivis commented 2 years ago

I’m not sure the value of “letting protobufs generated by different versions output the same code by omitting version output” fully offsets the value of knowing what version the protobuf was built with.

Adding knobs and dials has effort and cost. Especially, when protoc-gen-go is not being called by us, but by protoc itself.

It is not that I see zero value to this option. But no option or control knob is perfectly free…

scudette commented 2 years ago

When checking the generated files in we reduce the dependency on our users, and we can see when new structs or members in the files are generated due to the git diff anyway. So it becomes very obvious when the file is regenerated using a completely different version of the compiler anyway (because the whole thing would be different).

If protoc would use semantic versioning and would output the major version that would solve both concerns because we know the general layout of the generated code is not changing in incompatible ways without incrementing major versions while minor version differences wont change the file comments.

puellanivis commented 2 years ago

¿I’m confused as to why you would say “if protoc would use semantic versioning”, Because it does? Nothing in the files that is semantically relevant is changing between any of the minor or patch versions.

The comment about minor version differences not changing file comments is additionally weird, because even if we were to flip the case of every single character of nearly all of the comments (all except the autogenerated message), there would be no breaking changes that happen.

scudette commented 2 years ago

My point was that I can see why it would be nice to add a comment to the generated file to document the major version number to ensure compatibility. But indeed if there was a guarantee that compatibility is not affected when major numbers are not changed, then it makes sense to only include the major version number of the generator in the comment.

Doing this will ensure the file does not change unnecessarily while at the same time ensuring that when it does change there is a reason to pay attention to the change (i.e major version number has changed)

Also as you point out the git diff will actually also reveal changes in the actual code generated in case of incompatible changes so it's not really that important to record the version after all.

neild commented 2 years ago

The compatibility promise for generated code is here: https://pkg.go.dev/google.golang.org/protobuf#section-readme

Users should use generated code produced by a version of protoc-gen-go that is identical to the runtime version provided by the protobuf module. This project promises that the runtime remains compatible with code produced by a version of the generator that is no older than 1 year from the version of the runtime used, according to the release dates of the minor version. Generated code is expected to use a runtime version that is at least as new as the generator used to produce it.

Note that this does not apply semver; the promise is expressed in terms of a time window. (Historically, we've substantially overdelivered on that promise.)

The reasons for including the version in generated code:

Note that this second case is essentially the original request here: Everyone working on the codebase is using a different generator version, and you want to avoid being told about that. This is the sort of situation that's okay right up until it isn't; when the generated code isn't changing, the version skew seems unimportant, but things become very confusing when the generated code does change. The best fix here is to set things up so that everyone does use the same generator, for example by having a generation script that installs and runs the generator at a specific version.

That said, I don't personally care strongly about what we do here. Including the version in the generated code makes the repo release process substantially more complicated; perhaps we should step back to keeping a generator version that increments only when generation changes.

scudette commented 2 years ago

These are excellent arguments for recording the version. I can see there is no perfect solution.

It is interesting that you point out that removing the version string is really our confidence that it is irrelevant and we are happy to take the risk of incompatibilities due to different version by different developers. It is good to articulate this position.

In our project we always regenerate all proto files (we have a script that does them all) so there is no chance that some will be out of step with others. In our case therefore this risk is acceptable, especially considering the git diff will usually show the actual code is changing if it matters.

It is easy enough for us to clean up the generated files (using the above mentioned script) so it is not a strong requirement for us just wanted to share our practice on this issue.

Thanks again for considering this issue and providing valuable feedback @neild and @puellanivis .