Closed marwan-at-work closed 4 years ago
Related (possibly duplicate?) discussion at https://github.com/golang/go/issues/34472.
This should definitely be in the release notes :) thanks for bringing it up.
It's unfortunate that noone has noticed until now - it's true that rc1 was very late, but it has been over two months since 1.14beta1.
Can you provide a reproducer? I'm particularly interested in what kind of input you're seeing in the wild.
This issue is not a duplicate of #34472, but they are related. json.Number
is documented to only accept JSON numbers when decoding, but in practice it seems to accept any string. #14702 was about invalid numbers (which caused this issue), and #34472 is about accepting quoted numbers even when the ,string
option isn't used.
Thinking outloud, I wonder if we could amend https://golang.org/cl/195045 to keep accepting empty strings, if that's where the vast majority of the breakage comes from. I think not accepting "foobar"
as a json.Number
was a good change.
Also CC @breml since he wrote the original patch.
@erikdubbelboer Provided my use case https://play.golang.org/p/glvZxWT2to8 The problem for me is that the endpoint that generates outputs an empty string or quoted number. And as @marwan-at-work said, this did put down our production. We have a rule that we start testing new release only when they reach rc1
If the only problem is empty strings, would my workaround above be enough?
We have a rule that we start testing new release only when they reach rc1
Sure, though rc1 came out two weeks ago, and the final release is scheduled just a few days away. We can try to have a fix before then, but this late into the cycle, it might have to wait until 1.14.1.
@mvdan Here's how it works in 1.13: https://play.golang.org/p/3Bv8N_6DPST Here's how it fails in 1.14rc1, if you run the same code above you get:
ID: 123
panic: json: invalid number literal, trying to unmarshal "\"c83hdjlkjdf\"" into Number
goroutine 1 [running]:
main.must(...)
repro/main.go:23
main.main()
repro/main.go:17 +0x27a
I think not accepting "foobar" as a json.Number was a good change.
I completely agree. However, I think not breaking people at runtime is a lot more important :)
Furthermore, I think it's pretty easy for people to think that type Number string
can mean something like "a number or a string" and not a "stringified number" -- Case in point, my coworkers did :)
However, I think not breaking people at runtime is a lot more important :)
Isn't the whole point of an encoding that it is a runtime thing?
Case in point, my coworkers did :)
I'll certainly defer to the owners of the package here, but in this case I don't think there's anything wrong with the documentation or the change made for 1.14. The underlying type of Number
is, in effect, an implementation detail, and the documentation is quite clear:
A Number represents a JSON number literal.
Depending on the exact use case, the fix might look something like https://play.golang.org/p/NuDKRzkKuZ9
@marwan-at-work - is there a reason you think the fix here needs to be on the Go side?
CC @cagedmantis @dmitshur @toothrot to decide what (if anything) to do about this for 1.14 proper.
(Personally I'd be inclined to just clarify the release notes.)
As much as I like the new behavior, I imagine this might cause some production apps to go down in the wild.
It's unfortunate, but that's why it's important to integration-test (and fuzz-test) production services. (A change to conform to the documented behavior for clearly-invalid inputs certainly does not violate the Go 1 compatibility policy.)
I agree with @bcmills - the bare minimum here is to clarify this in the release notes.
There is a tradeoff here - technically following the compatibility policy, but breaking half the Go programs out there, would clearly be a bad idea. However, I don't think that's the case here, and like @myitcv showed, there are workarounds. Your code isn't guaranteed to work exactly the same between Go versions, especially if you don't stick to documented behavior.
Isn't the whole point of an encoding that it is a runtime thing?
@myitcv my point was that a runtime breaking change is more dangerous than a compile time breaking change. Because users might upgrade to 1.14 and not realize their application broke until after it's serving traffic to users in production :)
While it varies by the severity of the issue, Go users have gotten pretty used to upgrading minor versions and trusting that their application will not break.
That said, all of the above sounds good to me. I was just surprised nothing was mentioned in the release notes so I just wanted to bring it up.
Thanks everyone 🙌
Thanks all for the input - I'll have a CL ready for the changelog doc later today.
CC @cagedmantis @dmitshur @toothrot to decide what (if anything) to do about this for 1.14 proper.
(Personally I'd be inclined to just clarify the release notes.)
Thanks all for the input - I'll have a CL ready for the changelog doc later today.
Updating the release notes sounds reasonable to me for this kind of change. We'll make sure that CL ends up on the release branch in our next update to it.
I agree with @mvdan/@myitcv and I think updating the release notes is the right thing to do.
Change https://golang.org/cl/220418 mentions this issue: doc/go1.14: document the change to json.Number decoding
json.Number
.The problem is that this fix will break production systems, as the world is full of badly formed JSON, and production systems have come to rely on the old bug to handle this data. This does not break the Go 1.0 compatibility promise. But it will cause some users real pain. The end result is that upgrading to 1.14 will break production systems, and developers will become more hesitant to upgrade to new versions of Go, without extensive testing of their systems.
@amnonbc - at this stage we have, by my count, two reports relating to this change. Whether this will affect production systems is a function of how well systems are (integration) tested (https://github.com/golang/go/issues/37308#issuecomment-589094085). I don't think there's strong evidence at this stage to warrant a change to the proposed course of action of updating the release notes.
Two reports is quite a lot at this stage. I image we will get many more once 1.14 is out. But it is really a policy question. Are we OK with breaking our users production - if we can point the finger at them, and say that they should have sorted out their integration testing, and that they should never consume services which produce json with schema violations (which, in the real world, is a large proportion)?
@amnonbc please read the previous discussion - otherwise we're going in circles.
Hello @mvdan and @myitcv . Good to see GLUG people here.
Yes I have read the discussion. There are the language purists for whom the spec and documentation is the sole source of truth. And there are the unwashed masses who have the task of keeping production installations running in the real world, and integrating with multiple external services that they do not control, many of which produce what @bcmills described 'clearly-invalid inputs'.
I am just a humble teddy bear, so I defer to the judgement of the maintainers here. There is no right answer. But we should be most grateful @marwan-at-work for testing the RC1 and drawing our attention to this problem at this early stage.
Yes, this isn't a black and white situation. The reason this is an acceptable change is because, like we said before:
I should also add that this isn't an early stage to discuss this kind of decision. It would have been far better to have this discussion when rc1 or beta1 came out :)
The broken programs were depending on undocumented behavior
So serves our users right! But in seriousness, any undocumented behaviours will be relied upon by a section of our users, and we should be wary of silently breaking their code.
There is a possible workaround with just a bit of extra code; this is not a regression nor a bug that blocks development
In order to insert the workaround, you need to know that the problem exists.
Noone is forced to upgrade to 1.14 immediately; 1.13.x remains supported
But we do want people to upgrade. And we want to make this upgrade as frictionless as possible, eliminating any unnecessary FUD.
Two reports in the two months since 1.14beta1 released isn't proof of a large amount of programs in the wild breaking (also note that Google has been using Go internally since rc1, and noone has raised a flag there)
Google live in an walled garden, have a higher level of engineering skill than most, and depend on very few services that they do not control.
I should also add that this isn't an early stage to discuss this kind of decision. It would have been far better to have this discussion when rc1 or beta1 came out :)
True. But we are where we are...
The point of the Google example is that no internal Google tests broke with this change, including tests that used third libraries written outside of Google. That suggests that there isn't all that much code that depends on this specific behavior.
I haven't really looked into this issue. But when considering a change in undocumented behavior, it is absolutely relevant how many failure cases there are. Backward compatibility is very important, but we also have to be able to fix bugs.
But in seriousness, any undocumented behaviours will be relied upon by a section of our users, and we should be wary of silently breaking their code.
Yes, and this lengthy discussion is proof of that thoughtfulness.
In order to insert the workaround, you need to know that the problem exists.
This is precisely what the CL above is about - adding it to the changelog.
Change https://golang.org/cl/220367 mentions this issue: [release-branch.go1.14] doc/go1.14: document the change to json.Number decoding
I just upgraded one of our internal repositories to Go 1.14rc1 and our tests failed.
The reason the tests failed is due to this issue: https://github.com/golang/go/issues/14702
I had to go in and update our code base to make Go 1.14 work, so it's technically a breaking change.
More dangerously, this is a runtime breaking change and not a compile time one.
so I can only imagine the number of Go apps out there running in production that would break because of this.
As much as I like the new behavior, I imagine this might cause some production apps to go down in the wild.
On one hand, the repository was incorrectly using json.Number and treating it as something that could be a number, but would also allow strings. In fact, people probably assume that's what json.Number is since it is of type string.
I'm opening a new issue instead of commenting on the old one because I'd like to bring light into this breaking behavior and make sure we have a decision before Go 1.14 is officially released.
cc: @mvdan @rsc
Thanks!
$ go version Go 1.14rc1, darwin