golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.47k stars 17.59k forks source link

encoding/base64: document if (*Encoding).Decode arguments may overlap #56070

Open Merovius opened 2 years ago

Merovius commented 2 years ago

base64.Decoding.Decode accepts two byte-slices. As encoded base64 is never smaller than the decoded data, it would be possible to save allocations by giving the same argument for both - that is, to store the decoded data in the same slice as the original base64 data. If the encoding is invalid, this might of course result in garbage ending up in that buffer. But that is often fine.

The documentation doesn't explicitly state that this is allowed. Empirically, it seems to work, but I'd feel more comfortable doing it, if it was documented. Could we document that it is allowed and what happens if the user does it? If we want to preserve the flexibility of disallowing it, should we document that it is not allowed?

A similar question arises for

robpike commented 2 years ago

I would rather document that they should not overlap, to give more freedom to the implementation.

mvdan commented 2 years ago

@robpike what kind of freedom are you thinking about? If we're talking about room for future performance improvements, the existing code allowing the overlap allows saving an allocation, which is in the same camp. Perhaps you're thinking of something like vector instructions?

Merovius commented 2 years ago

As a potential compromise, cipher.BlockMode.CryptBlocks documents:

Dst and src must overlap entirely or not at all.

This allows to efficiently check if the arguments are overlapping. An implementation could then fall back on a slower path or allocate a temporary buffer, as appropriate. i.e. if we disallow overlapping arguments, the user always has to allocate a temporary buffer, but with this restriction, they could let the implementation choose whether or not it needs that.

cagedmantis commented 2 years ago

cc @rsc

twmb commented 2 years ago

For both encoding and decoding, I've relied on the current implementation many times in somewhat performance sensitive code paths. If this officially is documented as disallowed, can it come with a corresponding vet check to allow checking and fixing affected codebases?

mvdan commented 2 years ago

@twmb that would likely not meet the "frequency" criteria of vet, unless we can show that many Go users are really relying on this undocumented behavior.

twmb commented 2 years ago

For context, I only started using the pattern after I saw it in other code. It's something I've found quite nifty and I've reused it since.

My worry is more that the documentation is changed but the implementation isn't. A few years go by and maybe only then, vectorized instructions are introduced. How much code will this accidentally break then? I doubt people are really re-reading the documentation of base64's Encode / Decode functions, and I doubt that all code out there tests that base64 encoding/decoding is working the way the user expects. Why test something that the standard library is already testing itself, and how often am I looking at code (as a maintainer) that I wrote 2+ years ago?

I am fairly certain that at least one area I used the current code in the past is tested because the result was used in another computation that I then unit tested. I'm not sure if every area I've used the current code is fully tested around base64 changing its behavior. I don't have access to all code bases anymore, and I doubt some areas of former codebases are being revisited because they were (at the time) tested well enough and full stable. I'm worried about upgrading Go silently changing behavior in unexpected areas that aren't tested and likely wont be noticed until it's too late.