golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.32k stars 17.7k forks source link

proposal: io: Add io.ReaderFromBuffer interface #58331

Open fredwangwang opened 1 year ago

fredwangwang commented 1 year ago

When calling io.CopyBuffer(writer, reader, buf) , it tries to use the io.ReaderFrom interface on writer if it is available. For example net.TcpConn implemets the io.ReaderFrom and will try to use syscalls like splice or sendfile to avoid additional copy in the user space. This is all well-intended and provides better performance in the case of copying from one net conn to another. However. If the reader is not a real file, but just some struct implements io.Reader, it actually falls back to using io.Copy.

This is problem-some, as the very intention of using io.CopyBuffer is that we get to control the buffer and potentially pool the buffers to avoid GCs. But the generic fallback to io.Copy makes the buffer passed in useless and makes an additional allocation.

This behavior is clearly shown in the following heap profile (left is the buffer passed in, right is the allocation made by genericReadFrom fallback (io.Copy)): Screenshot from 2023-02-04 21-16-11

This is a very surprising behavior to the end developers without the understanding of TcpConn readfrom internals. As an obvious optimization of using io.CopyBuffer in the hot-path turns out to be even worse than just call io.Copy.

So would propose to add an interface similar to io.ReaderFrom, but allows to take additional buffer, thus the name io.ReaderFromBuffer. This way nice optimization regarding fd copies can still happen, but in case none applies, the genericReadFrom can fallback to io.CopyBuffer instead of io.Copy, to take advantage of the buffer passed in, if there is any.

ianlancetaylor commented 1 year ago

See also #16474.

earthboundkid commented 1 year ago

The core issue is that the extended interface is always called if it exists. What one wants is a mechanism like errors.Unwrap or http.ResponseController.Unwrap or fs.ErrNotImplemented to signal that the method doesn't actually implement a direct ReaderFrom/WriterTo and shouldn't be used in a particular case.

fredwangwang commented 1 year ago

Thanks for the context @ianlancetaylor!

just thinking out..

What one wants is a mechanism like errors.Unwrap or http.ResponseController.Unwrap or fs.ErrNotImplemented to signal that the method doesn't actually implement a direct ReaderFrom/WriterTo

@carlmjohnson It does make sense to me. However, agree with the comment https://github.com/golang/go/issues/16474#issuecomment-354825553 here:

... such as io.ErrNotSupported, but it would likely break callers on a massive scale

Considering v2 might not exist ever, to me at least, having something that is non-breaking yet making io.CopyBuffer less surprising has its merit.

Also interestingly mentioned here https://github.com/golang/go/issues/16474#issuecomment-235310031

Conceptually, it seems that ReaderFrom should have a buffer argument and ignore it if it is not needed or nil

this is actually what this proposal about, but in a non API breaking way.


there are two usecase of io.CopyBuffer that I am aware from reading the linked issue:

  1. To optimize. I suspect this is the reason most people use io.CopyBuffer. By adding io.ReaderFromBuffer interface and implements it in things like os.File, io.TcpConn, etc. The kernel fast path can still happen behind scene, and if it does not apply, it will gracefully fallback to io.CopyBuffer instead of io.Copy. The net result is at least as fast as the unintuitive hack, with the exact same memory footprint

    io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)
  2. To shape. I am not positive that anyone actually relies on this. But it is mentioned here https://github.com/golang/go/issues/16474#issuecomment-254361220 in a test case:

    ... another case where the behavior of io.CopyBuffer was unexpected. The test relied upon copy being done in specific chunk sizes

Now there are two scenarios to this,

Bottomline, having io.ReaderFromBuffer still does NOT guarantee the passed buffer will be used (if faster path available), but it will not result in unexpected additional allocations like io.ReaderFromdoes.

earthboundkid commented 1 year ago

Just starting returning io.ErrNotSupported in existing methods would break existing clients, yeah. What you'd want to do is to make a new interface, to avoid bikeshedding call it io.ReaderFromV2 and io.WriterToV2, and then those would be allowed to return io.ErrNotSupported, which io.Copy/io.CopyBuffer would know about and be able to deal with.

ianlancetaylor commented 1 year ago

ErrNotSupported is #41198, which was accepted but I've always feel a bit leery about implementing it.

earthboundkid commented 1 year ago

Yes. I wrote on #41198 that the proposal is flawed because “ErrNotSuported” is domain specific. Not supporting http.FlushError is different than not supporting io.ReaderFrom. An overhaul of io.Copy should have io-specific ErrNotSupported variables.

gopherbot commented 1 year ago

Change https://go.dev/cl/475575 mentions this issue: io: optimize copyBuffer to make use of the user-provided buffer for fallbacks