golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.13k stars 17.37k forks source link

proposal: io: io.WriterToAt and io.ReaderFromAt #52383

Open Jorropo opened 2 years ago

Jorropo commented 2 years ago

Proposal

WriterToAt

// Start gives you the current head point, length is how much data is remaining.
func (*SectionWriter) Unwrap() (r WriterAt, start int64, length int64)

type WriterToAt interface {
  // WriteToAt SHOULD be capable to unpack *SectionWriter objects to accelerate copies.
  // If a copy from an unwrapped *SectionWriter is performed, *SectionWriter also need to be seeked to reflect that.
  // Off is the offset in the reader. If you want pass an offset in the Writer, wrap it in an *SectionWriter first.
  // If the length missmatch between the reader and the writer copy until the smallest one and return EOF.
  // If the copy could not be performed more efficiently than with a simple read write copy loop, then return with errors.ErrUnsupported
  // If the copy was succesfull err == nil and n == length.
  // If length == -1 that means copy until EOF.
  WriteToAt(w Writer, off int64, length int64) (n int64, err error)
}

ReaderToAt (exactly the same but reader and writer are swapped)

// Start gives you the current head point, length is how much data is remaining.
func (*SectionReader) Unwrap() (r ReaderAt, start int64, length int64)

type ReaderFromAt interface {
  // ReadFromAt SHOULD be capable to unpack *SectionReader objects to accelerate copies
  // If a copy from an unwrapped *SectionReader is performed, *SectionReader also need to be seeked to reflect that.
  // Off is the offset in the reader. If you want pass an offset in the Reader, wrap it in an *SectionReader first.
  // If the length missmatch between the writer and the reader copy until the smallest one and return EOF.
  // If the copy was succesfull err == nil and n == length.
  // If the copy could not be performed more efficiently than with a simple read write copy loop, then return with errors.ErrUnsupported
  // If length == -1 that means copy until EOF.
  ReadFromAt(w Reader, off int64, length int64) (n int64, err error)
}

Motivations

Clean (not having to deal with FDs and CopyFileRange) concurrent zerocopy. On linux, at least, concurrent reads are writes are perfectly safe and sane as long as no writes overlap and no reads overlap with a write.

Easly composable offset based zero copy. For example a database that use a single big file could return *SectionReader to objects pointing to objects on the disks. And even if multiple step later (after being wrapped io.MultiReader call for example), zerocopy would still just work.

Generalisable, other types like bytes.Reader could be optimised that way too.

Current WriterTo and ReaderFrom allows similar operations if you combine them with seeking and LimitedReader but that is not concurrent safe.

The reason to use io.Section* types as wrappers that way is because this avoid an big (8) matrix of required interfaces:

{ReaderFrom,WriterTo}{At,}({Writer,Reader}{At,})

With this only 4 interfaces are required. 2 of them are already in the std already. We replace the type matrix with type assertion in the different implementations.

Related issues:

ianlancetaylor commented 2 years ago

Can we get the same effect by implementing io.SectionReader.WriteTo and io.SectionWriter.ReadFrom?

Jorropo commented 2 years ago

Can we get the same effect by implementing io.SectionReader.WriteTo and io.SectionWriter.ReadFrom?

Assuming the goal is not add any new interface and support concurrent operations (so no Seek + WriteTo tricks). To get zerocopy with that solution to work that would require io.SectionReader.WriteTo to understand that reader and writer are a *os.File and do CopyFileRange calls. Same for bytes.Reader and bytes.Buffer, io.SectionReader.WriteTo would need to able to understand thoses two and copy from the underlying arrays. Same for os.File to net.TCPConn and SendFile calls. What if I create a custom AES-XTS wrapper that expose a ReaderAt / WriterAt managing AES-XTS encryption and writing to a an other device, are we gonna add support for that in io.SectionReader.WriteTo ? (no)

Without any standart API to expose WriterTo + offset + length, we are doomed to multiply adhoc solutions in potential consumers. (or be required to use solutions like importing unix and manually firing CopyFileRange, Splice, SendFile, ... calls from the buisness logic which has so many issues)

If this propsal is accepted, ultimately, *os.File, bytes.Buffer, bytes.Reader, *net.TCPConn, customAesXts.Wrapper would all implement io.WriterToAt and io.SectionReader.WriteTo implementation as easy as:

func (s *SectionReader) WriteTo(w io.Writer) (int64, error) {
  if wta, ok := s.r.(WriterToAt); ok {
    n, err := wta.WriteToAt(w, s.off, s.limit - s.off)
    r.off += n
    return n, err
  }
  return 0, errors.ErrUnsuported
}

From code organisation perspective, this proposal is also better looking to me, because for example, the zero copy code for files would be in *os.File.WriteToAt (where it belong, with the rest of the File code), not in *io.SectionReader.WriteTo.

Jorropo commented 2 years ago

Also, this isn't the question you asked, but still

Why should this be in the std, couldn't this be wrapped in an std alternative io module ?

CAFxX commented 11 months ago

Recently ran into this while attempting to have multiple readers read (via io.Copy) from the same os.File in parallel. Wrapping the os.File in a SectionReader should do the trick, but that currently implies giving up on the WriterTo fast path, and in general on any hope of zero-copy send.