crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Proposal: `IO::DupReader` #14792

Open straight-shoota opened 2 weeks ago

straight-shoota commented 2 weeks ago

When working with IO streams, it's often helpful to tap into the data stream and observe what's being sent. This can be useful for debugging or auditing purposes, but also for calculating hashes or signatures of the streamed data.

An example for this mechanism is IO::Hexdump: It's an IO which wraps another one and dumps all data that goes through it into another IO (STDERR by default), in hex format.
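
For illustration, here is a minimal sketch of that tap in use. It assumes an in-memory source and an in-memory dump target instead of STDERR so the dumped output can be inspected; the variable names are made up for the example.

```crystal
require "io/hexdump"

# Hypothetical in-memory stand-ins for a real stream and for STDERR.
source = IO::Memory.new("hello")
dump = IO::Memory.new

# read: true makes the wrapper dump everything that is read through it.
io = IO::Hexdump.new(source, dump, read: true)

data = io.gets_to_end # the caller still receives the plain data
puts data
puts dump.to_s # hex-formatted copy of the same bytes
```

The caller reads through `io` as usual, while `dump` receives the hex-formatted copy as a side effect.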

Actually, IO::Hexdump performs two distinct functions: Tapping into an IO, and hex formatting. Either one of them could be useful without the other, but they're unified in a single type and cannot be used independently.

If there was a variant that performs only the hexdump feature (IO::HexdumpWithoutTap in the following example), the entire write functionality of IO::Hexdump could be implemented using IO::MultiWriter:

io_write = IO::Hexdump.new(sink, STDERR, write: true)

# equivalent:

io_write = IO::MultiWriter.new(sink, IO::HexdumpWithoutTap.new(STDERR))

Of course this is a bit less succinct, but not by much. And I think it's very clear.

The great thing about such composition is that it's useful for other purposes as well. You can easily exchange the hexdump for something else. For example, you could capture the data into an IO::Memory for later replay.
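
As a rough sketch of that capture-and-replay idea, the following uses only `IO::MultiWriter` and `IO::Memory`; `sink` and `capture` are hypothetical stand-ins for a real destination and a recording buffer.

```crystal
# Hypothetical stand-ins: `sink` is the real destination,
# `capture` records a copy of everything written.
sink = IO::Memory.new
capture = IO::Memory.new

io_write = IO::MultiWriter.new(sink, capture)
io_write << "hello " << "world"

# Rewind the capture and replay what was written.
capture.rewind
replayed = capture.gets_to_end
puts replayed
```

Swapping `capture` for a different IO changes what happens to the copy without touching the code that writes to `io_write`.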

In the other direction, the read functionality cannot be composed in the same way, because there is currently no equivalent of IO::MultiWriter for reading.

I'm proposing to add such an IO type. It would have a main source IO to read data from, and it sends all read data to a second IO, in addition to passing it to the caller.

The implementation is pretty trivial:

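class IO::DupReader < IO
  def initialize(@source : IO, @sink : IO)
  end

  # Reads from the source and copies exactly the bytes read to the sink.
  def read(slice : Bytes) : Int32
    @source.read(slice).tap do |bytes_read|
      @sink.write(slice[0, bytes_read])
    end
  end

  # IO requires #write; this type is read-only.
  def write(slice : Bytes) : Nil
    raise IO::Error.new("Can't write to IO::DupReader")
  end

  delegate :peek, :close, :closed?, :flush, :tty?, :pos, :pos=, :seek, to: @source
end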

This would allow an equivalent of the current integrated IO::Hexdump, which looks very similar to the write variant:

io_read = IO::Hexdump.new(source, STDERR, read: true)

# equivalent:

io_read = IO::DupReader.new(source, IO::HexdumpWithoutTap.new(STDERR))

Again, it's easy to exchange the hexdump formatter for something else.
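
To make that concrete, here is a self-contained sketch that swaps the hexdump formatter for an `IO::Memory` capture. `IO::DupReader` is the type proposed in this issue, not something in the standard library, so the example defines a minimal version of it. This version forwards only the bytes actually read and omits the delegations. The `source` and `capture` names are hypothetical.

```crystal
# Sketch of the proposed type (not part of the standard library).
class IO::DupReader < IO
  def initialize(@source : IO, @sink : IO)
  end

  def read(slice : Bytes) : Int32
    @source.read(slice).tap do |bytes_read|
      # Forward only the bytes that were actually read.
      @sink.write(slice[0, bytes_read])
    end
  end

  # IO requires #write; this sketch is read-only.
  def write(slice : Bytes) : Nil
    raise IO::Error.new("Can't write to IO::DupReader")
  end
end

source = IO::Memory.new("hello world")
capture = IO::Memory.new # hypothetical sink, replacing the hexdump formatter

io_read = IO::DupReader.new(source, capture)

first = io_read.read_string(5) # consume only part of the source
seen_so_far = capture.to_s     # the sink has seen exactly those bytes
rest = io_read.gets_to_end     # consume the remainder

puts first, seen_so_far, rest, capture.to_s
```

Note that the sink only ever sees what the caller has consumed: after the partial read, `capture` holds just the first five bytes.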

Addendum:

The implementation of IO::HexdumpWithoutTap would also be very simple:

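class IO::HexdumpWithoutTap < IO
  def initialize(@io : IO = STDERR)
  end

  # Writes the hex-formatted representation of the slice to the wrapped IO.
  def write(slice : Bytes) : Nil
    return if slice.empty?

    slice.hexdump(@io)
  end

  # IO requires #read; this type is write-only.
  def read(slice : Bytes) : NoReturn
    raise IO::Error.new("Can't read from IO::HexdumpWithoutTap")
  end

  delegate :peek, :close, :closed?, :flush, :tty?, :pos, :pos=, :seek, to: @io
end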

yxhuvud commented 2 weeks ago

A related function in C (at least on Linux; I have no idea whether it exists on other OSes): man 2 tee. It could at the very least be a possible optimization in certain cases.

Whether IO::Tee would be a better name for it, I don't know. DupReader is explicit about what it does, but tbh I'd have to read the documentation to figure out what its purpose is. :shrug:

It is fairly simple in any case, though, so perhaps IO#tee could be an alternative (that is, just a method on IO rather than a class of its own), or even IO.tee (similar to IO.pipe).

I really like the idea of this kind of IO data flow control though.

straight-shoota commented 2 weeks ago

Regarding the name, for tee I'd think of man 1 tee which is more similar to IO::MultiWriter. It writes all of the input to all outputs, not just the amount that's read by the main output.