crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
`Digest` should be an `IO` #14793

Open straight-shoota opened 2 months ago

straight-shoota commented 2 months ago

The Digest class behaves as a stream writer, so it would make sense to have it inherit from IO.

This would allow using Digest implementations as a data sink in an IO composition. For example, in combination with IO::MultiWriter or IO::DupReader (proposed in #14792), it would be trivial to write to or read from an IO while simultaneously calculating a digest of the data.

For example, the following program prints some data to an IO, and calculates the CRC32 on the fly:

```crystal
require "digest/crc32"

crc = Digest::CRC32.new
io = IO::MultiWriter.new(STDOUT, crc)

io << "Hello "
io << "Crystal!\n"

p crc.final
```

The Digest#update(data : Bytes) method is essentially the same as IO#write(slice : Bytes), making this similarity very obvious.
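To illustrate how closely the two methods line up, here is a minimal sketch of an adapter that works without any change to Digest. The class name `DigestSink` is hypothetical and not part of the standard library; it only forwards `IO#write` to `Digest#update`, and it uses an `IO::Memory` in place of STDOUT so the result is easy to inspect.

```crystal
require "digest/crc32"

# Hypothetical adapter (not in the stdlib): exposes a Digest as a write-only IO.
class DigestSink < IO
  getter digest : Digest

  def initialize(@digest : Digest)
  end

  # A digest has nothing to read from, so reading is rejected.
  def read(slice : Bytes) : Int32
    raise IO::Error.new("DigestSink is write-only")
  end

  # IO#write(slice : Bytes) maps directly onto Digest#update(data : Bytes).
  def write(slice : Bytes) : Nil
    @digest.update(slice)
  end
end

crc = Digest::CRC32.new
buffer = IO::Memory.new
io = IO::MultiWriter.new(buffer, DigestSink.new(crc))

io << "Hello "
io << "Crystal!\n"

streamed = crc.hexfinal
one_shot = Digest::CRC32.hexdigest("Hello Crystal!\n")
puts streamed == one_shot
```

If Digest inherited from IO directly, the adapter would be unnecessary and `crc` could be passed to `IO::MultiWriter` as in the example above.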

Digest even implements << which can be used much as in IO. It's not identical, though. Digest#<< is just an alias for #update, while IO#<< implements string concatenation: it calls data.to_s(self), which is equivalent to write(data.to_s), not write(data). This would probably be the most difficult problem for integrating Digest with IO, due to the different semantics of #<<. I don't think there are any other incompatibilities.

ysbaddaden commented 2 months ago

Excellent use-case!

straight-shoota commented 1 month ago

I don't think the different implementation of #<< is actually an issue. This method just tells the IO to append an object to itself. How it does that should be up to the IO. The default implementation of IO#<< is oriented towards text-based representations (which are quite common, and #to_s is defined on every type). But Digest is byte-based and thus it uses a different method to append an object.

The only issue is that Digest#<< delegates to #update which only accepts IO, Bytes or a type that implements #to_slice. So it won't compile for a number of other types. I suppose it's best to fall back to the to_s implementation then.
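A minimal sketch of that fallback, written as a free-standing helper rather than as a change to Digest itself. The method name `append` is hypothetical; it uses the bytes of anything that responds to `#to_slice` and otherwise falls back to the `#to_s` representation.

```crystal
require "digest/crc32"

# Hypothetical helper (not in the stdlib) sketching the proposed #<< behavior.
def append(digest : Digest, obj) : Digest
  if obj.responds_to?(:to_slice)
    # Byte-based path: same as the current Digest#<< / #update.
    digest.update(obj.to_slice)
  else
    # Fallback for types without #to_slice: digest the text representation.
    digest.update(obj.to_s)
  end
  digest
end

crc = Digest::CRC32.new
append(crc, "abc") # String responds to #to_slice
append(crc, 42)    # Int32 does not, so "42" is digested

mixed = crc.hexfinal
expected = Digest::CRC32.hexdigest("abc42")
puts mixed == expected
```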

ysbaddaden commented 2 weeks ago

While I like the example, I'm not sure merging the concepts/semantics holds. While an IO writes to a destination, Digest updates a calculated value and eventually needs to finalize it (extra step). I think the semantics are quite different.

I'd like to see other use cases. Otherwise IO::MultiWriter accepting both IO and Digest could be interesting, or a dedicated dispatcher to do just that: calculate the Digest on-the-fly as we write to an IO.
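For comparison, a dedicated dispatcher of that kind could look roughly like the following sketch. The class name `DigestingWriter` is hypothetical; it wraps a target IO and a Digest, and feeds every written slice to both.

```crystal
require "digest/crc32"

# Hypothetical dispatcher (not in the stdlib): writes to an IO and
# updates a Digest with the same bytes.
class DigestingWriter < IO
  def initialize(@io : IO, @digest : Digest)
  end

  # The dispatcher only handles the write direction.
  def read(slice : Bytes) : Int32
    raise IO::Error.new("DigestingWriter is write-only")
  end

  def write(slice : Bytes) : Nil
    @io.write(slice)
    @digest.update(slice)
  end
end

target = IO::Memory.new
crc = Digest::CRC32.new
writer = DigestingWriter.new(target, crc)

writer << "Hello "
writer << "Crystal!\n"

on_the_fly = crc.hexfinal
from_target = Digest::CRC32.hexdigest(target.to_s)
puts on_the_fly == from_target
```

This keeps Digest outside the IO hierarchy, at the cost of one wrapper type per direction (a reading counterpart would need its own class).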

straight-shoota commented 2 weeks ago

I don't see how the semantics are different. You dump data into Digest and it behaves like any other stream interface.

Sure, in order to get a result you need an extra step which is outside the IO interface. But that's just normal setup and teardown. We have that on many other IO implementations. Files need to be opened, sockets need to be connected, etc. before you can use them. Compress::Writer needs to be explicitly closed to write the trailing metadata.