Closed drewnoakes closed 5 months ago
Looks good!
Once these changes are in I'll gather some new traces and see what surfaces.
We're so IO-bound that I think we'll need to rethink how we read from disk/network to get much more improvement on the perf side.
As part of that, a good goal would be to enable async IO so that we're not blocking threads on IO operations. The way we currently read data, byte by byte, doesn't lend itself well to async, as the per-call overhead adds up. I lean towards using async IO to pull larger chunks of data from the file, storing those chunks as `Memory<byte>` values, and then moving parsing to subsequent (non-IO) steps.
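To make the idea concrete, here is a rough sketch of what chunked async reading could look like. `ChunkedReader`, `ReadChunksAsync` and the 64 KiB default chunk size are all hypothetical names and values chosen for illustration, not anything in the library today. The only real API assumed is `Stream.ReadAsync(Memory<byte>, CancellationToken)`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class ChunkedReader
{
    // Reads the stream in large chunks without blocking a thread on IO.
    // Parsing would then run over the returned chunks as a separate, non-IO step.
    public static async Task<List<ReadOnlyMemory<byte>>> ReadChunksAsync(
        Stream stream,
        int chunkSize = 64 * 1024, // assumed size; would need tuning against real traces
        CancellationToken cancellationToken = default)
    {
        var chunks = new List<ReadOnlyMemory<byte>>();

        while (true)
        {
            var buffer = new byte[chunkSize];
            int filled = 0;

            // ReadAsync may return fewer bytes than requested,
            // so keep reading until the chunk is full or the stream ends.
            while (filled < buffer.Length)
            {
                int read = await stream
                    .ReadAsync(buffer.AsMemory(filled), cancellationToken)
                    .ConfigureAwait(false);

                if (read == 0)
                    break; // end of stream

                filled += read;
            }

            if (filled == 0)
                break; // nothing left to read

            chunks.Add(buffer.AsMemory(0, filled));

            if (filled < buffer.Length)
                break; // short final chunk means we hit the end of the stream
        }

        return chunks;
    }
}
```

This reads everything eagerly, which keeps the sketch short. A real design would probably want to read on demand and reuse buffers instead of allocating one array per chunk.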
Traces gathered over the test suite show:
IndexedReader.GetByte(int)
IndexedReader.GetSByte(int)
Looking at the main callers shows that `TiffReader` calls these methods in loops. This approach accrues overhead per byte due to bounds checking and virtual dispatch. Instead, use the `Span<byte>` overload of `GetBytes`, which performs the bounds check once and then copies the data in a single call.

It may be possible to give similar treatment to the handling of other TIFF format codes, though they're not currently showing up in traces and would be more complex to implement due to byte-ordering issues (`byte` and `sbyte` being immune from those).
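For readers unfamiliar with the pattern, the difference between the two approaches can be sketched roughly as follows. `SketchReader`, `ArraySketchReader` and `SByteReading` are hypothetical stand-ins written for this example. They are not the library's actual `IndexedReader` or `TiffReader` code, and the real `GetBytes` signature may differ.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for an indexed reader; not the library's actual class.
public abstract class SketchReader
{
    public abstract byte GetByte(int index);
    public abstract void GetBytes(int index, Span<byte> destination);
}

public sealed class ArraySketchReader : SketchReader
{
    private readonly byte[] _data;

    public ArraySketchReader(byte[] data) => _data = data;

    public override byte GetByte(int index)
    {
        // Bounds check paid on every single call.
        if ((uint)index >= (uint)_data.Length)
            throw new ArgumentOutOfRangeException(nameof(index));
        return _data[index];
    }

    public override void GetBytes(int index, Span<byte> destination)
    {
        // AsSpan validates the whole range once, then CopyTo moves it in one call.
        _data.AsSpan(index, destination.Length).CopyTo(destination);
    }
}

public static class SByteReading
{
    // Before: one virtual call and one bounds check per element.
    public static sbyte[] PerByte(SketchReader reader, int offset, int count)
    {
        var result = new sbyte[count];
        for (int i = 0; i < count; i++)
            result[i] = unchecked((sbyte)reader.GetByte(offset + i));
        return result;
    }

    // After: one virtual call and one bounds check for the whole array.
    // Viewing the sbyte[] as bytes is safe because single-byte values
    // have no byte order to correct.
    public static sbyte[] Bulk(SketchReader reader, int offset, int count)
    {
        var result = new sbyte[count];
        reader.GetBytes(offset, MemoryMarshal.AsBytes(result.AsSpan()));
        return result;
    }
}
```

The same trick does not carry over directly to multi-byte formats: there, a bulk copy would need a byte-swapping pass whenever the file's byte order differs from the machine's.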