StreamCharacterSequence optimize rewinds to current buffer

Whiteknight commented 1 year ago

Rewinds to the current buffer cannot currently be optimized. We would need to keep track of three pieces of information, which together do not fit into the SequenceCheckpoint:

The BufferStartPosition where the stream was at the beginning of the current buffer
The character Index in the current buffer
The stream Position which corresponds to this index.

Whiteknight commented 1 year ago

We do have the .Consumed property in the checkpoint. We could maintain a table of Consumed->Position for the start of each buffer, and then have something like a skip list to look up what the BufferStartPosition of the buffer which contains the given .Consumed value (we would have to modulo back to the .Consumed value at the beginning of the buffer, or find the largest .Consumed value in the skip list without going over).

If stream is large, and buffer size is small, this lookup table might grow very large. But, we have the benefit of being able to pre-allocate for most cases because we know the size of the stream.

Whiteknight commented 1 year ago

This is done. I added a lookup with buffer start positions to the sequence, so I can look up information about the buffer start from the information that was already included in the sequence checkpoint. This allows fast rewinds to the current buffer, though behavior can get a little bit pathological if we are doing frequent read/rewind operations around the boundary of a buffer.

Whiteknight / ParserObjects

StreamCharacterSequence optimize rewinds to current buffer #191