Interlisp / medley

The main repo for the Medley Interlisp project. The wiki and issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources).
https://Interlisp.org
MIT License

SUBSTREAM, or temporarily resetting the EOF? #396

Closed: rmkaplan closed this 1 year ago

rmkaplan commented 3 years ago

One of the complexities of the external format interface is that there is no guaranteed correspondence between the number of characters and the number of bytes. So for the few functions that want to keep track of byte positions (PFCOPYBYTES, FILEPOS), the format interface functions carry the extra complexity of having to report how many bytes were read.
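To make the chars-versus-bytes mismatch concrete, here is a minimal Python sketch using UTF-8 as an example multi-byte format (an illustration only; it is not Medley's external format code). The hypothetical `read_char` returns both the decoded code point and the number of bytes it consumed, which is the kind of extra report the format functions have to make so that a byte-tracking caller does not need to ask the stream for its position after every character.

```python
from typing import Tuple


def read_char(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one UTF-8 character at buf[pos]; return (code_point, bytes_consumed)."""
    first = buf[pos]
    if first < 0x80:
        return first, 1
    if first >> 5 == 0b110:
        n, code = 2, first & 0x1F
    elif first >> 4 == 0b1110:
        n, code = 3, first & 0x0F
    elif first >> 3 == 0b11110:
        n, code = 4, first & 0x07
    else:
        raise ValueError(f"invalid lead byte {first:#x} at {pos}")
    if pos + n > len(buf):
        raise ValueError("truncated character")
    for b in buf[pos + 1 : pos + n]:
        if b >> 6 != 0b10:
            raise ValueError(f"bad continuation byte {b:#x}")
        code = (code << 6) | (b & 0x3F)
    return code, n


def count_chars_and_bytes(buf: bytes) -> Tuple[int, int]:
    """Walk the buffer one character at a time, tallying characters and bytes."""
    pos = chars = 0
    while pos < len(buf):
        _, nbytes = read_char(buf, pos)
        pos += nbytes  # the caller relies on the reported byte count
        chars += 1
    return chars, pos
```

For example, `count_chars_and_bytes("héllo€".encode("utf-8"))` reports 6 characters but 9 bytes.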

The stream itself actually knows what happened, but this is all set up so that the caller doesn't have to keep calling GETFILEPTR to keep track.

In the case of FILEPOS, the issue is not actually about chars-to-bytes; that is already screwed up. It is set up to search a range of bytes, and it has a special optimization that segments the range so that it can do its position arithmetic without dealing with large numbers. That makes it even harder to update FILEPOS to do proper character searching, as opposed to byte-sequence searching.
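The shape of that bookkeeping can be sketched in Python. This is a hypothetical reconstruction, not the actual FILEPOS code: `SEGMENT` stands in for whatever "small number" limit the real optimization respects, and since Python integers are unbounded the sketch only shows the two-level loop structure that makes the function hard to extend, not any real saving.

```python
SEGMENT = 0xFFFF  # hypothetical cap on the inner counter ("small numbers")


def filepos_segmented(pattern: bytes, data: bytes, start: int, end: int):
    """Return the first index of pattern within data[start:end], or None.

    The outer loop advances a segment base; the inner counter `offset`
    stays below SEGMENT, mimicking an optimization that keeps per-step
    arithmetic on small numbers.
    """
    m = len(pattern)
    if m == 0:
        return start
    base = start
    while base + m <= end:
        # Candidate match starts left in the range, capped at one segment.
        limit = min(SEGMENT, end - m - base + 1)
        for offset in range(limit):
            pos = base + offset
            if data[pos : pos + m] == pattern:
                return pos
        base += limit
    return None
```

Every change to how a match is recognized (for example, matching characters instead of bytes) has to be threaded through both loop levels and the segment-boundary arithmetic.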

So I wonder whether some of this complexity can be moved down into the stream itself, since streams already know how to keep track of their byte positions in an efficient way.

We have a function GETEOFPTR that tells us the end of file. We don't have a corresponding SETEOFPTR. Suppose we did, and we could RESETSAVE the EOFPTR to the end of the region that we want to operate on, and then have the ENDOFSTREAMOP trigger whenever we went beyond that.

Then PFCOPYBYTES and FILEPOS could just set things up for ordinary BINning, with the ENDOFSTREAMOP triggering to say they went past the range. Those functions would no longer have to do their own arithmetic.
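A minimal Python sketch of the proposal, assuming a toy stream whose `bin` consults only an `eof` field and an `end_of_stream_op` hook. The names `temporary_eof` (a stand-in for the hypothetical SETEOFPTR plus RESETSAVE) and `EndOfRange` are illustrative, not Medley APIs. The search function just reads bytes and lets the end-of-range signal tell it when it has run out; it does no range arithmetic of its own.

```python
from contextlib import contextmanager


class EndOfRange(Exception):
    """Signalled when a read goes past the temporary end of range."""


class ByteStream:
    """Toy byte stream; bin() consults only eof and end_of_stream_op."""

    def __init__(self, data: bytes):
        self.data = data
        self.ptr = 0
        self.eof = len(data)  # the field a hypothetical SETEOFPTR would change
        self.end_of_stream_op = self._real_end_of_file

    def _real_end_of_file(self):
        raise EOFError("end of file")

    def bin(self) -> int:
        # Like BIN: return the next byte, or run the end-of-stream op.
        if self.ptr >= self.eof:
            return self.end_of_stream_op()
        byte = self.data[self.ptr]
        self.ptr += 1
        return byte


def _signal_end_of_range():
    raise EndOfRange()


@contextmanager
def temporary_eof(stream: ByteStream, end: int):
    """Shrink the range and swap the end-of-stream op; restore both on exit."""
    saved = (stream.eof, stream.end_of_stream_op)
    stream.eof = min(end, stream.eof)
    stream.end_of_stream_op = _signal_end_of_range
    try:
        yield stream
    finally:
        stream.eof, stream.end_of_stream_op = saved


def filepos(pattern: bytes, stream: ByteStream, start: int, end: int):
    """Byte-sequence search in [start, end) using ordinary bin() calls."""
    stream.ptr = start
    with temporary_eof(stream, end):
        try:
            while True:
                candidate = stream.ptr
                if all(stream.bin() == b for b in pattern):
                    return candidate
                stream.ptr = candidate + 1  # naive backtrack to the next start
        except EndOfRange:
            return None
```

A partial match that runs into the limit raises `EndOfRange`, which correctly means "not found in this range".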

Is there a different way of pushing this down into the stream?

masinter commented 3 years ago

I’d worry about a HARDRESET. RESETSAVE is OK for user code but not for the file system.

BIN is an opcode. Changes need to be coordinated with Maiko.

Passing the charcount in as a special variable seems workable.

rmkaplan commented 3 years ago

We are already screwed on RESETSAVE under a hard reset (stack overflow in URAID?), because we typically mangle the stream's ENDOFSTREAMOP under a RESETSAVE.

The idea would be to change the currently operative stream fields that encode its EOF, the ones BIN pays attention to, while saving the original true values in other stream fields so that they can always be restored. On the assumption that the opcode looks only at those EOF-encoding fields, the change would be transparent to it.
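The two-field idea can be sketched in Python under stated assumptions: a toy `Stream` whose `bin` reads only the operative `eof` field, plus a hypothetical `saved_eof` field that parks the true EOF. Because the true value lives in the stream itself rather than on a RESETSAVE stack, any cleanup path can restore it. This toy version restores straight to the true EOF rather than unwinding nested limits one level at a time, and the field names are illustrative, not Medley's.

```python
from typing import Optional


class Stream:
    def __init__(self, data: bytes):
        self.data = data
        self.ptr = 0
        self.eof = len(data)  # operative field: the only EOF field bin() reads
        self.saved_eof: Optional[int] = None  # true EOF, parked while a limit is active

    def bin(self) -> int:
        if self.ptr >= self.eof:
            raise EOFError("end of (possibly temporary) range")
        byte = self.data[self.ptr]
        self.ptr += 1
        return byte


def set_temporary_eof(stream: Stream, limit: int) -> None:
    # Park the true EOF only once, so nested limits never lose it;
    # a nested limit can only narrow the current range.
    if stream.saved_eof is None:
        stream.saved_eof = stream.eof
    stream.eof = min(limit, stream.eof)


def restore_true_eof(stream: Stream) -> None:
    # Idempotent: safe to call from any cleanup path, since the true value
    # is kept in the stream rather than in the caller's saved state.
    if stream.saved_eof is not None:
        stream.eof, stream.saved_eof = stream.saved_eof, None
```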

Byte counting (as distinct from character counting, and that distinction is the heart of the issue) is somewhat difficult to implement, because the inccode implementations have to account accurately for every byte that they read or unread. (I have a simplification ("improvement") to the interface that I still can't get through a loadup; it always breaks when reading bitmaps from ADISPLAY.)
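A small Python illustration of why every read and unread must be counted, using a CR LF end-of-line convention as an example of a decoder that reads ahead and then backs up. `CountingReader` and `read_char_crlf` are hypothetical names for this sketch; Medley's actual inccode functions are organized differently.

```python
class CountingReader:
    """Toy byte source that tallies every byte read and backed up."""

    def __init__(self, data: bytes):
        self.data = data
        self.ptr = 0
        self.bytes_read = 0

    def bin(self) -> int:
        byte = self.data[self.ptr]
        self.ptr += 1
        self.bytes_read += 1
        return byte

    def backup(self) -> None:
        # "Unread" one byte; the byte count must be corrected too.
        self.ptr -= 1
        self.bytes_read -= 1

    def at_end(self) -> bool:
        return self.ptr >= len(self.data)


def read_char_crlf(reader: CountingReader) -> str:
    """Read one character, mapping CR LF or a lone CR to a newline."""
    byte = reader.bin()
    if byte == 0x0D:
        if not reader.at_end() and reader.bin() != 0x0A:
            reader.backup()  # lookahead was not LF; forgetting this overcounts
        return "\n"
    return chr(byte)
```

Reading `b"a\r\nb\rc"` to the end yields `"a\nb\nc"` with `bytes_read` equal to 6: five characters from six bytes, and the lookahead after the lone CR is backed up rather than counted.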

But the thing that triggered this question is not the cases where byte-counting is currently being used in a fairly transparent way (e.g. COPYCHARS, PFCOPYBYTES), but the fact that there is at least one routine (FILEPOS) that has its own special complexity of segmenting the byte range to avoid creating large numbers while it is counting. That is an obstacle to upgrading FILEPOS to become a character-searcher instead of a byte-sequence searcher.

It's the temptation of such difficult-to-maintain-and-extend optimizations that I would like to eliminate in favor of a general notification scheme when a reader tries to advance beyond a specified range (which I know will require a little adjustment in the multi-byte-character reading implementations, but not in any character reading clients.)
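A minimal Python sketch of that division of labor, assuming a hypothetical `LimitedStream` that raises `EndOfRange` when a byte read would cross the limit, and UTF-8 as the multi-byte format. The "little adjustment" lives in the decoder: a character that straddles the limit is not consumed, so the stream stays on a character boundary. The client loop reads characters and never counts bytes.

```python
class EndOfRange(Exception):
    """Notification that a read tried to go past the range limit."""


class LimitedStream:
    def __init__(self, data: bytes, limit: int):
        self.data = data
        self.ptr = 0
        self.limit = min(limit, len(data))

    def bin(self) -> int:
        if self.ptr >= self.limit:
            raise EndOfRange()
        byte = self.data[self.ptr]
        self.ptr += 1
        return byte


def read_utf8_char(stream: LimitedStream) -> str:
    """Decoder side: the only code that knows characters span bytes."""
    start = stream.ptr
    first = stream.bin()  # EndOfRange here means nothing was consumed
    if first < 0x80:
        n = 1
    elif first >= 0xF0:
        n = 4
    elif first >= 0xE0:
        n = 3
    else:
        n = 2  # assumes well-formed input (a 0xC0..0xDF lead byte)
    try:
        tail = [stream.bin() for _ in range(n - 1)]
    except EndOfRange:
        # The adjustment: back up so a straddling character is not consumed.
        stream.ptr = start
        raise
    return bytes([first, *tail]).decode("utf-8")


def read_all_chars(stream: LimitedStream) -> str:
    """Client side: reads characters until notified, never counting bytes."""
    chars = []
    try:
        while True:
            chars.append(read_utf8_char(stream))
    except EndOfRange:
        return "".join(chars)
```

With `"a€b".encode("utf-8")` (5 bytes) and a limit of 3, the client gets `"a"` and the stream is left at byte 1, the start of the euro sign.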