The main repo for the Medley Interlisp project. The Wiki and Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources).
More notes from @rmkaplan on changing the encoding of all release Lisp files
There is an ancient EOL issue that I thought had been resolved many years ago, but it seems to still be there in the code.
I did a lot of the character reading stuff in the early days, and apparently I also implemented the notion of an external format in the mid 90’s (which I don’t remember at all). But that’s the interface that makes it easy to add the UTF8 stuff.
The issue has to do with the low-level character reading macros. They all go through a macro \inchar that basically wraps a call to \nsin, which then does the XCCS/NS handling inline but otherwise calls out to the external-format character function.
But \inchar wraps the \nsin call inside another macro \checkeolc, which triggers on CR and LF, and does coercion to internal EOL (which happens to be CR) when the byte sequence matches the EOLCONVENTION of the file.
If it sees a CR and the EOLCONVENTION is CRLF, it peeks at the next byte and reads it if it sees an LF. If the macros are told to decrement the byte count, then the CRLF causes the necessary extra decrement. If it doesn’t see the LF, it returns the CR by itself, which is OK because that’s the internal EOL. But if the convention is CR (which it defaults to now) and the file has CRLF, the LF is left in the file, and that screws things up. And if it sees a naked LF, it doesn’t get converted to EOL.
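To make the failure mode concrete, here is a minimal sketch in Python of the convention-dependent coercion described above, bug included. This is not the actual Interlisp macro; ByteStream, peek_byte, read_byte, and check_eolc are hypothetical stand-ins for the stream machinery.

```python
CR, LF = 0x0D, 0x0A  # the internal EOL character is CR

class ByteStream:
    """Hypothetical stand-in for a Medley input stream."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def peek_byte(self):
        return self.data[self.pos] if self.pos < len(self.data) else None
    def read_byte(self):
        b = self.peek_byte()
        if b is not None:
            self.pos += 1
        return b

def check_eolc(stream: ByteStream, byte: int, convention: str) -> int:
    """Rough model of what \\CHECKEOLC does today."""
    if byte == CR and convention == "CRLF":
        if stream.peek_byte() == LF:
            stream.read_byte()  # consume the LF; the extra byte-count decrement
        return CR               # a lone CR is fine: CR is already the internal EOL
    if byte == LF and convention == "LF":
        return CR
    # Otherwise the byte passes through untouched.  Under convention CR,
    # a CRLF file leaves the stray LF in the stream, and a naked LF is
    # never mapped to the internal EOL.
    return byte

# Demo: convention CR against a CRLF file leaves the LF behind.
s = ByteStream(b"a\r\nb")
out = []
while (b := s.read_byte()) is not None:
    out.append(check_eolc(s, b, "CR"))
print(bytes(out))  # b'a\r\nb' -- the LF survives, uncoerced
```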
In aboriginal times, before text files moved back and forth between operating system environments with different conventions, there was a lot of fussiness about properly interpreting the EOL. This is still the correct thing to do for output files, so that files will look good in their home environment.
But at one point it became apparent that this is a mistake for input files. Since you don’t know the provenance of a file, if you are operating on a file (or a region of a file) with text or character input functions, then any of the three EOL indicators (CR, LF, or CRLF) that happen to appear in the file should be mapped to the internal EOL.
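A convention-agnostic input path along those lines might look like the following. Again, this is just a Python sketch reusing the hypothetical ByteStream above, not the actual fix: every one of CR, LF, and CRLF comes out as the internal EOL, whatever the file’s declared EOLCONVENTION says.

```python
def read_char_normalized(stream: ByteStream):
    """Map CR, LF, and CRLF to the internal EOL (CR) on input,
    ignoring the file's declared EOLCONVENTION."""
    b = stream.read_byte()
    if b == CR:
        if stream.peek_byte() == LF:
            stream.read_byte()  # swallow the LF half of a CRLF pair
        return CR
    if b == LF:
        return CR               # a naked LF also becomes the internal EOL
    return b

# All three conventions normalize to the same internal text.
for data in (b"a\rb", b"a\nb", b"a\r\nb"):
    s = ByteStream(data)
    out = []
    while (c := read_char_normalized(s)) is not None:
        out.append(c)
    print(bytes(out))  # b'a\rb' in every case
```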
In fact, that is what PFCOPYBYTES is doing—it calls \NSIN directly instead of \INCHAR because it doesn’t want the accidental EOL convention of the file to get in the way.
I thought I had cleaned this up a long time ago, but apparently not. I’m tempted to take another crack at it: to change the \CHECKEOLC macro to scoop up all the options, and then to recompile the relatively few functions that contain the macros (which may run into the problem in LLREAD).
What do you think?