Interlisp / medley

The main repo for the Medley Interlisp project. Wiki, Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources)
https://Interlisp.org
MIT License

CL:READ-FROM-STRING returned position is incorrect #1812

Closed MattHeffron closed 2 months ago

MattHeffron commented 3 months ago

Describe the bug: The position value returned from CL:READ-FROM-STRING is twice the value it should be. (It is returning the byte position, not the character position.)

To Reproduce Steps to reproduce the behavior:

  1. Run full.sysout
  2. In the XCL Exec, enter: (multiple-value-list (read-from-string "ABCDEF X"))
  3. The value (ABCDEF 14) is returned and displayed

Expected behavior: In step 3, the value returned should be (ABCDEF 7). Note that the Common Lisp HyperSpec (CLHS) page for read-from-string notes:

position---an integer greater than or equal to zero, and less than or equal to one more than the length of the string.

The 14 is not within that range. Also from the CLHS:

The secondary value, position, is the index of the first character in the bounded string that was not read.
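As a rough illustration of the range quoted above, the check below reads from a string and tests the reported position against the CLHS bounds. POSITION-IN-BOUNDS-P is a hypothetical helper written for this example, not something in Medley or the standard.

```lisp
;; Sketch only: POSITION-IN-BOUNDS-P is a hypothetical helper name.
(defun position-in-bounds-p (string)
  "Return true if the position reported by READ-FROM-STRING for STRING
lies between 0 and one more than the length of STRING, inclusive."
  (multiple-value-bind (object position) (read-from-string string)
    (declare (ignore object))
    (<= 0 position (1+ (length string)))))

;; "ABCDEF X" has 8 characters, so any position above 9 is out of range.
;; A reported position of 14 would make this call return NIL.
(position-in-bounds-p "ABCDEF X")
```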

Context (please complete the following information):

Other info: The second returned value from read-from-string appears to be used in SEDIT-COMMANDS, in extract-current-selection. I don't know why this isn't affected by this.

rmkaplan commented 3 months ago

I think there is a simple, kludgy, brute-force fix to this.

OPENSTRINGSTREAM converts a thin-string (as in your example) to fat, so that every character occupies 2 bytes. So replacing (\GETFILEPTR..) by (FOLDLO (\GETFILEPTR..) 2) should give the specified value.
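The arithmetic behind that suggestion can be sketched in portable Common Lisp. This is not the Medley code: BYTE-POSITION->CHAR-POSITION is a hypothetical name, and the sketch assumes every character occupies a fixed 2 bytes, as described for fat strings.

```lisp
;; Sketch only, assuming a fixed-width encoding of BYTES-PER-CHAR bytes.
;; BYTE-POSITION->CHAR-POSITION is a hypothetical helper, not a Medley function.
(defun byte-position->char-position (byte-position &optional (bytes-per-char 2))
  "Convert a byte offset into a character index by integer division."
  (values (floor byte-position bytes-per-char)))

;; The byte position 14 from the report corresponds to character position 7.
(byte-position->char-position 14) ; => 7
```

The same division would not work for a variable-width encoding, which is the concern raised next about file streams.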

If Common Lisp had said that any of the file-reading functions should also return the character position of the first unread character, that would be much worse (UTF-8, XCCS…).

Separately, the spec for CL:READ-FROM-STRING is a little strange, saying "If the entire string was read, the position returned is either the length of the string or one greater than the length of the string." Which is it?

On Aug 24, 2024, at 10:32 PM, Matt Heffron @.***> wrote:

Context (please complete the following information):

IL:MAKESYSDATE: 31-Jul-2024 02:24:38


tfeb commented 3 months ago

On 25 Aug 2024, at 17:40, rmkaplan @.***> wrote:

Separately, the spec for CL:READ-FROM-STRING is a little strange, saying "If the entire string was read, the position returned is either the length of the string or one greater than the length of the string." Which is it?

I think the spec talks about this: there can (it claims) be cases where you (might) want to simulate an extra character at the end of the string and the thing is then allowed to return the index it would if that character was there.

That strikes me as a ludicrously poor argument: even for an implementation that does this (I presume there was one) then instead of making the implementation make at worst a call to MIN with one of the arguments being the string length, every program has to be careful. That's C-level design.
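The defensive step each caller would otherwise need can be sketched as a small wrapper. READ-FROM-STRING-CLAMPED is a hypothetical name used only for illustration, and the sketch clamps against the full string length rather than any :END argument.

```lisp
;; Sketch only: READ-FROM-STRING-CLAMPED is a hypothetical wrapper.
(defun read-from-string-clamped (string &rest args)
  "Like READ-FROM-STRING, but never return a position past the end of STRING."
  (multiple-value-bind (object position)
      (apply #'read-from-string string args)
    ;; MIN removes the 'one greater than the length' case the spec allows.
    (values object (min position (length string)))))

;; The whole string is consumed here, so the clamped position is at most 6.
(read-from-string-clamped "ABCDEF")
```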

--tim

MattHeffron commented 2 months ago

The second returned value from read-from-string appears to be used in SEDIT-COMMANDS in extract-current-selection. I don't know why this isn't affected by this.

SEdit didn't have an issue with this because read-from-string was using the start value with the same interpretation as byte position, not character position. This is corrected in commit 07e858d of PR #1833.
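To show why the two interpretations have to agree, here is a sketch of the usual pattern of feeding the returned position back in as :START. It is not the SEdit code. READ-ALL-FROM-STRING is a hypothetical function, and it assumes both values are character indices, as the CLHS specifies.

```lisp
;; Sketch only: READ-ALL-FROM-STRING is a hypothetical function.
(defun read-all-from-string (string)
  "Read successive objects from STRING, returning them as a list."
  (let ((eof (gensym "EOF"))   ; unique marker that cannot appear in STRING
        (position 0)
        (objects '()))
    (loop
      (multiple-value-bind (object next)
          (read-from-string string nil eof :start position)
        (when (eq object eof)
          (return (nreverse objects)))
        (push object objects)
        ;; The returned position becomes the next :START, so both must
        ;; use the same unit (characters here).
        (setf position next)))))

(read-all-from-string "ABCDEF X") ; => (ABCDEF X)
```

If the position were in bytes and :START in characters, the second read would begin at index 14 of an 8-character string instead of index 7.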