curiousdannii-testing / inform7-imported-bugs


[I7-1723] [Mantis 1759] I6 error on I6 string including y with dieresis #361

Closed curiousdannii-testing closed 2 years ago

curiousdannii-testing commented 2 years ago

Reported by : dfremont

Description :

Although 'ÿ' is explicitly stated by Writing with Inform §5.10 (WI 5.10) to be "definitely safe to use", including it in a string in an I6 inclusion leads to an I6 error (MAX_QTEXT_SIZE exceeded). This is because Inform fails to terminate the string properly: the example below results in the I6 line

[ florp ; print "

with no terminating quote. Oddly, this does not seem to happen if the string is used in an inline definition, only in a full Include. It's possible that 'ÿ' is not supposed to be legal in an I6 string, but if so this should be documented (see #0001758) and there should be a Problem message (or at least an I6 error that doesn't arise just because ni has output bad I6).

Steps to reproduce :

Foo is a room.
Include (- [ florp ; print "ÿ"; ]; -).

Additional information :

imported from: [Mantis 1759] I6 error on I6 string including y with dieresis
  • status: Closed
  • resolution: Resolved
  • resolved: 2022-04-07T05:00:30+10:00
  • imported: 2022/01/10
curiousdannii-testing commented 2 years ago

557058:4c095ffd-6d6f-47ce-9e73-77c613347b86:

Comment by graham :
Bug fixed. I don't want to say that it's against the rules to use ZSCII characters in inclusions - there are plausible use cases for this, and any kind of I6 inclusion is an experts-only feature by definition. I've put a warning in the documentation.

curiousdannii-testing commented 2 years ago

Comment by zarf :
Confirmed. This is not, as I originally guessed, a character encoding issue. (Other Latin-1 characters work when used this way; I7 writes auto.inf in Latin-1 encoding and then I6 compiles it correctly.)

The character ÿ is code 255 (0xFF) in Latin-1. I suspect that whatever part of the I7 lexer handles I6 inclusions is reading into a signed char variable and then interpreting 255 as -1 (end of file). It should use an int variable instead.

However, I would recommend that you limit your I6 inclusions to ASCII and use ASCII escapes instead of literal Unicode characters. Not only will this avoid issue 1758, it means you don't have to distinguish characters 128-255 (which work in I6 inclusions) from higher Unicode characters (which don't).

curiousdannii-testing commented 2 years ago

Comment by zarf :
In light of those additional problems, let me upgrade my statement:

I6 inclusions should only use ASCII, and the I7 manual should document that requirement. I7 does too many transformations on the source characters before the I6 compiler gets a hold of them.

curiousdannii-testing commented 2 years ago

Comment by dfremont :
Not all Latin-1 characters work in this context, though: I pointed out the multiplication sign in #0001758. Another kind of example is 'ð', which is translated into "[unicode 240]" in auto.inf. Behavior like this (mistranslating rather than giving an error) is perhaps reasonable for the higher Unicode characters, since presumably the issue will become clear when the string is printed. But it seems a strange asymmetry that Latin-1 characters work when using I6 by itself but not when using it through I7.

At any rate, I agree using ASCII escapes is probably better style.