curiousdannii-testing / inform7-imported-bugs


[I7-1932] [Mantis 1968] Unicode strings are not compared properly #152

Open curiousdannii-testing opened 2 years ago

curiousdannii-testing commented 2 years ago

Reported by : halkun

Description :

When doing an "if X is Y" test on two identical Unicode text strings, the test fails.

Steps to reproduce :

"Test" by Halkun

A Test Chamber is a room.

The first character is some text that varies.
The second character is some text that varies.

When Play Begins:
    Now the first character is "を";
    Now the second character is "を";
    If the first character is the second character:
        say "The characters match.[line break]";
    If the first character is not the second character:
        say "The characters do not match.[line break]";

Additional information :

This affects "if X is Y".
This does not affect "if X exactly matches Y".

imported from: [Mantis 1968] Unicode strings are not compared properly
  • status: Reported
  • resolution: Open
  • resolved: 2022-04-07T05:01:59+10:00
  • imported: 2022/01/10
curiousdannii-testing commented 2 years ago

Comment by zarf :
Confirmed. The problem seems to happen for any comparison of strings that contain characters beyond Latin-1 (U+0100 and up).

Problem still occurs if you use literal strings rather than global variables. E.g., this does not print anything:


if "ē" is "ē":
    say "Should say yes.";

("ē" is U+0113.)

curiousdannii-testing commented 2 years ago

Comment by zarf :
After diving into the code for a bit:

Inside TEXT_TY_Compare_Inner(), we are comparing two string blocks with subtype Routine. That is, (txt-->1 ofclass Routine). These are the routines generated from "[unicode 12434]" or whatever the character is.

TEXT_TY_Compare_Inner() has the logic "if both strings have subtype Routine, they are identical if the routines have the same address." But this fails, because (in this case) R_TX_S_134 and R_TX_S_135 are different routines which produce the same output.

So you can demonstrate the same bug without Unicode:


Now ch1 is "x[1]y";
Now ch2 is "x[1]y";
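The failure described above can be modelled outside Inform. The following Python sketch is a hypothetical model, not the actual Inform 6 template code. In it, a text value is either a literal string or a zero-argument routine that produces the text. `compare_by_address` mimics the rule "two routine-backed texts are equal only if they are the same routine." The names `expand`, `compare_by_address`, `r_tx_s_134` and `r_tx_s_135` are illustrative; the last two only echo the routine names mentioned above.

```python
from typing import Callable, Union

# Hypothetical model: a text is either a literal string or a routine
# (a zero-argument callable) that produces the text when run.
TextValue = Union[str, Callable[[], str]]


def expand(text: TextValue) -> str:
    """Run a routine-backed text to get its characters (a stand-in for transmuting)."""
    return text() if callable(text) else text


def compare_by_address(left: TextValue, right: TextValue) -> bool:
    """Models the rule described above: if both texts are routines,
    they count as equal only when they are the very same routine."""
    if callable(left) and callable(right):
        return left is right
    return expand(left) == expand(right)


# Two separately compiled routines for the same source text "x[1]y",
# analogous to R_TX_S_134 and R_TX_S_135.
def r_tx_s_134() -> str:
    return "x" + str(1) + "y"


def r_tx_s_135() -> str:
    return "x" + str(1) + "y"


print(expand(r_tx_s_134), expand(r_tx_s_135))      # x1y x1y
print(compare_by_address(r_tx_s_134, r_tx_s_135))  # False: the bug
```

Both routines produce `x1y`, yet the address rule reports them as different. In this model, that is the same mismatch behind the Unicode comparisons reported above.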

curiousdannii-testing commented 2 years ago

Comment by zarf :
I think it suffices to make this a conservative check. That is,


if ((left_txt-->1 ofclass Routine) && (right_txt-->1 ofclass Routine) && (left_txt-->1 == right_txt-->1)) return 0;

But if the routines have different addresses, continue with the TEXT_TY_Temporarily_Transmute strategy.

I'm not positive this is the best solution, or even a correct solution! I just tried a couple of cases and it seems to work.
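The proposed conservative check can be sketched in a hypothetical Python model. The same-address test becomes a shortcut that can only ever report "equal." Any other pair of routines falls through to expanding both texts and comparing their characters. Following the `return 0` in the Inform 6 line above, the sketch returns 0 for equal texts and -1 or 1 otherwise. `expand`, `compare_conservative`, `routine_a` and `routine_b` are illustrative names, not Inform functions.

```python
from typing import Callable, Union

# Hypothetical model: a text is either a literal string or a routine
# (a zero-argument callable) that produces the text when run.
TextValue = Union[str, Callable[[], str]]


def expand(text: TextValue) -> str:
    """Run a routine-backed text to get its characters (a stand-in for transmuting)."""
    return text() if callable(text) else text


def compare_conservative(left: TextValue, right: TextValue) -> int:
    """Return 0 if the texts are equal, otherwise -1 or 1 (cmp-style)."""
    # Conservative shortcut: the very same routine is treated as equal.
    if callable(left) and callable(right) and left is right:
        return 0
    # Otherwise expand both sides and compare their characters.
    a, b = expand(left), expand(right)
    return (a > b) - (a < b)


def routine_a() -> str:
    return "x1y"


def routine_b() -> str:
    return "x1y"


print(compare_conservative(routine_a, routine_b))  # 0: equal after expansion
print(compare_conservative(routine_a, "x2y"))      # -1: "x1y" sorts before "x2y"
```

The shortcut never reports two texts as different without looking at their characters. Whether the shortcut itself is safe depends on whether a routine always produces the same output.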

curiousdannii-testing commented 2 years ago

Comment by zarf :
Yes, good points. Routines always have to be transmuted, I guess.

curiousdannii-testing commented 2 years ago

Comment by zarf :
Yep, I entirely forgot.

curiousdannii-testing commented 2 years ago

Comment by mattweiner :
I can't check your proposed solution, but are the following edge cases for conservative checks?

Lab is a room.

Str1 is text that varies. Str1 is "[random text]". Str2 is text that varies. Str2 is "[random text]".

To say random text: say "[one of]first alternative[or]second alternative[at random]."

When play begins: if str1 is str2, say "[str1] matches [str2]."

Or for some slightly more defensible coding:

Lab is a room.

To say desctext: say "You see something special about [the item described]."

A rock is in Lab. The description of the rock is "[desctext]".
A stone is in Lab. The description of the stone is "[desctext]".

When play begins:
    if the description of the rock is the description of the stone:
        say "The rock and the stone have the same description, as you can see:[line break]";
        try examining the rock;
        try examining the stone.

I'm not even sure what the desired effect should be here, but it seems like something that might be worth documenting (if it is indeed an edge case).

(EDIT: Apologies, I don't know how to make code show up properly in comments.)
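These examples suggest why routines always have to be transmuted. A text routine can produce different output each time it runs, so even a same-address shortcut can report "equal" for texts that would print differently. The Python sketch below is a hypothetical model of that situation. For simplicity it assigns the same routine to both variables, which is the case the address shortcut covers. To keep it deterministic, `random_text` cycles through its two alternatives instead of choosing at random, standing in for `[one of]...[or]...[at random]`. The helper names are illustrative, not Inform functions.

```python
from itertools import cycle
from typing import Callable, Union

# Hypothetical model: a text is either a literal string or a routine
# (a zero-argument callable) that produces the text when run.
TextValue = Union[str, Callable[[], str]]


def expand(text: TextValue) -> str:
    """Run a routine-backed text to get its characters (a stand-in for transmuting)."""
    return text() if callable(text) else text


def equal_with_shortcut(left: TextValue, right: TextValue) -> bool:
    # Same routine => assumed equal, without running it.
    if callable(left) and callable(right) and left is right:
        return True
    return expand(left) == expand(right)


def equal_always_transmute(left: TextValue, right: TextValue) -> bool:
    # Always run both routines and compare what they actually produce.
    return expand(left) == expand(right)


# Deterministic stand-in for "[one of]first alternative[or]second alternative[at random]".
_alternatives = cycle(["first alternative.", "second alternative."])


def random_text() -> str:
    return next(_alternatives)


str1 = random_text  # in this model, both variables hold the same routine
str2 = random_text

print(equal_with_shortcut(str1, str2))     # True: the routine is never run
print(equal_always_transmute(str1, str2))  # False: "first alternative." vs "second alternative."
```

Which of these two answers Inform should give is the open question raised in the comment above.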

curiousdannii-testing commented 2 years ago

Comment by dfremont :
For better cross-referencing, let me explicitly point out that the underlying problem here is issue #0001865 (which Zarf may have forgotten about, since he submitted it in February).