DavidKinder / Inform6

The latest version of the Inform 6 compiler, used for generating interactive fiction games.
http://inform-fiction.org/

Recursive abbreviations #42

Open hlabrand opened 3 years ago

hlabrand commented 3 years ago

The following code is legal:

Abbreviate "@00 Smith";

[ Initialise ;
    string 0 "John";
];

(The statement string n "str" is called a "printing variable" on page 30 of the DM4; it is implemented as an abbreviation. The Z-Machine has 96 abbreviation slots: 64 can be defined by Abbreviate and 32 by printing variables.)

As mentioned here, however, this violates paragraph 3.3.1 of the Z-Machine Standards Document 1.1: "Abbreviation string-printing follows all the rules of this section except that an abbreviation string must not itself use abbreviations." Virtually no interpreter (Infocom's, the 8-bit ones, Frotz, etc.) ever complains about this, except Bocfel (and therefore Spatterlight on OS X and Gargoyle on Linux), which crashes.
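To make the rule concrete, here is a minimal sketch of how a strict interpreter could enforce 3.3.1 while decoding text. It relies on the standard's encoding (Z-chars 1 to 3 followed by a Z-char x select abbreviation slot 32(z-1)+x, so slots 0 to 95); everything else is a simplifying assumption: strings are already unpacked into Z-char lists, only lowercase alphabet A0 is handled, and the names `decode`, `zs` and `RecursiveAbbreviation` are hypothetical, not taken from Bocfel or any other interpreter.

```python
# Hypothetical, simplified Z-machine string decoder (not code from any real
# interpreter). Assumptions: strings are already unpacked into 5-bit Z-chars,
# only alphabet A0 is handled (Z-char 0 = space, 6..31 = 'a'..'z'), and the
# shift characters 4 and 5 are not supported.


class RecursiveAbbreviation(Exception):
    """Raised when an abbreviation string itself uses an abbreviation."""


def zs(text):
    """Encode lowercase letters and spaces as alphabet-A0 Z-chars."""
    return [0 if c == " " else ord(c) - ord("a") + 6 for c in text]


def decode(zchars, abbrevs, in_abbrev=False):
    """Decode a list of Z-chars; abbrevs holds 96 Z-char lists (slots 0..95)."""
    out = []
    i = 0
    while i < len(zchars):
        z = zchars[i]
        if z in (1, 2, 3):
            if in_abbrev:
                # Standard 1.1, 3.3.1: an abbreviation string must not itself
                # use abbreviations. This strict sketch stops with an error.
                raise RecursiveAbbreviation("recursive abbreviation")
            if i + 1 >= len(zchars):
                break  # prefix with no following index Z-char: ignore it
            slot = 32 * (z - 1) + zchars[i + 1]
            out.append(decode(abbrevs[slot], abbrevs, in_abbrev=True))
            i += 2
        elif z == 0:
            out.append(" ")
            i += 1
        elif z >= 6:
            out.append(chr(ord("a") + z - 6))
            i += 1
        else:
            raise ValueError("shift characters are not handled in this sketch")
    return "".join(out)


if __name__ == "__main__":
    abbrevs = [[] for _ in range(96)]
    abbrevs[0] = zs("john")              # like: string 0 "John" (lowercased)
    print(decode([1, 0] + zs(" smith"), abbrevs))  # john smith
    abbrevs[32] = [1, 0] + zs(" smith")  # an Abbreviate entry holding "@00 smith"
    try:
        decode([2, 0], abbrevs)          # Z-chars 2, 0 select slot 32
    except RecursiveAbbreviation as err:
        print("fatal error:", err)
```

Putting the Abbreviate entry in slot 32 is only an assumption for the example; any slot reached from inside another abbreviation triggers the same error. A lenient interpreter would simply skip the `in_abbrev` test, which is why one level of nesting goes unnoticed in most of them.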

The compiler should give a warning, or an error, for this case. Similarly, @00 should not be included in "string 1", etc. (But I don't know if this is even allowed by the compiler; if it is, it could definitely create infinite loops, or at least nasty recursive patterns.)
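A warning of the kind proposed here could, in principle, be a simple scan of every Abbreviate string and every literal printing-variable string for "@NN" codes. The sketch below is hypothetical (it is not how the Inform 6 compiler is written, and the function names are invented); it assumes "@" plus exactly two digits denotes a printing variable and treats "@@" plus digits as a ZSCII character code to be skipped.

```python
import re

# Hypothetical compile-time check, not Inform 6's actual implementation.
# "@@" followed by digits is a ZSCII character code, so it is matched first
# and skipped; "@" followed by exactly two digits is a printing variable.
ESCAPE = re.compile(r"@@[0-9]+|@([0-9]{2})")


def printing_var_refs(text):
    """Return the printing-variable numbers written as @NN in text."""
    return [int(m.group(1)) for m in ESCAPE.finditer(text) if m.group(1) is not None]


def check_abbreviations(abbreviations, printing_vars):
    """abbreviations: strings given to Abbreviate.
    printing_vars: variable number -> literal strings assigned via `string n "..."`.
    Returns warning messages; nothing here stops compilation."""
    warnings = []
    for text in abbreviations:
        for n in printing_var_refs(text):
            warnings.append(f'abbreviation "{text}" uses printing variable @{n:02d}')
    for var, texts in printing_vars.items():
        for text in texts:
            for n in printing_var_refs(text):
                warnings.append(f'string {var} "{text}" uses printing variable @{n:02d}')
    return warnings


if __name__ == "__main__":
    found = check_abbreviations(["@00 Smith", "forest"], {0: ["John"], 1: ["@00"]})
    for message in found:
        print("Warning:", message)
```

Returning warnings rather than raising keeps the check advisory, so a user who really wants the nested abbreviation still gets a game file.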

erkyrath commented 3 years ago

I'm always unsure about cases where you explicitly ask the compiler to generate invalid code. It's not clear that it's the compiler's job to stop you, especially in cases where the specs have changed over time.

Has anyone tested Infocom's interpreters for this case? Or searched the Infocom source for cases where it happens?

hlabrand commented 3 years ago

In this case, I didn't explicitly ask it to generate invalid code. Since printing variables replace chunks of text with unreadable codes like "@02", I had a Python script do the substitution in the text for me. (I needed to use all 96 abbreviations or my game wouldn't fit in z3.)

That Python script also replaced the strings inside the abbreviations, which I didn't notice. Every interpreter I've tried, including Infocom's (for the Atari ST and the Apple II at least), had no complaints, which means they allowed recursive abbreviations, or at least one level of them. The exceptions were Gargoyle and later Spatterlight, both based on Bocfel, which crashed with "fatal error: recursive abbreviation".
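The script itself isn't shown, so the following is only a hypothetical reconstruction of a substitution step that avoids the problem: the phrase table is kept apart from the game text, only the game text is rewritten, and the `string n "..."` definitions are emitted verbatim. The function names are invented, and it assumes phrases contain no "@" or digits.

```python
# Hypothetical reconstruction: the script from the thread is not shown.
# Assumes phrases contain no "@" or digits, so they cannot match inside an
# already-inserted "@NN" code.


def substitute(game_text, phrases):
    """Replace each phrase in game_text with its @NN code, longest phrase first
    so that a short phrase cannot split a longer one."""
    for n, phrase in sorted(phrases.items(), key=lambda item: -len(item[1])):
        game_text = game_text.replace(phrase, f"@{n:02d}")
    return game_text


def definitions(phrases):
    """Printing-variable assignments, emitted verbatim with no substitution."""
    return [f'string {n} "{phrase}";' for n, phrase in sorted(phrases.items())]


if __name__ == "__main__":
    phrases = {0: "John", 1: "John Smith"}
    print(substitute('print "John Smith met John.";', phrases))
    print("\n".join(definitions(phrases)))
```

Here "John Smith" stays spelled out in its own definition instead of becoming "@00 Smith"; the emitted assignments would then be placed in a routine such as Initialise.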

I believe the Inform compiler already prevents this kind of thing when you declare abbreviations that overlap (i.e. with Abbreviate "forest" "rest" it won't encode "forest" internally as "fo" + abbreviation number 2). But since there are two kinds of abbreviations in I6 (Abbreviate and string 0 "qwerty"), that check isn't done across the two kinds. I don't know if it's the compiler's job, but I figured a warning when "@01" appears in one of the Abbreviate strings was possible. (Admittedly nobody had ever thought of doing that before my Python script, so it's really a corner case...)

In any case, Bocfel chose to throw a fatal error for violating the standard, but Frotz doesn't.

And I guess the broader question is whether the I6 compiler can generate code that violates the Z-Machine standard (i.e. is that document only for interpreters, or also for the compiler?). I was under the impression it needed to be corrected if we want to adhere to the standard, but it's true that these specs do change, as you mention. So I'm no longer sure whether it needs to be fixed or not :)

hlabrand commented 3 years ago

Just to substantiate the infinite-recursion claim: Chris Spiegel of the garglk project noticed that the compiler accepts the following code without complaint:

[Main;
  string 0 "@01";
  string 1 "@00";

  print "@00";
];

Frotz and Zoom crash with a segfault; Viola reports a recursion error; Bocfel, Fizmo and Nitfol throw a fatal error for violating the standard. But that definitely falls under "asking the compiler to generate invalid code"; it just shows why the rule disallowing recursive abbreviations is a good one.
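The infinite case above is a cycle in the "printing variable n mentions @m" graph, which a depth-first search can find. The sketch below is hypothetical and limited: printing variables are assigned at run time, so it only sees the literal strings passed to `string n "..."` in the source, and it does not flag one-level nesting (which still violates 3.3.1 but terminates).

```python
import re

# Hypothetical static check. Printing variables are assigned at run time, so
# this only sees literal strings passed to `string n "..."` in the source.
ESCAPE = re.compile(r"@@[0-9]+|@([0-9]{2})")  # "@@NN" is a ZSCII code, skip it


def refs(text):
    """Printing-variable numbers referenced as @NN in text."""
    return {int(m.group(1)) for m in ESCAPE.finditer(text) if m.group(1) is not None}


def find_cycle(printing_vars):
    """printing_vars: variable number -> literal strings assigned to it.
    Returns one reference cycle as a list of variable numbers, or None."""
    graph = {v: set().union(*(refs(t) for t in texts)) for v, texts in printing_vars.items()}
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(v):
        colour[v] = GREY
        path.append(v)
        for w in sorted(graph.get(v, ())):
            state = colour.get(w, WHITE)
            if state == GREY:  # back edge: w is already on the current path
                return path[path.index(w):] + [w]
            if state == WHITE:
                cycle = visit(w)
                if cycle:
                    return cycle
        path.pop()
        colour[v] = BLACK
        return None

    for v in sorted(graph):
        if colour.get(v, WHITE) == WHITE:
            cycle = visit(v)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    cycle = find_cycle({0: ["@01"], 1: ["@00"]})
    print(" -> ".join(f"@{n:02d}" for n in cycle))  # @00 -> @01 -> @00
```

For Chris Spiegel's example this reports @00 -> @01 -> @00, the loop that makes Frotz and Zoom recurse until they crash.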

erkyrath commented 3 years ago

You can write an infinitely recursive function in Inform too. The compiler makes no attempt to save your ass. :)

"And I guess the broader question is whether the I6 compiler can generate code that violates the Z-Machine standard"

My feeling is that we have many versions of the standard, starting with "some interpreter allows it". The compiler needs to be able to support all of them, and create experimental game files for extended interpreters and possible future versions of the standard. It should also, for example, be able to rebuild TXD-dumped Infocom games with all bugs replicated. This means erring on the side of "let the user generate that if they really want."

Of course the field is pretty static these days (I don't know if we'll ever get a Z-Spec 1.2) but the principle still holds.

Now, this principle is most important for assembly opcodes. You really want assembly to be able to generate any opcode whether it's legal or not. For high-level statements and expression code, we want to be careful and only generate valid code.

This abbreviation question is in the grey area. I can see adding a warning (not an error) for abbreviations that contain abbreviations. But, again, I'd like to see an assessment of how Infocom handled this case and whether it exists in any known game file.

hlabrand commented 3 years ago

For sure! Thanks :)