AnssiR66 / AlanStdLib

The Standard Library for ALAN Interactive Fiction Language

Spurious Whitespace in Transcripts #110

Closed tajmone closed 2 years ago

tajmone commented 3 years ago

Every time we run the test suite, we're presented with spurious whitespace differences in the generated game transcripts (an extra space, or the lack thereof, usually at line breaks, but sometimes between different printed statements).

This is a true nagger, for it produces dozens of spurious changed files in Git's working tree, making it hard to focus on the real changes of the WIP edits.

The only way to get rid of them is to re-run the test suite from time to time, and commit the spurious whitespace changes; but the problem is that they'll still show up at every run, because sometimes spaces are added and other times they are removed ... so it's an infinite struggle and an uphill battle.

tajmone commented 3 years ago

The Loooong History of This Problem

Now, this spurious-WS bug has been haunting me for ages, especially with the ALAN Italian project. It seems to affect only the StdLib (both EN/IT), and what annoys me the most is that I did eventually manage to get rid of it in the ALAN Italian project, but I just can't remember how I fixed it! So, until I can remember, I can't solve this in this StdLib repo.

I had even created dedicated tests for this in the alan-bugs-testbed repository.

The problem was discussed in the Yahoo group, but now that the list archives are gone, the old links to the discussion are useless. My email client didn't store the emails that I sent to the Yahoo group (now it does, but this was a recent fix), so even if I did post a solution on Yahoo, I won't have a copy of it.

Originally, I thought that I had solved it by recompiling ARun using MSYS2, which indeed didn't produce the spurious whitespace. But I suspect that the problem's root wasn't that (or at least, not only that), but was rather tied to the CodePage settings of the batch scripts, AND/OR to some stray output statement in the Library itself (something like an empty "$$" string being printed somewhere, which for some reason randomly sabotaged the whitespace in the final output).

As mentioned, I have vague recollections of having tackled the problem from different angles, and ultimately solved it in the ALAN Italian project. I just wish I could remember how! But too much time has passed since then, and I just can't pin down the problem.

I'm quite confident though that there was a particular output statement in the library that was contributing to the problem (maybe with named actors, or some other part of the library that tried to suppress a string by using "$$") — I have a strong feeling about this.

I should sift through the ALAN Italian commit history, trying to see if the solution is mentioned in some commit messages.

@thoni56, any chance you might remember how this problem was solved? I remember that we had discussed it multiple times in the course of the past years, because it kept popping up all the time. I'm sure that I must have shared with you the solution at the time.

Ahhhh, my memory can be so bad at times!

tajmone commented 3 years ago

CodePage Fix

I've realized that the main tests batch (RUNTESTS.bat) was set to use the UTF-8 CodePage, instead of ISO-8859-1 (whereas the subfolder individual scripts were correctly using ISO-8859-1). Since the ALAN Italian project uses ISO-8859-1 in all its test scripts, I've amended this from:

:: Set code page to UTF-8 for handling special chars in commands scripts:
CHCP 65001 > nul 2>&1

to

:: Code Page 28591 = ISO 8859-1 Latin 1; Western European (ISO)
CHCP 28591 > nul

Still Not Working

But even after this fix, every time I re-run the test suite I get whitespace differences.

Specific Tests Affected

The interesting thing is that the whitespace problem is showing up mainly in the transcripts of the clothing tests adventure:

I remember that in the Italian project, too, the problem was showing up mostly with this adventure (which was simply translated to English, in this repo); so, I'm starting to wonder if there's something in its code that triggers the error.

Furthermore, there are specific parts of the ega.alan transcripts which are always affected by the problem:

Outside Emporium Alani
You're standing in front of the Giorgio Alani Emporium entrance. Two
large brass doors await northward your entrance into the sanctuary of
fashion consumerism.

where the paragraph end alternates between having a trailing space after "consumerism.", and that space going away, with each run.

The other output always displaying the problem is:

Emporium Alani Main Hall
This luxurious hall is the crossroad to the various clothing departments
of the emporium. Two large brass doors lead the way south, out of the
fashion temple and back into the world of mortal souls. There is a
trashcan here. Behind a desk stands the emporium manager. There is your
personal assistant here.

where the sentence "There is a trashcan here." switches between having a leading space separating it from the previous sentence and not having one (e.g. ".There is").
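To tell this kind of spurious change apart from a real one, a check along the following lines could help. It's only a minimal Python sketch and not part of the test suite: the function names are made up for illustration, and it assumes that a change is "spurious" when the two transcript versions become identical once every whitespace character is removed.

```python
import re


def strip_whitespace(text: str) -> str:
    """Drop all whitespace, keeping only the visible characters."""
    return re.sub(r"\s+", "", text)


def differs_only_in_whitespace(old: str, new: str) -> bool:
    """True when the texts differ, but only in their whitespace."""
    return old != new and strip_whitespace(old) == strip_whitespace(new)


# The two symptoms described above, reduced to short strings.
with_space = "world of mortal souls. There is a trashcan here."
without_space = "world of mortal souls.There is a trashcan here."
trailing = "fashion consumerism. \n"
no_trailing = "fashion consumerism.\n"

print(differs_only_in_whitespace(with_space, without_space))
print(differs_only_in_whitespace(trailing, no_trailing))
```

Note that a check this coarse would also hide a real change that only affects spacing, so it's meant for triaging the changed files, not for deciding what to commit.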

IF Clause in DESCRIPTION Might Be the Culprit!

I've traced those sentences back to the adventure source, to see if there is something odd about their strings, or if it might be a problem related to the run-time context that handles them. Interestingly enough, I've discovered that both sentences are conditionally generated by the DESCRIPTION of the same object, i.e. ega_doors, the emporium doors, which are switched from being at the ega_entrance location to being inside the emporium, depending on where the Hero is (i.e. they follow him around):

THE ega_doors IsA object AT ega_entrance.
  NAME 'emporium doors'.
  NAME doors. NAME door.
  DESCRIPTION
    IF THIS AT ega_entrance
      THEN "Two large brass doors await northward your entrance
            into the sanctuary of fashion consumerism."
      ELSE "Two large brass doors lead the way south, out of the fashion
            temple and back into the world of mortal souls."
    END IF.

So, @thoni56, I wonder if there's a problem with the fact that these strings are being printed out from within an IF THEN/ELSE clause in the object's DESCRIPTION, because these two strings are always emitting spurious whitespace, with each run of the tests.

thoni56 commented 3 years ago

Thanks for digging into this. There are hundreds of "games", although small, in the regression suite, but they are running with the "regression test" switch to avoid dates, random seeding etc. Initially I thought that might be part of the problem.

But seeing that you have pinned the problem down to a formatting problem, I'll start looking into that at least.

tajmone commented 3 years ago

Thanks a lot. This bug has really haunted me for years, and now it seems back. I was going through the various READMEs of the alan-bugs-testbed repository, and they contain some useful info:

https://github.com/alan-if/alan-bugs-testbed/tree/master/whitespace-bug

What puzzles me is that the MSYS2-compiled ARun terp didn't display this problem, so it might have something to do with Cygwin? But I'm quite positive that at some point changing something in the StdLib (Italian) sources had also made this bug disappear. It might be the result of a mixture of odd combinations, including the specific instructions in which the strings are generated, as well as the Cygwin/MSYS2 differences, but I ultimately suspect that something is going wrong with encoding, where a character is being mishandled due to ISO-8859-1 vs Unicode differences. Hence, the C runtime in use or the Cygwin DLL might play a role, depending on whether they use UCS-2 or UTF-8 for handling strings under Windows. Unfortunately Windows, especially in its old versions, has relied heavily on UCS-2 internally, which is a bad pseudo-Unicode encoding that can lead to unexpected problems.

thoni56 commented 3 years ago

When you write "alternates between" does that mean between runs of exactly the same game with exactly the same input? Or between re-compilations of the game? (Need a way to reproduce one or both of those explicit problems...)

tajmone commented 3 years ago

When you write "alternates between" does that mean between runs of exactly the same game with exactly the same input? Or between re-compilations of the game? (Need a way to reproduce one or both of those explicit problems...)

The test suite script does recompile the whole game at each run (and even creates a new IFID, since the old one is deleted by default), and then runs the same game commands on each adventure.

Even if there are no changes in the alan sources or the commands scripts, with each run you get some whitespace differences (especially in those two sentences highlighted above).

I haven't actually tried not recompiling the adventures, but only re-running the commands scripts, but that should be easily done by commenting out a few lines here and there.
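That experiment (same compiled game, same commands, several runs) could be sketched roughly as below. This is a hedged Python sketch, not something taken from the test suite: the helper names are hypothetical, and the command used in the demo is just a stand-in that prints a fixed line, since the actual interpreter invocation and its switches aren't shown here.

```python
import subprocess
import sys


def run_once(cmd, stdin_bytes=b""):
    """Run cmd once, feeding it stdin_bytes, and return its raw output."""
    result = subprocess.run(
        cmd,
        input=stdin_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,
    )
    return result.stdout


def distinct_outputs(cmd, stdin_bytes=b"", runs=5):
    """Run cmd several times with identical input; collect distinct outputs."""
    return {run_once(cmd, stdin_bytes) for _ in range(runs)}


# Stand-in command (an assumption): replace it with the real interpreter
# call on an already compiled adventure, passing the commands as input.
stand_in = [sys.executable, "-c", "print('Outside Emporium Alani')"]
outputs = distinct_outputs(stand_in, runs=3)
print("stable" if len(outputs) == 1 else "unstable")
```

Comparing raw bytes rather than decoded text is deliberate: it keeps the code page out of the comparison, so any difference between runs would come from the program's output itself.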

Also, I haven't had a chance to test if the problem pops up using Bash for Git — but I've never noticed any spurious whitespace in the StdLib Manual, which also compiles dozens of adventures to generate the transcripts in order to include them in the document.

Right now, only the test suite uses batch scripts; all the other scripts are for Bash, and they don't display whitespace issues in any of the transcripts. But they don't use the ega.alan adventure either, which seems to be the fulcrum of the problem for some reason.

In the bugs-testbed repository, at one point it turned out that removing trailing spaces from the ALAN sources mitigated the problem, but this is no longer true since all the ALAN sources in this repository are correctly trimmed via EditorConfig now.