DavidKinder / Inform6

The latest version of the Inform 6 compiler, used for generating interactive fiction games.
http://inform-fiction.org/
Other
204 stars 34 forks source link

put_token_back() does not revert symbols #233

Closed erkyrath closed 1 year ago

erkyrath commented 1 year ago

There's a comment in lexer.c:

    /*  The lexical context is a number representing all of the context
        information in the lexical analyser: the same input text will
        always translate to the same output tokens whenever the context
        is the same.

        In fact, for efficiency reasons this number omits the bit of
        information held in the variable "dont_enter_into_symbol_table".
        Inform never needs to backtrack through tokens parsed in that
        way (thankfully, as it would be expensive indeed to check
        the tokens).                                                         */

The last paragraph is just wrong; we do backtrack through symbols parsed with dont_enter_into_symbol_table set. This causes quite a few problems.

I should say, it's the reverse that causes problems: sometimes we backtrack through symbols parsed without that flag, and then re-lex the token with the flag. This leaves a spurious symbol in the symbol table.

Known bugs resulting from this:

I think we're going to have to suck it up and fix put_token_back() to take a symbol out of the symbol table when we back up across it.

I envision adding a DELETED_SFLAG flag for symbols. When we back out a symbol we set that flag and remove the symbol from its hash chain. When looking up a symbol value, if we see DELETED_SFLAG, that's an error. (We shouldn't be saving a symbol reference for a token we're putting back! If we do, we should fix that as a bug.)

This will (still) waste a bit of symbol table space, but we'll get consistent results. And we can fix lexical_context() to save the dont_enter_into_symbol_table flag.

erkyrath commented 1 year ago

Come to think of it, #undef already does the remove-from-hash chain thing, but it has no protection against referring to a removed symbol. (It's fine to have a ref to that symbol before the #undef.) So I could just rely on that code. But I'd rather have the flag and the error check.

...Now that I look, #undef leaves the symbol in the hash chain if it wasn't defined before! That's a mistake, although not a serious one. (The symbol still has UNKNOWN so you can't use it.)

This is a good use case for dont_enter_into_symbol_table!