Variable Validation (for SugarCube)

It would be nice to let users who want extra validity checking to specify valid variables and their types.
This issue is more of me spilling out ideas, since this is a bit complex and would require adding a good bit of code to handle this properly.

Valid Variables

Simply specifying the name of valid variables was in part the goal of #61. In the simple case of just saying what variable names are valid, it is less complex but still runs into issues.

Parsing all uses of variables, just in the Twee files, is probably not possible.

Supporting this for normal macros which use the argument parsing isn't too hard. The arguments parsing provides the information of whether it is a variable or not pretty easily, and that can be checked even in cases where we are unsure about their parameter types. We would simply check if it was in the valid variable list, and warn if it wasn't. Setup/Settings offers some challenges here, but not too rough.
The main issue comes from more complex macros. This being the ones that are really useful to check, that being macros like <<set>> and <<if>>. There's a few others as well.
The issue with these is that they can essentially be arbitrary TwineScript (JavaScript with some modifications like to being equivalent to =).
Yet, I think it is still feasible to provide useful information for those, even if we do not want to embed a full on Javascript parser (though that is feasible, since these aren't that complex of codes in the blocks, and so aren't awful to parse, but I would want some more aggressive caching. It would also be complex to use.). The common cases, of course, are <<set $name to 42>>, <<set $name to "text">>, <<set $name to $thing>>, <<set $name to $thing and $thing2>>, etc. We could probably get a useful portion of the way there just with handling cases like these. Nontrivial, yes, but we are primarily just trying to extract variable names ($thing, _thing, setup.Thing, settings.Thing) from it, and that is easier than general parsing.
Infact, it would probably be possible to capture a lot of it with a regex. It would probably be more open to false positives, but it might be good enough, without being super complex.

Types

Having a list of types in variables is useful, especially in avoiding typos, but it could be made better.
Obviously, with types.
Currently, parameter parsing has the big hole in it that it can't verify the types of variables. <<textbox $name>> can't tell whether $name is a variable holding a receiver, or perhaps the user intended to write "$name". This is an issue anywhere. <<goto $thing>> does not have any sense of whether $thing is remotely applicable. Perhaps the user meant to use <<goto $thing_passage>>, because they're related?
Basic type declarations could help this a lot. Ex:

variables:
    $name: receiver
    $thing: string
    $thing_passage: passage

So the types are simply declared. There is some ambiguity, however. string could be a valid passage name, but also not. I'd rule on you having to specify the more concrete passage type in order to use it. However, this also likely needs more and different types than the parameter parsing receives. unknown should be a type here (probably not any since that might become special in parameter parsing), which acts like variables do now where the type could be anything.
This would likely have the issue of wanting to assure T3LT that the variable in that position is right, even if the types disagree. That would require a comment on it.
It might also be beneficial to allow declaration of it holding multiple types, but I'm unsure of the best way to handle that. It would essentially have to be saying that it could be used anywhere that requires one of those types. Because we aren't Typescript, so we can't simply add syntax that narrows the type down, and comments are probably pretty bad for that.

Extra (Per Passage Validation)

We might also want per-passage validation. The obvious use-case is for temporary variables. Declaring them as valid everywhere would be.. unfortunate. Being able to declare where they can be used would be better. Not as good as actually noticing that the user calls <<set _tmp to random(0, 100)>> and only allowing it to be used after that line in the passage, but better than nothing.
Especially since the user might reuse temporary variables with different types (aka _tmp).
This would also be extended to normal variables. It would probably be less useful than the tmp variables, but would be more useful than the normal variables.

Usability

We want this to be nice for the user. Writing a passage and having to swap back to your config file to specify the types is a pain. There would of course be a Quick-Fix for adding unknown variables to it quickly, which would help.
The problem, however, is that they would be unknown. This might be fine for writing it out, but the point of types is to encourage the users that enable it to get good error-checking.
There's four non-exclusive options that can be taken (I think all of them would be good, but one is better than none):

Separate quick-fix option that lets you specify the type in a vscode dialog (like the dropdown-from-the-top style dialog).
Basic inference. It could scan the current passage for usages and acts of setting the variable. If it sees <<set $name to "blah">> then it can likely infer that it is a string. This would be more a collection of rules to infer things, which could be expanded. It would probably be beneficial to also have more error-prone ones toggleable in settings, since fixing the type isn't that hard.
A setting to warn on any usage of a variable with an unknown type. This might be somewhat noisy without the other features.
A setting to warn in the config file of an unknown type. This keeps that warning alive, but more out of the way. The issue with this is that I don't think we currently get a span from the yaml/json parsing. So we'd either have to get a better parsing library for that, or just put them at the top of the file.

In Passage Validation

This is either an extension of the passage specific validation, or an alternative.
It would probably be nicer, at least for temporary variables, to specify the type in the passage at which they're declared. Unless you're using the same temp variable name across multiple passages with the same type (which does happen, and should be done if you're not wanting that data to be saved!), the temp is pretty much for the single passage.
The idea is to essentially have a special comment that declares the type of the temp variable.

/* t3lt-var-type: _rng: number */
<<set _rng = random(0, 100)>>

This would scope that to this single passage.
As an extra potential that makes this an attractive option to implement is that it would allow erroring if the variable was used before/after a certain point, since we're told where it is created.

/* an error */
<<if _rng == 0>>

<</if>>
/* t3lt-var-type: _rng: number */
<<set _rng = random(0, 100)>>

Theoretically this also allows checking actual scopes and flow based on that:

<<if $some_condition>>
  /* t3lt-var-type: _thing: number */
  <<set _thing = 5>>
<</if>>
/* an error, because it isn't defined in all branches */
You have _thing things.

However, that is notably more complex, and so it out of scope for this document.

Usability (TypeScript)

Some users use Typescript with SugarCube. If so, they likely have something like this (at least for variables they want to access in code):

declare module "twine-sugarcube" {
    export interface SugarCubeStoryVariables {
        name: string;
        lname: string;
        age: number;
        hair_color: HairColor;
    }
}

It is unfortunate to force the user to redeclare the same variables in another file. This isn't major but it does increase friction.
Sadly, I don't think there's a good way for our extension to ask the inbuilt Typescript extension about what types it has modified. If there is, it becomes a possibility to actually automatically use those definitions.

cyrusfirheir / twee3-language-tools