What to do with never-assigned variables?

TelosTelos commented 3 years ago

A variable is "never-assigned" if it never appears as the output of any instruction that stores a value (like set, op, sensor, ulocate, or uradar).
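A minimal sketch of that definition, assuming the program has already been parsed into a simplified form. `Instr` and `never_assigned` are hypothetical names rather than anything from pyndustric, and numeric literals are assumed to have been filtered out of the inputs:

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical simplified instruction: an opcode plus the names it writes and reads."""
    opcode: str
    outputs: Tuple[str, ...]
    inputs: Tuple[str, ...]


def never_assigned(program: Iterable[Instr]) -> Set[str]:
    """Return every name that is read somewhere but is never the output of any instruction."""
    program = list(program)
    assigned = {name for ins in program for name in ins.outputs}
    used = {name for ins in program for name in ins.inputs}
    return used - assigned


program = [
    Instr("set", ("y",), ("x",)),                    # y = x, but x is never written
    Instr("op", ("z",), ("y", "ripple1")),           # ripple1 would be a linked block
    Instr("sensor", ("hp",), ("@unit", "@health")),  # @names are pre-defined
]
print(sorted(never_assigned(program)))
```

Note that this lumps together all the cases discussed below (pre-defined @variables, linked blocks, and plain names like x), so a later step still has to tell them apart.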

There are two ways that never-assigned variables can occur in a "responsible" program: (1) special @variables like @unit and @this are pre-defined, and (2) variables of the form "ripple1" or "vault2" are automatically assigned to linked blocks, if present.

I haven't tested this much, but it looks like (3) any other variable defaults to something that behaves like 0 in numerical ops, though it would make sense if it were actually null. So "irresponsible" programs could rely on this and, e.g., start using x without explicitly saying x=0 first; if x is never assigned, it will remain zero forever. (Many of our current testing programs are "irresponsible" in this way.)

The question is how automatic type-detection should handle these. This will make a difference, for example, to whether numerical operations involving a never-assigned variable end up being vectorized. (It may be worth noting that this is a special case of the question of what we should do with variables that get used before being assigned, though we don't currently have any good way of detecting all of those without some sort of flow-tracing. Our "once a class, always that class" assumption makes such ordering issues irrelevant for automatic type-detection, but doesn't tell us what to do with never-assigned variables.)

We should of course allow known @variables, and perhaps also unknown ones of the form At('name'). Aside from @unit and @this, all current @variables are of type Number (not to be confused with @constants like @copper), but it'd probably be fine to class them as generic Atoms, since automatic type-detection is good at working out that Atoms must be Numerical, which is good enough for vectorization purposes.

We should probably also allow ripple1 and vault2, typing them as Blocks, though I could see some case for insisting that they be explicitly declared first. E.g., when the source we're given to compile is a Python function, these could (and perhaps should?) be declared as arguments to that function.

At one extreme, we could just construe every never-assigned variable as a Block. However, I think that would sweep in lots of variable names, like 'x', that couldn't possibly be innate block references, so perhaps we should instead restrict this to things that look like block references (e.g. names that are alphabetic with an integer suffix, perhaps requiring that the alphabetic part be a known block type)? This would also change the functionality of many of our old testing programs, by vectorizing numerical operations involving their never-assigned variables.
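The "looks like a block reference" test could be sketched as below. This is only an illustration: `looks_like_block` and `KNOWN_BLOCK_PREFIXES` are hypothetical names, and the prefix set is a deliberately incomplete placeholder rather than a real list of block types.

```python
import re

# Hypothetical, incomplete set of block-type prefixes, for illustration only.
KNOWN_BLOCK_PREFIXES = {"ripple", "vault", "message", "switch", "cell"}

# Alphabetic prefix followed by a positive integer suffix, e.g. "ripple1".
_BLOCK_REF = re.compile(r"([A-Za-z]+)([1-9][0-9]*)")


def looks_like_block(name: str, require_known: bool = False) -> bool:
    """Guess whether a never-assigned name is meant as a linked-block reference."""
    match = _BLOCK_REF.fullmatch(name)
    if match is None:
        return False
    if require_known:
        return match.group(1) in KNOWN_BLOCK_PREFIXES
    return True


for name in ("ripple1", "x", "x1", "minigun1"):
    print(name, looks_like_block(name), looks_like_block(name, require_known=True))
```

The lax form accepts names like x1 that are probably ordinary variables, while the strict form rejects blocks added by mods or later patches, so neither variant settles the question on its own.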

At the other extreme, we could raise a compiler error for all never-assigned variables that we don't recognize. Since never-assigned variables are often just a misleading way of saying 0, these errors may often help users to catch bugs, like misspellings.

Slightly more moderate, we could print a warning announcing whatever tentative assumption we made. Such warnings could be about as helpful as errors for debugging, but would let workable-enough code still compile.

Another moderate option would be to silently treat unrecognized never-assigned variables as generic Atoms, in which case auto-type detection would often determine that they're generic Numericals (if they participate in any numerical role, which most things do). By default (in my current code) generic Numericals just appear as themselves (effectively "broadcast", NumPy-style) in vectorized operations, and don't force any op to become vectorized. This option would leave our old test-cases working as they originally did, with no need to add explicit x=0 at the beginning.
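The three non-Block options above (error, warning, silent Atom) differ only in what happens when an unrecognized never-assigned name is found, so they could sit behind one switch. A rough sketch, with `Policy` and `classify_unrecognized` as hypothetical names and the class represented as a plain string for brevity:

```python
import warnings
from enum import Enum


class Policy(Enum):
    """Hypothetical compiler setting for unrecognized never-assigned variables."""
    ERROR = "error"    # refuse to compile
    WARN = "warn"      # compile, but announce the assumption made
    SILENT = "silent"  # compile without comment


def classify_unrecognized(name: str, policy: Policy) -> str:
    """Return the class assumed for `name`, or raise if the policy forbids it."""
    if policy is Policy.ERROR:
        raise NameError(f"variable {name!r} is never assigned")
    if policy is Policy.WARN:
        warnings.warn(
            f"variable {name!r} is never assigned; treating it as a generic Atom"
        )
    # Both moderate options fall back to a generic Atom, leaving it to later
    # type detection to narrow it (e.g. to Numerical) based on how it is used.
    return "Atom"


print(classify_unrecognized("x", Policy.SILENT))
```

With this shape, the old test programs keep compiling under WARN or SILENT, and the choice between those two is only about how much the compiler says.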

Lonami commented 3 years ago

Q: What to do with never-assigned variables?

If I remember correctly, every variable starts out as null (so if you don't set any value to x but reference it, it will be null). Kind of like undefined in JavaScript.

A: Nothing. Mindustry will decide how to run the compiled code (treating it as null).

Q: How should automatic type-detection handle these?

As discussed before, correctly inferring the type of something at compile time would require flow analysis and even then it's not always possible to do (halting problem). The compiler should probably not care about types, and pretend anything can be used anywhere. This is the current behavior, and it's good enough.

The only more-special types are vectors, which my version currently does not implement, but I'm inclined to use a special marker (like .xy) for them.

A: Type tracking probably will not be implemented. This will keep the compiler a lot simpler and let it work in more scenarios than automatic types would. If type tracking is ever added, explicit type hints will probably be required to treat variables (vectors) differently.

TelosTelos commented 3 years ago

Automatic type tracking is implemented in my current version. As we discussed before, this doesn't actually require flow analysis if you make simplifying assumptions, such as that each variable always has the same class, which make order/flow and halting-problem issues irrelevant.
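To illustrate why a fixed class per variable removes the need for flow analysis, here is a toy sketch. It is not the implementation referred to above; `infer_classes` and the (variable, role) input format are assumptions made for the example. Each use contributes a constraint, and the result depends only on the set of constraints, not on their order:

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def infer_classes(uses: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Assign one class per variable from (variable, role) pairs.

    `role` is "Numerical", "Block", or None when a use says nothing about the class.
    """
    roles: Dict[str, Set[str]] = {}
    for var, role in uses:
        seen = roles.setdefault(var, set())
        if role is not None:
            seen.add(role)

    classes: Dict[str, str] = {}
    for var, seen in roles.items():
        if len(seen) > 1:
            # "Once a class, always that class" is violated.
            raise TypeError(f"{var!r} is used as more than one class: {sorted(seen)}")
        # No constraint at all leaves the variable as a generic Atom.
        classes[var] = next(iter(seen)) if seen else "Atom"
    return classes


uses = [("x", "Numerical"), ("ripple1", "Block"), ("y", None), ("x", "Numerical")]
print(infer_classes(uses) == infer_classes(reversed(uses)))
```

A never-assigned variable with no constraining uses comes out as a plain Atom here, which is exactly the case the rest of this thread is about.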

So, anyway, this is a live issue for me, and it will be for you too if you decide to incorporate the stuff I've done.

I'll re-open this for now, in case you have more to say on it once you think of this as an actual live issue and not just some far-off hypothetical. For now it sounds like you're leaning towards the final moderate option I suggested: silently treating never-assigned variables that don't look like Blocks as generic Atoms, so that code with them will compile even though these will typically end up being oblique references to null/zero. I can go with that for now, though I do think a warning system would help people to catch bugs. (Indeed, one of the things that drove me to want a Python->mlog compiler was wasting a lot of time in Mindustry's mlog editor trying to squash a bug that this warning would have immediately highlighted for me!)

I'm still left with the issue of distinguishing what "looks like a Block" from what doesn't. E.g., should I assume that "minigun1" is probably some new mod/patch-added block that should vectorize like other blocks, or should I instead assume that the coder probably just wanted another oblique way of referring to null/zero?

Lonami / pyndustric

What to do with never-assigned variables? #38