MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0
160 stars 30 forks source link

Inconsistent global-variable behavior from `x = 1` versus `x = x + 1` #430

Closed bhaller closed 6 months ago

bhaller commented 6 months ago

@petrelharp Please read this:

A user discovered that there is an inconsistency in Eidos in the way global variables are handled. If you define a global x with defineGlobal("x", 0); then the following statement (inside a non-global scope, such as a SLiM callback or a user-defined function):

x = 1;

defines a new local variable, also named x, that shadows the global x and has a value of 1. This is by design, and mirrors the behavior of R, but is also kind of confusing if you forgot that the original x was a global, and expected the assignment to replace the value of that global. (Or if you just didn't understand the way the scoping rules work!)

But if, on the other hand, you do:

x = x + 1;

then the global x gets assigned a value of 1; a new local x does not get created. This is a bug, due to an internal optimization of assignments of that form (a = a + 1 and similar) that actually finds and modifies the global x when it should not do so. Ironically, however, this behavior is often less confusing than the behavior of x = 1; above; often (but not always), the user actually intends to replace the value of the global.

I think the behavior of x = 1; is necessary, but it is unfortunate. It is necessary because if you write a user-defined function that uses a local variable x, you don't want your function to accidentally replace some global named x; you want your function's internal variables to be scoped. Scoping is good. Except when it isn't – except when you actually intend for your user-defined function to replace the global x. I don't think there's any way to read the user's mind, and so we have to follow R's behavior here – and so x = x + 1 needs to be fixed. To increment the global x, you should have to do defineGlobal("x", x + 1). Sigh.

But I'm writing this up carefully because I thought maybe I ought to check with you, @petrelharp, to see if you have a better idea. Does Python, for example, have a smarter way to handle this global/local scoping issue?

I have also contemplated, for several years now, adding a "global assignment" operator to Eidos, like the <= operator in R. I don't want to use <= since it is also less-than-or-equals, which is a source of confusion in R (just as I chose = for assignment in Eidos, not <-). But perhaps a different operator for global assignment would be a useful addition to the language, as a shorthand for defineGlobal()? I've been thinking about x ~= 5; maybe? So then incrementing global x would be x ~= x + 1;? What do you think?

petrelharp commented 6 months ago

Ooooh, that's a good one. This sounds right to me, and no, python doesn't deal with scoping any differently on this.

I think the only reason that SLiM feels odd on this (like, it acts just like R but feels different) is because you never get to write code in the global namespace, so the only way to define things there is by this defineGlobal( ) function. Also, I suppose that since it's called "define" you wonder "well okay but once it's defined is there a way to modify the value?"; one might then think that maybe doing x = x + 1 is a way to modify it. Even if we didn't mind renaming the function, though, I can't think of a better name that doesn't lead to this.

Also: I think that the overwhelming majority of R folks STRONGLY discourage using the global assignment operator. As you say, scope is good.

So, to be clear:

defineGlobal(x, 1);
x = x + 1;
catn(x);

should print out 2, right?

bhaller commented 6 months ago

It prints out 2, and once this bug is fixed it will still print out 2. But right now (with the bug) the global x changes to 2, and no local x gets defined. Once the bug is fixed, the global x will remain 1, and a new local x will be defined with a value of 2, and then printed. If you exit the local scope and then re-enter it (like, next time your callback/event/function gets called), x will again be 1, because only the local x has changed. That's how it is in R, and as I understand your comment, you are agreeing with me that that is how it has to be in Eidos, too, even if it is sometimes confusing. :->

bhaller commented 6 months ago

The really confusing thing about x = x + 1, with the scoping as it is in R and should be in Eidos, is that x refers to different things on the left and the right-hand side of the statement. :-O

clwgg commented 6 months ago

In python, almost everything works as you describe, but the x = x + 1 situation throws an error:

>>> x = 0
>>> def f1():
...     return x
...
>>> f1()
0
>>>
>>> def f2():
...     return x+1
...
>>> f2()
1
>>>
>>> def f3():
...     x = x + 1
...     return x
...
>>> f3()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f3
UnboundLocalError: local variable 'x' referenced before assignment

if you wanted to actually modify the global x, you would have to manually "pull it into scope":

>>> def f4():
...     global x
...     x = x + 1
...     return x
...
>>> f4()
1
bhaller commented 6 months ago

@clwgg Interesting! I could make Eidos throw an error in this case also; x = x + 1 when left-hand x is a new local variable and right-hand x is a global variable could throw an error. That seems like a good idea. Eidos doesn't have a way to "pull it into scope", however, since we have no equivalent of global x. If you want to increment the global x, you could do defineGlobal("x", x + 1) and it would then not throw an error. If you actually wanted to make a new local variable x and give it the value of the global x plus 1 (but why would you?), it would be a little tricky. For now you could do x = 1 + x instead of x = x + 1 and it would work, because you would no longer hit the optimized case in Eidos; but that's a hack. Or you could do y = x + 1; x = y; and it would work without an error. (I presume that works in Python, also, to make a new local x that is global x plus 1?) A little weird/icky, but it's hard to imagine a case when you would really want to do this anyway, so maybe it is fine for it to be obscure. The benefit of blocking the (very confusing) x = x + 1 case seems worth the possible confusion.

The main tricky thing here is that only some cases of the pathological x = x + 1 type of expression are optimized in Eidos right now (and would thus be able to throw an error). There are lots of other cases that would not throw an error, unless I think of a clever way to test for the general case of the problem. For example, x = x + y would still define local x to be global x plus y, rather than throwing an error. So, hmm.

clwgg commented 6 months ago

Personally I think the most important thing is that locally scoped assignment always makes a local copy, and global scope can only be modified explicitly (i.e. through defineGlobal, in Eidos' case). For the x = x + 1 case, I agree that it's confusing if LHS and RHS of the statement refer to different x, so I prefer throwing the error, but I don't think it's catastrophic if assigning a local x that's global x + 1 works, either. In terms of making a new variable in the python case, you cannot actually use the global variable name later either, e.g.:

>> x = 0
>>> def f5():
...     y = x + 1
...     x = y
...     return x
...
>>> f5()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f5
UnboundLocalError: local variable 'x' referenced before assignment

but explicitly have to make a local copy of the global variable, if you want to modify it's value:

>>> def f6():
...     local_x = x
...     return local_x + 1
...
>>> f6()
1
bhaller commented 6 months ago

... In terms of making a new variable in the python case, you cannot actually use the global variable name later either, e.g.:


>> x = 0
>>> def f5():
...     y = x + 1
...     x = y
...     return x
...

Oh, wow. That's... kinda cool... and kinda "spooky action at a distance", to use Einstein's phrase. Interesting. So Python pre-plans the identity/scope of all of the variables in a local scope before it starts executing the code in that scope. Wow.

petrelharp commented 6 months ago

Oh, right! I forgot about that. I'm also in favor of throwing the "local variable referenced before assignment" error, if possible.

gregorgorjanc commented 6 months ago

Just to add that R uses <<-, not <— for global assignment, if you really have to use it. R does happily pull in a global variable into a scope, which is both neat for rapid typing, but I also find it super annoying practice because it supports users writing sloppy functions that don’t have clearly defined inputs and each attempted modification of a global variable creates a local copy, which can swamp memory quickly if you are not careful.

bhaller commented 6 months ago

Oh yeah, <<- not <=, I misremembered. It has been a long time. :->

bhaller commented 6 months ago

I've decided not to error on things like x = x + 1; since Eidos is not equipped to error on all such constructions, I think it shouldn't error on any such. It should just follow its scoping rules; i.e., act like R. The problem with only erroring on some such constructions and not others is that (1) it might confuse the user more ("why does it error in this case and not in this other very similar case?"), and (2) it might lure the user into a false sense of security (thinking "ah, well, if I do such a thing Eidos will give me an error, so I don't need to worry about it"). Better to just be consistent.