Open MichaelChirico opened 4 years ago
I get
bug_fun<-function()
+ { + print(parent.frame(n=2)) + + print(sys.frame(sys.parent(n=2))) + }
bug_fun()
Ok. After your feedback I took a closer look at how to replicate the symptoms.
I have seen these results using R Term and using RJ in Eclipse in interactive mode.
The symptoms do not show up when I highlight the code and "Run Selection in R".
The symptoms DO show up when I step through the code using "Run Entire Command in R and Go to Next Command".
The symptoms also show up when I submit the code as a file using a source() call, either by typing the source() call into the console or by using the "Run File in R via Command" option.
I will attach a file called sys_parent_defect.R. If you start an RJ console and submit this file via a source() command, you should be able to see the symptoms. (I think it's most easily replicated this way because the source() call appears to establish at least two contexts in the stack prior to R_GlobalEnv.)
Here's the code that will be in the file:
bug_fun<-function() { print("sys.status()") print(sys.status()) print("parent.frame(n=1)") print(parent.frame(n=1)) print("parent.frame(n=1)") print(sys.frame(sys.parent(n=1)))
print("parent.frame(n=2)")
print(parent.frame(n=2))
print("parent.frame(n=2)")
print(sys.frame(sys.parent(n=2)))
print("parent.frame(n=3)")
print(parent.frame(n=3))
print("parent.frame(n=3)")
print(sys.frame(sys.parent(n=3)))
print("parent.frame(n=4)")
print(parent.frame(n=4))
print("parent.frame(n=4)")
print(sys.frame(sys.parent(n=4)))
}
bug_fun()
Here's the output I see when I submit the code via source(). You notice that parent.frame() and sys.frame(sys.parent()) disagree at both n=2 and n=3:
[1] "sys.status()" $sys.calls $sys.calls[[1]] source("C:/Users/lunga/Desktop/Eclipse Workspace/SAMPLE/sys_parent_defect.R", echo = FALSE, encoding = "Cp1252")
$sys.calls[[2]] withVisible(eval(ei, envir))
$sys.calls[[3]] eval(ei, envir)
$sys.calls[[4]] eval(expr, envir, enclos)
$sys.calls[[5]] bug_fun()
$sys.calls[[6]] print(sys.status())
$sys.calls[[7]] sys.status()
$sys.parents [1] 0 1 1 3 0 5 5
$sys.frames $sys.frames[[1]] <environment: 0x000000000fc70590>
$sys.frames[[2]] <environment: 0x000000000fb87730>
$sys.frames[[3]] <environment: 0x000000000fb87928>
$sys.frames[[4]]
Created attachment 1506 [details] test case. submit via source() in interactive mode to replicate symptom
Created attachment 1507 [details] version of test script with clarified labels in the output.
This is an updated version of the test script. It works the same, I just adjusted the text in some of the print() statements to appropriately identify the outputs.
Properly labeled outputs look like this:
[1] "sys.status()" $sys.calls $sys.calls[[1]] source("C:/Users/lunga/Desktop/Eclipse Workspace/SAMPLE/sys_parent_defect.R", echo = FALSE, encoding = "Cp1252")
$sys.calls[[2]] withVisible(eval(ei, envir))
$sys.calls[[3]] eval(ei, envir)
$sys.calls[[4]] eval(expr, envir, enclos)
$sys.calls[[5]] bug_fun()
$sys.calls[[6]] print(sys.status())
$sys.calls[[7]] sys.status()
$sys.parents [1] 0 1 1 3 0 5 5
$sys.frames $sys.frames[[1]] <environment: 0x000000000fcea1d0>
$sys.frames[[2]] <environment: 0x000000000fd67780>
$sys.frames[[3]] <environment: 0x000000000fd66ba0>
$sys.frames[[4]]
I can confirm this bug in R 3.6.2 and current R-devel (r77656).
The details in help(parent.frame) say:
‘parent.frame(n)’ is a convenient shorthand for
‘sys.frame(sys.parent(n))’ (implemented slightly more
efficiently).
source()ing the example script illustrates that this is not always true.
From Ross Ihaka's comment on R_sysparent in src/main/context.c it seems the limited view of sys.parent() originates from "compatibility with S".
You can do evalq(bug_fun()) and achieve a similar effect without using source().
The basic issue is that if the parent stack is like (0,1,0), and the current frame is 3, as in
print(sys.status()) # NB this adds frame 3 to the context stack, twice. ... $sys.parents [1] 0 1 0 3 3 ...
In that situation sys.parent(2) gets you 0, but parent.frame(2) gives you frame 1.
It appears that parent.frame walks the frame stack, whereas sys.parent iterates sp[sp[sp[...]]], which in this case finds sp[3] == 0, and then doesn't look further since it is already at .GlobalEnv.
Another way to put it is that sys.parent(2) gives you the parent of the parent, whereas parent.frame(2) gives you the (frame of) the parent of the caller.
I suspect that it is parent.frame that is doing the wrong thing here, but maybe it is just the documentation that is wrong. parent.frame() is not really designed to be used in situations where you evaluate in unusual contexts.
package rlang provides a function eval_bare giving a different result...
rlang::eval_bare(bug_fun())
[1] "sys.status()" $sys.calls $sys.calls[[1]] rlang::eval_bare(bug_fun())
$sys.calls[[2]] bug_fun()
$sys.calls[[3]] print(sys.status())
$sys.calls[[4]] print(sys.status())
$sys.parents [1] 0 0 2 2
The help page says
• ‘on.exit()’, ‘parent.frame()’, ‘sys.call()’, and generally
all the stack inspection functions ‘sys.xxx()’ are evaluated
in the correct frame environment.
I independently ran into this problem. At a minimum it seems the documentation should be updated, but I do agree with Prof. Daalgard that parent.frame()
seems more wrong. Here is an example that also provides debug info:
f1 <- function() list(
parents=list(
sys.frame=sys.frame(sys.parent(2)),
parent.frame=parent.frame(2)
),
data=list(
frames=sys.frames(),
parents=sys.parents(),
calls=sys.calls()
) )
f0 <- function() evalq(f1())
x <- f0()
str(x)
## List of 2
## $ parents:List of 2
## ..$ sys.frame :<environment: R_GlobalEnv>
## ..$ parent.frame:<environment: 0x7fb38f3b2af8>
## $ data :List of 3
## ..$ frames :Dotted pair list of 4
## .. ..$ :<environment: 0x7fb38f3b2c80> <<< Intended
## .. ..$ :<environment: 0x7fb38f3b2af8>
## .. ..$ :<environment: 0x7fb38f3b2c80> <<< Duplicate
## .. ..$ :<environment: 0x7fb38f3b2708>
## ..$ parents: int [1:4] 0 1 2 1
## ..$ calls :Dotted pair list of 4
## .. ..$ : language f0()
## .. ..$ : language evalq(f1())
## .. ..$ : language evalq(f1())
## .. ..$ : language f1()
As noted previously parent.frame
implicit ignores the the possibility of the same frame appearing more than once in the stack and grabs the first one that matches the cptr->sysparent
1, without consulting the parent list. If it did it would skip the first instance at position 4 of 0x7fb38f3b2c80
and go to the one at position 1.
I suspect the solution here is to redirect parent.frame
to internally just do what sys.frame(sys.parent(n))
does. I can't imagine the performance difference is enough to justify the semantic differences when a frame is re-used via eval
. The cost is that it might break code that relies on what I think is the incorrect current behavior.
Given that there is a perfectly valid work-around in sys.frame(sys.parent(n))
anything but updating the documentation is gravy.
Upon review of the sources, I've found that both parent.frame
1 and sys.parent
2 will traverse the context stack by finding the context that has for frame (i.e. cptr->cloenv
) the cptr->sysparent
of the current environment.
The difference between the two is that sys.parent
3 will descend the entire context stack finding the oldest matching context, whereas parent.frame
will stop at the first matching context. This will cause difference in behavior when the same frame appears multiple times on the context stack as may be the case when using eval
and similar.
sys.parent
is most likely to return the context that corresponds to the closure that generated the evaluation environment as the environment is first created by the closure. There will be cases involving eval
and similar where the cptr->sysparent
may not have been generated by a closure, in which case what the right environment should be becomes quite philosophical.
However, given that sys.parent
seems to behave more correctly in the most likely use case, it seems at a minimum we should document/explain the difference, or maybe modify parent.frame
behavior.
`sys.parent` is most likely to return the context that corresponds to the
closure that generated the evaluation environment as the environment is first
created by the closure.
Personally I think the youngest duplicate frame on the stack should be the most immediate parent. So I would rather change the sys.foo
functions to reflect the same structure as parent.frame()
. I think it will produce much less confusing structured backtraces when duplicate frames are on the stack because of NSE, do.call()
, or Rf_eval()
callbacks.
Could the sys.xxx()
functions and perhaps parent.frame()
gain an argument to control which duplicate frame (youngest or oldest) is the parent? They would have different defaults, so no breaking changes.
This would help IDE introspection to reproduce the same representation of the call stack as defined by parent.frame()
. The documentation of the argument would also help users understand these subtleties instead of by trial-and-error.
Could the `sys.xxx()` functions and perhaps `parent.frame()` gain
an argument to control which duplicate frame (youngest or oldest)
is the parent? They would have different defaults, so no breaking
changes.
That's an interesting idea. I was going to mention I don't know if I'm bought in that parent.frame
's behavior is the right one to adopt, but this proposal has the benefit of not changing default behavior, providing a natural anchor to document the issue, and also full control. Obviously I understand folks might be reluctant to add parameters to such low-level long-established functions.
I have taken a look at the places where parent.frame(n) is being used in the R sources. The generic case is that it occurs within a helper function like match.fun() where the intention is to look in the parent environment of the caller, or the caller of the caller, or....
In those cases, it likely coincides with sys.frame(sys.parent(n)), because it is known that all intermediate function calls are regular calls with no diddling of environments etc., but this is not the case in general.
The internal construction of the frame stack is that it is a linked list where each frame contains a pointer to the caller and a pointer to the parent environment. This makes the implementation of parent.frame easy and efficient because you just walk back via the caller pointer and then return the parent-pointer of that frame. In contrast, anything involving frame numbers requires a walk through the entire stack. If you want to sort out parent frame numbers, you need to match them to each frames parent environment. This is somewhat complicated and can also be inefficient in case the frame stack is deep in some recursion.
All in all, I think parent.frame() does what it is supposed to do and does it well. We just need to fix the documentation to say what it actually does and not say that it is equivalent to sys.frame(sys.parent(n))
Created attachment 2625 [details] possible update to sys.parent docs
Updating the documentation seems like a good enough outcome to me. Attached is a possible edit, though my purpose in providing it is to highlight that, AFAICT, implementation details aside, the key semantic difference between the two is that when following the ancestry chain sys.parent
finds the oldest ancestor at each step, whereas parent.frame
finds the youngest. Otherwise they both iteratively look for cptr->sysparent
amongst the cptr->cloenv
s on the stack, if I'm understanding correctly.
Hmm, just realized that running example(sys.parent)
obviously produces different output than just running the example I added manually, so this patch should not be applied as is. I won't make further changes until hearing further from you.
NA
I discovered this bug when I was attempting to walk up the call stack. I noticed that I could use parent.frame() to walk farther up the stack than I could using sys.parent().
This seemed to happen whenever two calls used the same environment. I think this happens mostly when eval() is called.
Execute the following code to reproducing the bug:
bug_fun<-function() { print(parent.frame(n=2))
}
bug_fun()
The two lines inside of bug_fun() do not produce the same output as they should. Example output from my system:
<environment: 0x000000000f86f708>