Closed mb147 closed 2 years ago
Thanks for taking the time to write all this down. As I hinted when you first raised this at StackOverflow, this is mostly an R-internals discussion you should have with the likes of Luke Tierney over on r-devel.
Rcpp does not do R's memory management, and does not call gc()
and alike. I think your language here is inflammatory ("leak") which makes a claim you have, as far as I can tell, not supported by any evidence: if there were a leak, valgrind
would surely tell us given that all 2500+ CRAN packages using R are being tested with it regularly. None is repored leaking.
Now, is it possible that we fail to incerment or decrement a usage counter? Quite possibly. Does it matter? I don't think so, but I could of course be wrong -- and would be happy to be convinced otherwise.
Lastly, R's reference counting is both somewhat robust and under what you may call permanent rewrite. We have the basic framework for ALTREP, and Luke Tierney sometimes hints that, time permitting, the R internals may get another update / refactoring. All that is worth keeping in mind.
Firstly, apologies if I've come across as inflammatory. That certainly wasn't my intention. I've just written out the facts of what I'm seeing. And I totally agree with you that it would be highly surprising if that amount of testing wasn't showing up a true memory leak. I'm at a loss to know what better to call it though, given that the result of it is that there's memory that's getting used and not released when it appears it should be.
The way things stand, SET_TAG in Rcpp_PreciousPreserve increments ref counter on object. There is nothing in the Rcpp_PreciousReleasethat correspondingly decrements it, and it therefore hangs around and never gets garbage collected.
I can see that there's a case for picking up with Luke, because this is something that garbage collection could, and perhaps should, detect and resolve in the R internals. I'll do just that. (R internals could check for LISTSXP type being collected, and decrement object pointed to by TAG as part of that.)
And I can see why you could legitimately wash your hands of it and just say "i've removed the pairlist element, anything else is part of what gc should be doing".
But I think in this case we could be a better citizen and ensure that the Release code does the cleanup by unsetting TAG. I'm not in any way suggesting that Rcpp should get involved with R's actual memory management, just undo what was done during Preserve from its own perspective. What R then does with that under the hood is none of Rcpp's business.
The reason this matters is the copy that gets created on modify after returning control to R. I'm calling fastLm many thousands of times from R, with reasonable size matrices, so the memory copies really add up. Yes, there's a strong case for rewriting my code in Rcpp to just avoid the back and forth, and I'm working on doing that. If I just work in Rcpp with memory allocated outside R then the problem mostly goes away for me.
But it shouldn't strictly be necessary for me to do that to avoid the performance penalty. With a simple bit of tidying up, copying can be avoided, which is a win. So I'm curious now... What do you see as the downside in making the fix I've suggested? It would be good to get it fixed in one place or the other.
Thanks for engaging with me on this btw. I've been coding for an age, but never in the open source arena until recently. The time and effort you, and others, put in to this is awesome. And the fact that I can talk directly to you about this is something I'd never have imagined 25yrs ago.
Howdy -- thanks for keeping this very constructive. It's been a while since I look at it but as I recall it (which may well be faulty) the refcounter is not exactly 'linear'. The net effect between values n and n+1 is non-exactly meaningful. I think the scheme changed a little in the last few years but what matters really is (again, my recollection, may be wrong) is values 'none', 'one', 'more'.
I think what I would lean towards is trying to do something along the lines of your hypthesis (i.e. setting/unsetting the tag) with plain C(++) code (no Rcpp) and seeing if R 'picks it up as you expect' and releases object. If you then mirror that with Rcpp and we do something that prevents then we do have an issue. But until we actually show that objects go away on the R side I am just not so sure we have something here.
And I with you: the scope of open source and what / and how we can do things these is so cool compared to where we where mid-nineties. Anyway. gotta run now...
Refcounting in R has been updated more recently to be a genuine counter, moving away from the NAMED 0/1/more that it used to be. As I said previously, until that change happened I don't think it would have been possible to spot this, since it relies on the counter having true decrement support. But the change in ref counting does present a good opportunity to be more efficient with copying of objects, which is what I'm aiming for here.
What you suggest with pure C code is, I believe, precisely what I've done in the code sample above. My rewritten version of Rcpp_PreciousRelease has a flag that turns the decrement on and off, and the inspect output in R shows that it puts the ref count in the expected state. Not suggesting the flag is required in the fix, just the SET_TAG call.
I've verified that I am able to modify-in-place with the fix enabled, which can only happen if ref count is genuinely 1 as expected. Likewise, verified I get a copy with fix disabled which replicates current Rcpp behaviour.
I trust that the R garbage collection is working correctly once we get the ref count to decrement to where it should be. Along the same lines as the logic you stated above, pretty sure that would have been noticed by now if it wasn't the case. I'll do some checking though to show how memory usage evolves with and without fix in place.
If there's a clearer way for me to present the findings then I can do so. Just let me know what you need. Fix is pretty trivial, so I've taken the step of forking the repo and committing the change. I'll hold off on pull request though until we agree here.
Refcounting in R has been updated more recently to be a genuine counter, moving away from the NAMED 0/1/more that it used to be.
Right. And some of our code may predate that so there may be room for improvement.
I trust that the R garbage collection is working correctly once we get the ref count to decrement to where it should be. Along the same lines as the logic you stated above, pretty sure that would have been noticed by now if it wasn't the case.
Yes. Still, there may be room for improvement.
If there's a clearer way for me to present the findings then I can do so.
I think I would feel more compelled if we got R to show, say, "see here 1000 calls under the old scheme and X mb remain allocated" and magically they would not be with tags set and whatnot. But the memory management inside R is a little more misterious in its ways...
Anyway, I should find some time and study your proposal some more.
This is good stuff, thanks for putting it together, @mb147. I agree that we should add a SET_TAG(token, R_NilValue);
in the line suggested. SET_TAG
increments/decrements the ref count. The purpose of the first SET_TAG
is to mark the object as referenced, and thus we should likewise decrement that reference once the storage is freed. The change required is very simple and innocuous, which is a plus.
That said, there is no memory leak. The garbage collector knows better. You can quickly see that by creating a big object, passing it through your empty currentRcppProblem
function, then rm'ing the object and calling gc()
: memory gets deallocated.
And that said, still Dirk's advice applies: avoid going back and forth. There are just so many places in which the ref count could be updated (legitimately or not). Even after implementing this fix, you are doomed to check and inspect everything you are doing, and you'll never be entirely sure. So it's best if you just go full C++ and then return to R when you are done.
Thanks both for your input on this.
I'm feeling like a doofus for not having done the obvious check of looping a large object. But I guess that's what comes of trying to juggle too many things and look at this alongside by day-job. Thanks for picking up where I'd dropped the ball, and also for supporting the fix.
I've taken the liberty of renaming this issue to remove the reference to memory leak, which was disproved - so there's a better history of what this issue really was.
Reverse dependency check is almost done so about to merge this -- this is making things better so many thanks again for very diligent work here!
Brief background
Rcpp keeps a precious pairlist when PreserveStorage is used. Releasing an element of that list leaves a ref behind, creating a later memory leak.
As of v4 (certainly) (but possibly as early as Nov 2019), R moved from the NAMED mechanism to ref counting. This has exposed an issue that likely went undetected before then, because object copies were more common. https://stat.ethz.ch/pipermail/r-devel/2019-November/078728.html
Relates to my recent question on SO: https://stackoverflow.com/questions/71417422/
Proposed fix
In Rcpp, barrier.cpp, Line 122, add:
SET_TAG(token, R_NilValue);
Use case description
Reproducible example
See below, which does the following:
System detail
R: 4.1.2 Windows 11 Rcpp: 1.0.8
Runnable code
Code output