JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

stop using finalizers for resource management? #11207

Open JeffBezanson opened 9 years ago

JeffBezanson commented 9 years ago

Finalizers are inefficient and unpredictable. And with the new GC, it might take much longer to get around to freeing an object, therefore tying up its resources longer. Ideally releasing external resources should not be tied to how memory management works.

We are already not far from this with the open(f) do construct. I think that and/or with should be used. Perhaps there could be some other mechanism for registering files to close eventually.

Discussed this with @carnaval .

ScottPJones commented 9 years ago

Hmmm... I was just going to start using finalizers (but still had some questions about them to investigate). My needs are simple: 1 pointer in Julia to a structure allocated / controlled by C, which also contains a pointer possibly allocated by my C code, or else allocated by DBMS allocator. If the object goes out of scope in Julia, and is about to be GCed, I thought the finalizer would allow me to call my C release code... It concerned me though that finalizers are apparently associated with the object, not the type.

quinnj commented 9 years ago

Go has the defer keyword, the usage is:

f, err := os.Open(file)
defer f.Close()

which pushes f.Close() onto a stack of function calls that get evaluated, in last-in-first-out order, when the enclosing function returns.

http://blog.golang.org/defer-panic-and-recover

-Jacob
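
For comparison, Julia has no defer keyword, but the behaviour described above can be approximated with a last-in-first-out list of cleanup callbacks that is drained in a finally block. This is only a sketch: with_defer and its defer argument are hypothetical names, not part of Base.

```julia
# Hypothetical helper: runs `body(defer)`, then runs every deferred
# callback in last-in, first-out order, even if `body` throws.
function with_defer(body)
    cleanups = Function[]
    defer(f) = push!(cleanups, f)
    try
        return body(defer)
    finally
        while !isempty(cleanups)
            pop!(cleanups)()   # the most recently deferred callback runs first
        end
    end
end

events = String[]
result = with_defer() do defer
    defer(() -> push!(events, "close file"))
    defer(() -> push!(events, "unlock"))
    push!(events, "work")
    42
end
```

Here the cleanup order is tied to the dynamic extent of the with_defer call, not to garbage collection.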

timholy commented 9 years ago

If one is rethinking this, the machinations of CUDArt to manage GPU memory in a GC-compatible way are probably amusing fodder for thought. The arrival of finalize was a huge step forward.

NOTE: 2nd link updated to correct target.

carlobaldassi commented 9 years ago

For reference, I'll also add the case of GLPK, which bears similarities with CUDArt, see e.g. here and here.

JeffBezanson commented 9 years ago

See also #1037

It would be great to get rid of finalizers entirely, but that's probably not realistic. For starters, I would still allow finalizers but not use them to close files and such.

@ScottPJones you can definitely use finalizers to call your C release code.

Finalizers can be associated with a type by adding them to all instances in the constructor :) Seriously though, I'm not sure how it would work to associate finalizers with types. For example, you can't iterate over all dead objects to see which ones might have finalizers. And how is the right type identified? That's usually done through method calls, but who calls what function, and when, to determine what to finalize? The simplest thing is to just give the GC a list of objects with finalizers attached.
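
A minimal sketch of attaching a finalizer to every instance in the constructor, as described above. It assumes current Julia syntax, where the call is finalizer(f, x) (when this thread was written the argument order was finalizer(x, f)). The Handle type and the FINALIZED counter are made up for illustration.

```julia
# Counts how many Handle finalizers have run (illustration only).
const FINALIZED = Ref(0)

mutable struct Handle
    id::Int
    function Handle(id::Int)
        h = new(id)
        # Every instance gets the same finalizer, so it behaves as if it
        # were attached to the type. Finalizers require a mutable object.
        finalizer(h) do obj
            FINALIZED[] += 1
        end
        return h
    end
end

h = Handle(1)
finalize(h)   # run the registered finalizer now instead of waiting for the GC
```

The GC still tracks this per object: each Handle adds one entry to its list of objects with finalizers.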

elextr commented 9 years ago

@JeffBezanson it is very useful to have a mechanism to allow freeing of limited resources (like file descriptors) as soon as reasonably possible. As you say finalizers will eventually get around to it, but that doesn't prevent exhaustion in the meantime.

One question: are finalizers always run, no matter how the program exits, so it's always possible to be sure no resource remains locked?

ScottPJones commented 9 years ago

@elextr yes - that sort of exhaustion has been a big issue with the sort of code that I write, where it has to stay running with minimal downtime for years...

elextr commented 9 years ago

@ScottPJones then it's probably best if you do your resource management explicitly yourself; certainly don't rely on anything in the semantics of any language unless it is specified and guaranteed.

Specifically, the semantics of the Julia GC, and hence of finalizers, are not guaranteed. The GC happens to have changed to a generational one in 0.4, it is not generational in 0.3, and it may change again in 0.4/0.5 when threading lands (for example). All you can know about a finalizer is that the object it relates to is no longer in use when the finalizer is run, but my reading of this suggests that it may not be run for bitstypes, hence my question above.

aviks commented 9 years ago

Another use case is the interaction of the Java and Julia GCs in JavaCall. Objects retrieved from Java into Julia need to be explicitly de-referenced in Java when they are no longer used within Julia. This is achieved via finalizers, which works fine, except that the Java VM can be under greater memory pressure than the Julia VM. In that case, the JVM can run out of memory before Julia decides that the GC needs to be run.

timholy commented 9 years ago

@ScottPJones, I hear you. In several places like HDF5 and CUDArt, the key was to write code like

open(filename) do file
    # do stuff
end

which guarantees that file will be closed (immediately upon completion) even if stuff has an error in it. That construct currently has some performance overhead (anonymous functions), but in most cases is worth it. You can manually use try...finally in cases where you can't tolerate the overhead.
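
A small sketch of the manual try...finally form mentioned above, using a temporary file so it is self-contained. The file contents are arbitrary.

```julia
path, io = mktemp()          # temporary file, already open for writing
try
    write(io, "some data")
finally
    close(io)                # runs whether or not the body threw
end
contents = read(path, String)
rm(path)                     # safe on every OS: the descriptor is already closed
```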

ScottPJones commented 9 years ago

@elextr I should have been clearer... I'm not planning on relying on the finalizers at all. My APIs are identical (with some name changes, can't have ! in C function names), in C11, C++11, Python, Java, and Julia... I just want to prevent memory leaks, esp. when people are playing around / prototyping stuff in the REPL. For example, something like the following:

myObj = DA.PackedData(1000) # creates a packed data buffer with initial size at least 1000 bytes.
push!(myObj, "Encode a string")
push!(myObj, 5.2332) # encode an IEEE binary floating point number
save!(myDBMS, myObj) # write packed record out as a row
release(myObj) # Release the underlying C buffer object, 0 out the pointer in the Julia myObj object...

What happened many times, in the REPL, is I accidentally set myObj to something else before calling release... so I lost memory each time... Using the finalizer is just to catch stupid things like that...
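
The pattern described above (explicit release, with a finalizer only as a safety net) might look roughly like this. PackedBuffer is a hypothetical stand-in for the C-backed object, and Libc.malloc/Libc.free stand in for the real C allocation and release calls.

```julia
mutable struct PackedBuffer            # hypothetical stand-in for a C-backed object
    ptr::Ptr{Cvoid}
    function PackedBuffer(nbytes::Integer)
        buf = new(Libc.malloc(nbytes))
        finalizer(release!, buf)       # safety net if release! is never called
        return buf
    end
end

# Free the C memory once; later calls (including the finalizer) are no-ops.
function release!(buf::PackedBuffer)
    if buf.ptr != C_NULL
        Libc.free(buf.ptr)
        buf.ptr = C_NULL               # zero out the pointer, as in the comment above
    end
    return nothing
end

buf = PackedBuffer(1000)
release!(buf)    # explicit, prompt release
release!(buf)    # harmless: the pointer was already cleared
```

Because release! clears the pointer, it does not matter whether the explicit call or the finalizer gets there first.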

@timholy That's good to know, but is that sort of syntax only for files? (sorry, my newbieness with Julia is showing again!)

timholy commented 9 years ago

@ScottPJones, it's a standard julia convention, see http://docs.julialang.org/en/latest/manual/functions/#do-block-syntax-for-function-arguments. You have to write a version of your function that takes another function as the first argument (see, e.g., the methods defined for open). Internally, it's just try...finally.
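
As a sketch of that convention for a non-file resource: the Connection type, connect_db, and disconnect! below are hypothetical names, and the only real mechanism is a method that takes a function first and wraps the call in try...finally.

```julia
mutable struct Connection        # hypothetical resource type
    isopen::Bool
end

connect_db() = Connection(true)
disconnect!(c::Connection) = (c.isopen = false; nothing)

# Function-first method: this is what enables the `do` syntax.
function connect_db(f::Function)
    c = connect_db()
    try
        return f(c)
    finally
        disconnect!(c)           # always runs, even if `f` throws
    end
end

seen = Ref{Connection}()
answer = connect_db() do c
    seen[] = c
    c.isopen ? "connected" : "closed"
end
```

The connection is open inside the block and closed as soon as the block returns.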

tknopp commented 9 years ago

@ScottPJones No, the do syntax is not restricted to files; see http://julia.readthedocs.org/en/latest/manual/functions/#do-block-syntax-for-function-arguments

I think this is the standard way to do it in Julia and, as @timholy said, it is used in various places in Julia land. We also use it in a few places in Gtk.jl.

Where finalizers are important is when the object goes out of scope. In Gtk.jl, for instance, we have situations where they are really needed.

tknopp commented 9 years ago

oh Tim is faster, sorry.

ScottPJones commented 9 years ago

Thanks @tknopp & @timholy! Sorry for the noise, I really am trying to memorize the manual, but Julia is such a large language!

timholy commented 9 years ago

It definitely takes a while, no apologies needed.

tknopp commented 9 years ago

@JeffBezanson: What is the actual proposal of this issue? Isn't the do syntax already consistently used for files? I think finalizers are useful when the scope is not local.

carnaval commented 9 years ago

We probably can't remove finalizers altogether because then we would be leaking resources. I think this issue is more about conventions on "good practice for resource management", since the biggest problem (besides performance) is that the gc is very lazy: it only works under pressure, that is, memory pressure. It has no way to know, e.g., how many file descriptors are open in the program, so if your handle object is small, the gc will be completely fine keeping it around for a long time while you exhaust your open fd limit.

I don't have any good idea about this by the way...

wildart commented 9 years ago

I found finalizers unreliable. When interfacing with C code, I would really prefer something like Go's defer rather than using a finalizer to release resources. I opt for manual resource management even though it increases the amount of code to be written several times over.

JeffBezanson commented 9 years ago

@tknopp good question. My proposal would be

The last item sounds drastic, but as it is finalizers might not be invoked for a very long time, and unpredictably. You could still use finalizers as an escape hatch. If you're not sure how to handle releasing some object, you can just call finalizer(x) on it any time.

tknopp commented 9 years ago

Ok. Is there an issue that explains what with is and where it differs from the do syntax?

carlobaldassi commented 9 years ago

I'll just add another small issue about using finalizers with IO objects which I very recently discovered: on Windows, trying to call rm on a file with an open descriptor fails. This made the FastaIO tests fail, because I was relying on finalize to close the file after I finished reading it, and I was deleting it after the tests. I never noticed the bug since on Linux that works fine. So this is probably not a very common situation, but — in association with the unpredictability of the GC — may lead to OS-specific, non-deterministic bugs.

elextr commented 9 years ago

@JeffBezanson how do you propose to handle objects whose lifetime exceeds the scope of the with, e.g. ones returned from the function?

JeffBezanson commented 9 years ago

If an object lifetime exceeds the local scope, you can't use with. The only options I see in that case are (1) somebody downstream uses with, (2) you add a finalizer to the object before returning it.

StefanKarpinski commented 9 years ago

Another idea is to have some types opt into reference counting and finalize them when their counts get to zero. It's not entirely clear to me how to make a mix of refcounting and not work, however.

jakebolewski commented 9 years ago

watch out, may be flayed by mentioning reference counting :-)

carnaval commented 9 years ago

the problem with mixed refcount is that a refcounted object can still be kept alive by a non refcounted one (worst case : the object keeping it alive is in oldgen). Then you don't get the "immediate finalization" property.

carnaval commented 9 years ago

To alleviate the late finalization problem we could also teach the gc about other kinds of resources so that they can be taken into account in the collection heuristics. So e.g. you could register a "file descriptor" or "GPU memory" resource, and then explicitly say: I allocated X of this, and running this finalizer will get me Y of this back.

Painful to implement though. And it can only make gc overhead worse (by collecting more often).
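
Nothing like this exists in the GC as described here, but the idea can be roughly approximated in user code by counting live handles and forcing a collection when a budget is exceeded. In this sketch FakeHandle, LIVE_HANDLES, and HANDLE_LIMIT are all hypothetical; GC.gc() is the only real GC entry point used.

```julia
const LIVE_HANDLES = Ref(0)     # hypothetical count of open "descriptors"
const HANDLE_LIMIT = 8          # hypothetical budget before forcing a collection

mutable struct FakeHandle
    closed::Bool
    function FakeHandle()
        # The GC only sees memory pressure, so apply descriptor pressure by hand.
        LIVE_HANDLES[] >= HANDLE_LIMIT && GC.gc()
        h = new(false)
        LIVE_HANDLES[] += 1
        finalizer(close_handle!, h)
        return h
    end
end

# Give the "descriptor" back exactly once, whoever calls this first.
function close_handle!(h::FakeHandle)
    if !h.closed
        h.closed = true
        LIVE_HANDLES[] -= 1
    end
    return nothing
end

h = FakeHandle()
opened = LIVE_HANDLES[]
finalize(h)                     # explicit release; the GC would do this eventually
```

As noted above, the cost of this approach is extra collections whenever the budget is hit.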

StefanKarpinski commented 9 years ago

the problem with mixed refcount is that a refcounted object can still be kept alive by a non refcounted one (worst case : the object keeping it alive is in oldgen). Then you don't get the "immediate finalization" property.

Yes, in such a scheme every reference that could transitively reach anything refcounted would need to maintain a refcount. That includes most abstract slots, and slots in data structures that can refer to refcounted objects. But that still excludes most things we care about the performance of.

ScottPJones commented 9 years ago

Well, mixed refcounting in some sense is what I'm already doing: in Julia, I have a type that contains a Ptr{} to a C-allocated (refcounted) structure. Normally, I call release! on the Julia object, it calls C to decrement the refcount, and I clear my pointer in the Julia object. I've been adding finalize calls so that if the Julia object is garbage collected, the refcount is still correct, and the memory gets freed on the C side... (mainly for when people are using the REPL, and might accidentally overwrite the variable containing the Julia object). [I hope this technique seems OK to all of you!] (Note: why can't Julia, when it is freeing an object, simply check to see if there is a finalize! method for that type, and call that?)

carnaval commented 9 years ago

@StefanKarpinski which is practically every possible object (modules, generic functions, ...) except some leaf datatypes. I'm not eager to retrofit refcount inside the C codebase (and the generated code) everywhere. Especially since naive refcount is inefficient so we would need to implement deferred refcount on top of that.

@ScottPJones speed. The GC tries very hard to be O(1) in the size of garbage memory (and it already fails at that in many regards). Adding a linear pass over all garbage is performance suicide, especially since typically only a very small number of objects need finalization. If this is more of a UI question then we could have it look like this, but we would still be storing an explicit list of live objects with finalizers somewhere.

jakebolewski commented 9 years ago

@StefanKarpinski I don't see how you set up that transitive relationship when the object does not even exist yet. Would you walk the object graph and then mark every reference that could reach a refcounted object every time one is constructed?

tkelman commented 9 years ago

Have wondered about this in the context of libxml bindings, where I think the most sane solution right now would be to implement manual refcounting. That hasn't been done yet so right now code that uses LightXML is either leaky or can have finalizers added that might be dangerously unaware of object relationships that are only directly observable through C.

ScottPJones commented 9 years ago

@carnaval I think that maybe there might be some sneaky means to avoid needing a linear pass over all garbage... I'll have to really dig into how the Julia GC works before I could say too much more. I worry that the current method of having to keep a list of objects with finalizers is also performance suicide... it would seem that if the finalizer for a particular type of object were always finalize!, and the finalize! methods were written within the type declaration, like inner constructors, you might be able to segregate the allocation of objects that have finalization, so that would not be a problem... (or am I just being dense today? [could very well be!])

carnaval commented 9 years ago

@ScottPJones Sure, we can do lots of things to make it faster than it is now. Segregation might be a pain since we use size-segregated pools, so you will end up wasting quite a lot of memory if you only have a few finalizable objects. Going through the list is not that much of a problem given that we actually have one of those lists per generation, so most of the time you are only checking the young one, which is mostly garbage, so you are mostly doing useful work. The costly thing here is probably that going through the list implies taking a (likely) cache miss for every object to check whether it is now garbage or not. If being finalizable was a property of the object's type we could have a hidden field per object storing its position in the finalizer list and keep a separate finalizable_alive bitfield so that, at the end of the collection, we could be lazy and only work on objects we are sure are now dead.
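
A toy model of the bitfield idea, to make the bookkeeping concrete. This is not how Julia's GC is implemented: FinalizerTable, register!, mark_alive!, and sweep! are hypothetical names, and the "mark phase" is simulated by hand.

```julia
# Toy model, not Julia's actual GC: each finalizable object remembers its
# slot in a table, and marking sets one bit per reachable object.
struct FinalizerTable
    callbacks::Vector{Function}
    pending::BitVector    # slot still has a finalizer waiting to run
    alive::BitVector      # set during marking, cleared after each sweep
end
FinalizerTable() = FinalizerTable(Function[], falses(0), falses(0))

# Register a finalizer; the returned slot plays the role of the hidden field.
function register!(t::FinalizerTable, f::Function)
    push!(t.callbacks, f)
    push!(t.pending, true)
    push!(t.alive, false)
    return length(t.callbacks)
end

# Called by the (simulated) mark phase for each reachable finalizable object.
mark_alive!(t::FinalizerTable, slot::Int) = (t.alive[slot] = true)

# After marking: visit only pending slots whose alive bit is clear, without
# touching the objects themselves, then reset the bits for the next cycle.
function sweep!(t::FinalizerTable)
    dead = findall(t.pending .& .!t.alive)
    for slot in dead
        t.callbacks[slot]()
        t.pending[slot] = false
    end
    fill!(t.alive, false)
    return dead
end

table = FinalizerTable()
ran = String[]
a = register!(table, () -> push!(ran, "a"))
b = register!(table, () -> push!(ran, "b"))
mark_alive!(table, a)          # only `a` was reached during marking
first_sweep = sweep!(table)    # runs the finalizer for `b` only
second_sweep = sweep!(table)   # nothing marked this time, so `a` is finalized
```

The point of the bitfield is that the sweep reads a compact bit array instead of taking a cache miss on every object in the list.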

carnaval commented 9 years ago

I must add that much of the performance issue goes away if you use native finalizers instead of julia ones, since the julia ones require keeping the object alive for one more collection.

ScottPJones commented 9 years ago

native finalizers? Please explain...

ScottPJones commented 9 years ago

If being finalizable was a property of the object's type we could have a hidden field per object storing its position in the finalizer list and keep a separate finalizable_alive bitfield so that, at the end of the collection, we could be lazy and only work on objects we are sure are now dead.

@carnaval That's exactly the sort of thing I was trying to get to... would that be hard to implement? What are all the pros & cons?

carnaval commented 9 years ago

A quick Google search suggests that it is not documented. Oops.

The finalizer function can accept a C function pointer instead of a julia function. The C function is then run in the middle of gc instead of at the end. Of course, if this C function touches the julia runtime, bad things can happen.

About the bitfield stuff, I don't see any obvious cons, except for the implementation effort of course. It should not be too hard to implement, but be aware that the gc code is, ahem, kind of messy. I would start by putting together a benchmark where going through the finalizer list is actually the bottleneck (and not running the finalizers themselves).

If you start looking into this you may find lower hanging fruits however since I don't recall anyone spending too much time optimizing this.

ScottPJones commented 9 years ago

That will be easier then (using the native finalizers), of course now I have to go change my code again! Thanks @carnaval!

If you start looking into this you may find lower hanging fruits however since I don't recall anyone spending too much time optimizing this.

Well, I seem to already be doing well optimizing low hanging fruits in string handling, and that's another area where I've done similar stuff in the past, and might be able to help... (if I speed up parts of Julia enough, and [heaven forbid!] the wonderful startup I'm consulting for doesn't take off, maybe I can beg for a job at JC :grinning: [if the founders don't tar and feather me first!]) [if the startup does take off, as I think it will, then next year I'm going to try to convince them to be a sponsor at JuliaCon]

elextr commented 9 years ago

@JeffBezanson you talked about using finalize() on BigInt, but who is managing the lifetime of that BigInt, i.e. who will call finalize()? For something like a BigInt I would have thought that restricting it to only being used in a with is pretty untenable.

As @carnaval pointed out, reference counting can't be mixed with normal Julia references: reference counted objects must only be accessed via "counting references", and that would have to be enforced by the compiler. The "counting references" then do the right thing when the reference is created, accessed, assigned, copied (parameter passing/return), and destructed (scope exit, including via exception). IIUC, not all of those actions can currently be intercepted in Julia.

JeffBezanson commented 9 years ago

AFAICT BigInts cannot perform well if they depend on finalizers. The only way forward for BigInts is not to use finalizers at all.

ScottPJones commented 9 years ago

It sounds like BigInts (or at least the portions of the code dealing with allocation/release) need to be coded in Julia... wouldn't that remove the dependency on finalizers?

JeffBezanson commented 9 years ago

Yes, but we might not need to go that far. See #11202.

davidanthoff commented 9 years ago

It might be worth reading through the .Net Dispose Pattern. It essentially codifies much of what @JeffBezanson proposed, but takes a slightly different approach on banning finalizers altogether. The gist of it is that if an object has been disposed manually (i.e. its non-managed resources have been freed by a call to Dispose), then one can mark this object instance as not needing finalization anymore.

PS: This issue is strange to read, there are lots of responses to @ScottPJones, but not a single post from him?

ScottPJones commented 9 years ago

@davidanthoff I hope that wasn't a snide remark! :grinning: I've been busy trying to 1) treat my wife to a nice Mother's Day, 2) get my PR #11186 as nice as possible, 3) get my "real" work with Julia done, and 4) figure out how to bribe any of the collaborators to merge in some of my Unicode fixes/performance improvements!

I've had a crazy idea... but I'm sure somebody here will shoot it down... Besides having a finalize! method associated with a type, so that the box would just need to have a bit set to show that the object needs to be finalized by calling its finalize! method, I was thinking you could also have maybe a 16-bit reference counter. It starts at 1 when the object is allocated and is only incremented if non-zero, so if it hits zero, it sticks there (meaning the object will only get finalized when the GC gets to it); but if it is decremented to zero, the object is immediately finalized (and the "need finalize" bit cleared). I'm sure @JeffBezanson and @StefanKarpinski will tell me why I'm all wet... but that idea's been bouncing around my head all yesterday (could just be the drinks celebrating Mother's Day with my wife though!)
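
One possible reading of that counter, sketched in plain Julia. The interpretation is an assumption: "hits zero and sticks" is taken to mean the 16-bit count overflowing and wrapping to zero, after which only the GC can finalize the object. Counted, incref!, and decref! are hypothetical names, and the finalized flag stands in for actually running a finalizer.

```julia
mutable struct Counted                  # hypothetical refcounted object
    refs::UInt16
    finalized::Bool
    Counted() = new(UInt16(1), false)   # count starts at 1 on allocation
end

# Increment only while the count is live; on overflow it wraps to 0 and sticks.
function incref!(c::Counted)
    if c.refs != 0
        c.refs += UInt16(1)             # UInt16 arithmetic wraps past 65535
    end
    return c
end

# Decrement; reaching zero this way finalizes the object immediately.
function decref!(c::Counted)
    if c.refs != 0
        c.refs -= UInt16(1)
        if c.refs == 0
            c.finalized = true          # stand-in for running the finalizer
        end
    end
    return c
end

c = Counted()
incref!(c); decref!(c); decref!(c)      # 1 -> 2 -> 1 -> 0: finalized right away

s = Counted()
s.refs = typemax(UInt16)
incref!(s)                              # overflow: count sticks at 0
decref!(s)                              # ignored; left for the GC to finalize
```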

StefanKarpinski commented 9 years ago

Not sure what happened, but ScottPJones's GitHub user account doesn't seem to exist anymore.

ScottPJones commented 9 years ago

@StefanKarpinski I don't even know if you see this... I've been declared "nonhuman" by GitHub!

carnaval commented 9 years ago

So Scott's account was finalized ?

Sorry I'm out.

tkelman commented 9 years ago

Very strange, even PRs opened by him (like #11186 and others) are 404s now?

edit: seems to be something on github's side, hopefully will be resolved soon enough?