stop using finalizers for resource management?

JeffBezanson commented 9 years ago

Finalizers are inefficient and unpredictable. And with the new GC, it might take much longer to get around to freeing an object, therefore tying up its resources longer. Ideally releasing external resources should not be tied to how memory management works.

We are already not far from this with the `open(f) do` construct. I think that and/or a `with` construct should be used. Perhaps there could be some other mechanism for registering files to close eventually.
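For readers unfamiliar with the pattern, here is a minimal sketch of the `open(f) do` construct, written in Julia 1.x syntax (an assumption; the thread predates it). The temporary file from `mktemp` and the `leaked` variable are only there for illustration. The point is that the stream is closed when the block exits, normally or by exception, without involving the GC.

```julia
# Assumes Julia 1.x. `mktemp` gives us a scratch file so nothing else is touched.
path, tmpio = mktemp()
close(tmpio)

# `open(path, mode) do io ... end` calls the block with `io` and closes `io`
# afterwards, even if the block throws.
open(path, "w") do io
    write(io, "hello")
end

contents = open(path) do io
    read(io, String)
end

# Even when the block errors, the stream does not stay open.
leaked = Ref{IOStream}()
try
    open(path) do io
        leaked[] = io      # keep a reference only so we can inspect it afterwards
        error("boom")
    end
catch
    # the error is expected; we only care about the stream state
end
still_open = isopen(leaked[])
rm(path)
```

Nothing here depends on when the collector runs, which is the property the issue asks for.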

Discussed this with @carnaval .

mbauman commented 9 years ago

Yikes. I hope his employer didn't make him delete everything. :(

I hope all is well.

StefanKarpinski commented 9 years ago

Seems that GitHub decided that Scott was a bot and removed all trace of him. He's contacted support and is trying to recover the account.

MithrandirMiles commented 9 years ago

I'm back, but just to respond to messages (I don't want to give up my ScottPJones account)

MithrandirMiles commented 9 years ago

I think GitHub has a very bad design, if when it makes somebody "invisible" (with no warning), it makes your entire history invisible, messages that other people might have wanted to see :-1: Where do we put issues on GitHub's design??? Grrrrr! Thanks for all the very touching concern everybody!

MithrandirMiles commented 9 years ago

This is a copy of a now invisible message I sent a while ago (before realizing what GitHub had done to me!)

@davidanthoff I hope that wasn't a snide remark! :grinning: I've been busy trying to 1) treat my wife to a nice Mother's Day, 2) get my PR #11186 as nice as possible, 3) get my "real" work with Julia done, 4) figure out how to bribe any of the collaborators to merge in some of my Unicode fixes/performance improvements!

I've had a crazy idea... but I'm sure somebody here will shoot it down... Besides having a finalize! method associated with a type, so that the box would just need to have a bit set to show that the object needs to be finalized by calling its finalize! method, I was thinking you could also have maybe a 16-bit reference counter. It starts at 1 when the object is allocated and is only incremented if non-zero, so if it hits zero, it sticks there (meaning the object will only get finalized when the GC gets to it). But if it is decremented to zero, the object is immediately finalized (and the "need finalize" bit cleared). I'm sure @JeffBezanson and @StefanKarpinski will tell me why I'm all wet... but that idea's been bouncing around my head all yesterday (could just be the drinks celebrating Mother's Day with my wife though!)
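One possible reading of the sticky counter described above, as a sketch in Julia 1.x syntax. Every name here (`Counted`, `retain!`, `release!`) is hypothetical and nothing like it exists in Julia's runtime. It interprets "hits zero and sticks" as the counter wrapping around on overflow, which is an assumption about what was meant.

```julia
# Hypothetical sketch; not part of Julia.
mutable struct Counted
    refs::UInt16      # 0 means "stuck": only the GC may finalize this object
    finalized::Bool   # stand-in for the "need finalize" bit having been cleared
end
Counted() = Counted(1, false)   # the counter starts at 1 on allocation

function retain!(c::Counted)
    # Only incremented if non-zero. UInt16 arithmetic wraps, so 0xffff + 1 == 0
    # and the counter then stays at zero.
    if c.refs != 0
        c.refs += one(UInt16)
    end
    return c
end

function release!(c::Counted)
    if c.refs != 0
        c.refs -= one(UInt16)
        if c.refs == 0 && !c.finalized
            c.finalized = true   # here the real thing would call finalize!(obj)
        end
    end
    return c
end

# Normal lifetime: two references, two releases, finalized on the second.
c = Counted()
retain!(c)
release!(c)
after_first_release = c.finalized
release!(c)

# Overflow: the counter sticks at zero and release! no longer finalizes eagerly.
d = Counted(typemax(UInt16), false)
retain!(d)
release!(d)
```

Even in this toy form, every `retain!` and `release!` is an extra load and store per reference operation.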

davidanthoff commented 9 years ago

@MithrandirMiles: no snide remark intended, I was just wondering what was going on. Glad this is "just" a github problem!

MithrandirMiles commented 9 years ago

@davidanthoff That's why I put the :grinning:! (I've tried to increase my SNR around here... hopefully enough that you all will put up with me further!)

JeffBezanson commented 9 years ago

@MithrandirMiles this is terrible, I hope you get your account back!

The problems with refcounting are:

- space to store the refcount
- the time spent updating it: all the loads and stores, especially in parallel
- anything that requires iterating over dead objects to figure out which ones need to be finalized

MithrandirMiles commented 9 years ago

@JeffBezanson I understand those points... that's why I was trying to think of crazy ideas, that could make @StefanKarpinski 's mixed ref count idea work... I hadn't had to worry so much about issues of parallelism with ref counts, we used a process model (up to 64K processes, all accessing shared memory for DB operations), but all of the compiler/interpreter's objects were just in each process... "objects", large strings, and top level variables were all ref counted... About the iterating over the dead objects, I think various techniques might help, either segregating refcounted/finalizable objects in different pools, using bitmaps to see what pages had objects that needed to be looked at... but, like I said, I don't (yet!) know enough about the blood and guts of Julia to know what might be doable...

wildart commented 9 years ago

What about making a movable object as in Rust? No referencing is allowed.

vtjnash commented 9 years ago

Adding a linear pass over all garbage is performance suicide, especially since typically a very small number of objects need finalization.

There are 2 unused bits in every pointer that the GC allocates, due to the 16-byte alignment guarantee (assuming we get rid of the smallest pool). It seems like such a waste not to be using them. But maybe @carnaval has something even better in mind for them than just marking which objects have finalizers?

elextr commented 9 years ago

> There are 2 unused bits in every pointer that the GC allocates, due to the 16-byte alignment guarantee (assuming we get rid of the smallest pool). It seems like such a waste not to be using them. But maybe @carnaval has something even better in mind for them than just marking which objects have finalizers?

Since using the lower bits for stuff means that each pointer load now needs a mask operation as well, it was found in another project I worked on to have a noticeable effect on performance. Note, that project had a moving collector, so pointer re-loads may have been needed more often than they currently are in Julia.

But to impact a high frequency operation (pointer load) with a low frequency operation (like finalizers) might not be a net gain.
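To make the masking cost concrete, here is a small sketch in Julia 1.x syntax that models a tagged pointer as a plain `UInt`. The address, the mask, and the meaning assigned to bit 0 are all made up for illustration; this is not how Julia's GC represents pointers.

```julia
# Hypothetical illustration only; user code never handles GC pointers like this.
const TAG_MASK = UInt(0x3)          # the two spare low bits
const NEEDS_FINALIZER = UInt(0x1)   # assumed meaning of bit 0

addr = UInt(0x7f001230)             # made-up address, 16-byte aligned

tag(p::UInt, bits::UInt) = p | bits
untag(p::UInt) = p & ~TAG_MASK      # the extra mask needed before every dereference
has_finalizer(p::UInt) = (p & NEEDS_FINALIZER) != 0

tagged = tag(addr, NEEDS_FINALIZER)
```

The `untag` call is the per-load cost being discussed: it is cheap in isolation, but it sits on a path that runs far more often than finalization does.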

timholy commented 9 years ago

This topic just came up in https://github.com/JuliaLang/julia/pull/11280#issuecomment-104210821. If we use finalizers, it would be nice to use one of our precious tag bits to indicate whether an object's finalizer has already been run, so that all finalizer functions become safe to call twice.

StefanKarpinski commented 9 years ago

Another thing that might help here is the ability to add hooks for resource acquisition to trigger gc – e.g. being able to trigger a gc sweep when too many file handles are open.

elextr commented 9 years ago

Would need to force a full GC, which kind of defeats all the great effort put into the generational one.

quinnj commented 9 years ago

@JeffBezanson, so the usage of your proposal would be

type A
  # fields
end

close(a::A) = # closing stuff...

a = A()

# `with` block that guarantees finalization of object
with a
  # shenanigans
end

# which basically translates to
try
  # shenanigans with `a`
finally
  close(a)
end

What are the other differences between `with` and the try-finally block? Special-casing in GC? Would we still need machinery to flag whether a finalizer has been run or not (since each type will have a defined close method that could potentially be called manually)?

Thinking out loud, and having started reworking some finalizer work in SQLite recently, it would be nice to have the following pattern:

# create immutable Statement type where you can't tamper with the SQLite-provided handle
# also add the finalizer on creation, without an explicit `close` method to call (i.e. the only way to free is by finalization)
immutable SQLiteStatement
  handle::Ptr{Void}
  function SQLiteStatement(db::SQLiteDB)
    stmt = new(sqlite3_new_statement(db.handle))
    # the finalizer function receives the object being finalized as its argument
    finalizer(stmt, s -> sqlite3_finalize(s.handle))
    return stmt
  end
end

function do_stuff_with_statements()
  with SQLiteStatement(db) do stmt
    # do stuff with the `stmt`
  end
  # when the end of the `with` block is reached, the registered finalizer is run without having to do a full gc()
end

I think (perhaps wrongly) that's actually not too far from where we are now, but the key here is being able to ensure a finalizer is run once the object goes out of scope, without having to stop the world with a full gc().

JeffBezanson commented 9 years ago

Yes, the ideal situation seems to be freeing things eagerly using `with`, but with a safety net in the GC to ensure eventual finalization if somebody messes up. We have a finalize function now that does this. The only catch is performance:

- Adding the finalizers, and later having the GC go through them, is expensive.
- finalize(x) is currently O(n) in the number of registered finalizers. In any case it will be slower than a close(x) call (which could be inlined).
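A rough sketch of "free eagerly, keep the GC as a safety net", written in Julia 1.x syntax (note the `finalizer(f, x)` argument order, which differs from the `finalizer(x, f)` form used elsewhere in this thread). `Handle`, `release!`, `RELEASED`, and the `with` helper are hypothetical names for illustration; only `finalizer` and `finalize` are Base functions.

```julia
# Assumes Julia 1.x. `with` is a hypothetical helper, not a Base function.
const RELEASED = Ref(0)    # counts releases so the effect is observable

mutable struct Handle
    open::Bool
    function Handle()
        h = new(true)
        finalizer(release!, h)   # safety net: the GC runs this if nobody else does
        return h
    end
end

function release!(h::Handle)
    if h.open                    # guard so a second call is harmless
        h.open = false
        RELEASED[] += 1
    end
    return nothing
end

# Run `f(h)`, then run the finalizers registered on `h` right away.
function with(f, h)
    try
        return f(h)
    finally
        finalize(h)              # Base.finalize: runs h's finalizers immediately
    end
end

result = with(Handle()) do h
    h.open ? "used" : "closed"
end
```

After the block, the resource is already released and the collector has nothing left to do for it. A forgotten `with` would still be cleaned up eventually by the registered finalizer.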

quinnj commented 9 years ago

How about this then for a somewhat more concrete proposal, (bringing a few different ideas together from this thread, #1037, this comment, and #3067):

type A
  # fields
  # declare finalize method within type declaration
  finalize(a::A) = # what to do when finalized
end

# can manually create and finalize A()
a = A()
# do stuff with `a`
finalize(a) # doesn't "stop the world" for a full GC sweep, just calls finalize(`a`)

# use a `with` block for auto-finalization
# similar to `let-block` notation
with a = A(), b = A(), c = A()
  # do stuff with `a`, `b`, and `c`
end
# at end of block, finalize(a), finalize(b), and finalize(c) are called in order

There are a few things going on here:

- Declare the finalize method within the type declaration; this addresses #1037 by making finalize sticky to the type itself; this may also allow some optimizations when no finalize method is declared (since we'll know we'll never have to finalize that type)
- In place of the current finalizer(x, func), finalize(x), and custom close methods, we have a single, unified finalize method; a nice, simple interface
- Generally, this allows a separation of finalize from gc() since finalize(x) would avoid a full sweep
- Allows for flexibility in finalize usage (either by manually calling finalize(x), or relying on a `with` block)

From @timholy's comment, we could also use a spare type bit to indicate if a finalizer has run on an object. This allows 1) avoiding calling finalize twice on potentially non-idempotent finalize methods and 2) ensuring finalize is called even if an object was created manually and not finalized manually.

JeffBezanson commented 9 years ago

To clarify, finalize does not do a full sweep, it's just not particularly efficient.

There are definitely some good points to that proposal. We could allocate finalizable objects in a separate arena, which would be annoying but more efficient than what we do now. Those object pools could have a bit array of "finalized" flags, to avoid using up a tag bit per object.

The only downside I see is that you can't do tricks like WeakKeyDict, which attaches finalizers to keys that remove them from the dictionary. That might be ok though, as that technique is not very efficient and should be replaced.

StefanKarpinski commented 9 years ago

When you say "tricks like WeakKeyDict", are there any other examples of using that trick that you're aware of? WeakKeyDict is a fairly special thing that it seems sensible to just support without needing to be able to implement it in the language.

JeffBezanson commented 9 years ago

The mmap code also attaches finalizers to Arrays to call munmap. That's about it I think.

aviks commented 9 years ago

In JavaCall, the object is created by the library, and returned to user code. The creator of the object does not control the lifecycle of the object... it necessarily escapes the creator scope. Upon creation, a finaliser is attached to the object that deallocates the JVM memory when the object is no longer required in Julia.

If I understand @quinnj's proposal correctly, finalisers will no longer be called by the GC, so user code will be made responsible for managing the lifecycle of every object that it retrieves from the JavaCall boundary. That seems quite nasty to me.

JeffBezanson commented 9 years ago

No, it's just that you could efficiently bound the lifetime of something using with. If you don't use with, the GC will still clean up.

Is it sufficient for JavaCall to have finalizers associated with types, and not individual instances, e.g. a JavaObject type?

ScottPJones commented 9 years ago

:+1: to @quinnj's proposal, it has what I had been asking for. The only thing is that I'd suggest naming it finalize!, because it definitely modifies the object, and to avoid confusion with the old finalize method.

aviks commented 9 years ago

Ah, ok, thanks... I misunderstood.

Yes, it should be sufficient to have finalisers associated with types. Currently, every object gets the same finaliser function. Of course, the type parameters and fields will need to be available to the finaliser.

quinnj commented 9 years ago

I wonder if the mmap and WeakKeyDict cases call for something like

finalize(a) do f
   # code to finalize `a` which is a type not declared with a `finalize` method
end

This wouldn't actually do the finalizing, just "move" a to the finalize pool of objects and the function argument would be run as the finalize method whenever that happens, either manually, from a with block, or when the object was destroyed.

Not sure how feasible "moving an object to the finalization pool of objects" would be though....

amitmurthy commented 9 years ago

Is https://github.com/JuliaLang/julia/issues/10960 then an artifact of the new gc? That could explain memory leaks with shared and distributed arrays. An ability to explicitly "free" remote objects will be quite useful, especially in cases where people are using distributed arrays across multiple hosts specifically to leverage every bit of memory available.

ScottPJones commented 9 years ago

@quinnj Carrying the discussion from #11280 over here, as requested... You said:

> the problem with being able to call your own finalize! is you then need some way to tell if an object has been finalized or not.

That's precisely what I said I'd done, I have a pointer to something that needs to be finalized, so I simply set it to zero (C_NULL) in finalize!. If you don't have a pointer, a flag can be used. It is an extra check on each reference, but you stop having segfaults or problems with things outside of Julia being released. It was the only way I could think of currently to make sure something can be finalized quickly most of the time, and still prevent memory (or other resource) leakage when things get GCed. Do you have any better suggestions to handle that?
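The null-pointer approach described above might look roughly like this in Julia 1.x syntax. `Buffer` and `finalize!` are hypothetical names (the latter following the naming suggested earlier in the thread); `Libc.malloc`, `Libc.free`, `C_NULL`, and `finalizer` are the real Base functions and constants.

```julia
# Assumes Julia 1.x. `Buffer` and `finalize!` are illustrative names only.
mutable struct Buffer
    ptr::Ptr{Cvoid}
    function Buffer(nbytes::Integer)
        b = new(Libc.malloc(nbytes))
        finalizer(finalize!, b)   # the GC still cleans up if finalize! is never called
        return b
    end
end

function finalize!(b::Buffer)
    # The pointer doubles as the "already finalized" flag.
    if b.ptr != C_NULL
        Libc.free(b.ptr)
        b.ptr = C_NULL
    end
    return b
end

b = Buffer(64)
was_allocated = b.ptr != C_NULL
finalize!(b)    # eager release
finalize!(b)    # second call is a no-op, so the later GC finalizer is safe too
```

The price is the one mentioned in the comment: any code that uses `b.ptr` has to check for `C_NULL` first.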

amitmurthy commented 9 years ago

Just noticed this when there are multiple finalizers defined for an object.

julia> type Foo
           v
       end

julia> f=Foo(0)
Foo(0)

julia> Foo(0)
Foo(0)

julia> for i in 1:10
           finalizer(f, x-> @schedule print("FINALIZED $i \n") )
       end

julia> f=nothing

julia> for i in 1:10
           print("calling gc for the $i th time\n");
           gc()
       end
calling gc for the 1 th time
calling gc for the 2 th time
FINALIZED 10 
FINALIZED 9 
calling gc for the 3 th time
FINALIZED 1 
FINALIZED 2 
calling gc for the 4 th time
FINALIZED 8 
calling gc for the 5 th time
FINALIZED 7 
calling gc for the 6 th time
FINALIZED 6 
calling gc for the 7 th time
FINALIZED 5 
calling gc for the 8 th time
FINALIZED 4 
calling gc for the 9 th time
FINALIZED 3 
calling gc for the 10 th time

Found it a little odd that all the finalizers are not executed together at the first gc itself.

ScottPJones commented 9 years ago

This wouldn't happen if @timholy's idea (seconded by @quinnj [and myself]) of using a tag bit to say whether the finalizer had been run for an object were adopted. (Or I guess that is a different object each time... never mind!)

yuyichao commented 9 years ago

@amitmurthy This is somewhat related to the (sub-)issue I noticed in https://github.com/JuliaLang/julia/issues/11814#issuecomment-114648637 . My guess is that running too many finalizers at the same time would cause too long a pause, but @carnaval should know for sure.

yuyichao commented 8 years ago

FWIW, the issue above in https://github.com/JuliaLang/julia/issues/11207#issuecomment-114620448 is solved by https://github.com/JuliaLang/julia/pull/13995 .

amitmurthy commented 8 years ago

Cool. And regarding the topic of this issue: it is not just about files. I don't think we have a choice but to use finalizers for remote references. We can document that users can manually call finalize for better control over when remote resources get released; otherwise it will only happen when gc eventually gets around to it.

stevengj commented 3 years ago

Isn't this issue essentially a duplicate of #7721?

JuliaLang / julia

stop using finalizers for resource management? #11207