JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.43k stars 5.45k forks source link

Better GMP integration with the GC #11202

Open carnaval opened 9 years ago

carnaval commented 9 years ago

We discussed this a few times so better file an issue in case anyone wants to go ahead.

For now our GMP integration allocates the dynamic memory holding the integer using the libc allocator. It then adds a finalizer for cleanup when the parent object gets GCd.

This is inefficient, mainly because finalizers themselves are, and because the GC can't know of much memory will be freed by a collection before running the finalizers, which makes some heuristics bad. The best I can come up with for doing this would be to allow julia datatypes to have fields of type buff_t, like the hidden data pointer in an Array. Those fields points to gc memory which is not tagged but can have variable length. The BigInt type could then declare the _mp_d field (https://gmplib.org/manual/Integer-Internals.html) as a buff_t and allocate it using allocb. This should not require modification inside GMP. It would however need to special case fields of buffer type in the DataType marking code in the gc.

I don't know much about GMP so maybe there is a better solution. Also this would probably work for MPFR and maybe other C libraries we are wrapping using finalizers.

simonster commented 9 years ago

Ref #4176

ScottPJones commented 9 years ago

This sounds like a very interesting issue... esp. since I can see it being an issue with decimal float support (which I very much need)... so is #4176

carnaval commented 9 years ago

@ScottPJones If the C type you are wrapping is fixed size the best way is likely to embed it directly inside a julia DataType, maybe as the struct itself or as a bitstype field. That way you avoid pointers and some gc overhead (especially if the type is immutable).

carnaval commented 9 years ago

(our DataTypes respect the C struct layout so you should be able to copy the definition over)

timholy commented 9 years ago

Also worth linking #10084.

carnaval commented 9 years ago

Oh god. Forgot about this. I need a snooze button for github issues.

StefanKarpinski commented 9 years ago

A related idea would be making BigFloat a parametric type where the number of bits is part of the type, and dictates its representation, so that they could be stack allocated. @JeffBezanson doesn't care for this since it would potentially lead to a ton of code generation, but I suspect that in practice people don't use that many different sizes of BigFloat, so it might actually be ok.

JeffBezanson commented 9 years ago

Maybe we could use a tuple of Limbs instead of a buff_t? But do we need a write barrier when GMP assigns the _mp_d field to the new space?

carnaval commented 9 years ago

Oh of course Tuple{Vararg{UInt8}} is a much better type for the limb vector now ! I like this, no need to touch gc.c. We do need a barrier in any approach of this but we can just call it after every gmp calls it's quite cheap (and no need to patch gmp this way). So the roadmap for this now looks like :

Does that seem right ?

JeffBezanson commented 9 years ago

Sounds right to me. Putting the barriers after every gmp call is kind of annoying; I guess we can wrap it in a macro.

tkelman commented 9 years ago

I like the sound of that because, and correct me if I'm wrong, packages could also plug into this approach without having to special-case certain types in base, right? So BigDecimal or what have you could be worked on independently without needing code changes in the gc.

ScottPJones commented 9 years ago

:+1: @tkelman (just what I need!)

carnaval commented 9 years ago

@tkelman yup. It does need some changes in the C code the first time however.

JeffBezanson commented 9 years ago

An idle thought: we could define

immutable BigInt <: Integer
    data::Tuple{Vararg{Limb}}
    neg::Bool
end

so BigInts are compared correctly by ===. Then when calling GMP, we can stack allocate structs with the right representation. GMP alloc functions can allocate tuples as discussed above. Then to return, we pick the tuple out of the struct and wrap it in a new BigInt.

stevenhao commented 8 years ago

An issue with write barriers: We cannot add a write barrier every time GMP makes a memory call, because jl_gc_wb takes a parent pointer, whereas GMP's memory functions do not pass a parent pointer.

One solution is to put the barrier on the parent before/after every ccall to GMP, but this is dangerous because the GC may run multiple times before the ccall terminates.

Gist (implements using Tuple objects in the GMP memory functions), if anyone wants to pick up off this: https://gist.github.com/stevenhao/5438681b883e36d15e9565c9f51f74e1

vtjnash commented 2 years ago

Yes, they are related. There has not been any activity on it.