crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.33k stars 1.61k forks source link

Consider the immix GC #5271

Open pnloyd opened 6 years ago

pnloyd commented 6 years ago

I came across a GC called immix that is very new and wanted to bring it to the attention of the Crystal community. immix was written by contributors of the scala-native project, which aims to compile Scala to native code via llvm as a backend, like Crystal. Also like Crystal, they originally opted to use the boehm GC during their initial implementation, but made plans to explore other options.

Why should the Crystal community pay attention to this GC? Look at these benchmarks. image

It appears as though for smallish heap sizes the immix algorithm comes very close to the performance of not having a GC running at all. Which seems amazing to me (although I don't know much about the subject).

The scala-native devs say immix is ahead of the curve performance wise in comparison to the majority of GC's which are based on research from the 90's and early 2000's (immix was 2008). They still are using boehm by default but plan to switch to immix soon as it gains some maturity.

You can see the source for immix here: https://github.com/scala-native/scala-native/tree/master/nativelib/src/main/resources/gc/immix

And in these two files you can see that they define an identical interface for both boehm and immix, that is not to far off from the APIs Crystal uses from boehm. https://github.com/scala-native/scala-native/blob/master/nativelib/src/main/resources/gc/immix/ImmixGC.c https://github.com/scala-native/scala-native/blob/master/nativelib/src/main/resources/gc/boehm/gc.c

Here's the point in the relevant talk where the scala-native dev talks about immix and discuses some it's performance characteristics. https://youtu.be/JZCeQi_xLok?t=14m1s

Papierkorb commented 6 years ago
  1. Can it also be used by C/C++ code? As in, how to integrate it with linked-in libraries?
  2. Does it require full type data? The talker says it's precise, so I guess yes.

The nice part about Boehm is that it's C/++ native already, in that it is conservative and can be used by wrapper libraries to integrate existing C/++ libraries into "Crystals GC". This in effect eliminates almost all ownership issues, which is really nice.

Whatever the GC does, it should support something (A special API is fine too) that can do this.

ysbaddaden commented 6 years ago

Looks nice, but it's bundled with the scala-native project, and will probably be optimized for the needs of Scala. Do they intend to extract it, and make it a language generic implementation?

pnloyd commented 6 years ago

@Papierkorb, my knowledge of compiler design and issues of lowish level compilation issues is shallow. But I know a central goal of the scala-native project is to perform very seamless C interop (probably rivaling or superior to Crystal's) @ysbaddaden, considering they include the source in there root project I'd doubt it. But the scala license is a simple BSD license. And considering the API's are so similar, It would seem foolish not take the few hours it might take to hack Crystal to use it and see if it doesn't work "plug and play". I honestly looked into trying this myself but not only is my C knowledge shallow but Crystal is not building on Ubuntu 17.10. :(

But for anyone who was willing to try this.. I only briefly looked over Crystal's boehm usage and it looks like some parts (setting "bottom stack pointer" i think, letting boehm proxy pthread calls) are just requirements of boehm that immix doesn't require. Maybe it's possible interfacing with immix is simpler.

My underlying point is that it might be so simple that it could be worth trying before diving into the "is it legal, is it optimized for Crystal", like questions

pnloyd commented 6 years ago

@Papierkorb, also I think intuition would tell you that if (i assume so) scala-native can seamlessly switch between boehm and immix, Crystal would be able to as well. Edit: further discussion revealed that intuition was wrong.

pnloyd commented 6 years ago

@ysbaddaden, another thing to consider, scala-native is also in a "implementing multi-threaded" phase like Crystal. I suggest that Crystal just explores the viability of immix, and then wait's for the scala-native dev's to actually bring it into a production ready state.

ysbaddaden commented 6 years ago

Few hours will quickly turn into days and weeks.

For example, we need create a stack for each fiber. How can we declare theses stacks to ImmixGC so it can scan them? If we can't (and I didn't find it with a quick search), then allocating memory in spawn will be thought as inaccessible and freed :boom:

pnloyd commented 6 years ago

@ysbaddaden, ya I agree that that getting another GC really working could take weeks worth of effort. But you could still try it without green threads. And if you you did, and did confirm that the performance improvements were really what the scala-native bench marks suggests they are, then wouldn't it be worth it to take those weeks worth of effort to get it to work for real? -- I might be being naive about the amount of effort to get it to work without green threads of course. Another note, about it being optimized for the needs of Scala. The authors state multiple times that it is basically the exact same algorithm that is described in the research paper it is based off of, which had nothing to do with scala as far as I know.

bararchy commented 6 years ago

Wouldn't that time be better spent making a Crystal native GC so we could have less dependencies and also it would fit our needs 100% ?

pnloyd commented 6 years ago

@bararchy, by native GC do you mean written in Crystal? Crystal uses llvm to compile to actual native code, pure Crystal would be a "pipe dream". -- I think the primary concern for any project should always be the users of the project, within practical reason.

bararchy commented 6 years ago

@pnloyd all depends on priorities I suppose , do people want multi-threading / windows support , or a "free" 20% speed increase , and "free" because as @ysbaddaden said, it might take a not so low amount of time to add

Papierkorb commented 6 years ago

by native GC do you mean written in Crystal? Crystal uses llvm to compile to actual native code, pure Crystal would be a "pipe dream".

The opposite is in fact true, you can implement a full GC in pure Crystal just fine. No problem apart from writing the (complex) code.

Papierkorb commented 6 years ago

Found the mentioned paper: http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf

Don't have time reading it now though..

pnloyd commented 6 years ago

@Papierkorb, I didn't state you couldn't write a GC in crystal. I was insinuating that having the entire Crystal stack be in purely Crystal is near impossible / impractical. So the fact the immix is written in C shouldn't be of concern, just like the fact the boehm is written in C isn't a concern now, or llvm.

pnloyd commented 6 years ago

@bararchy, even if it wouldn't translate over to windows for what ever reason, the API's are so similar you could easily add some kind of switch between them. Giving up a 50% increase in performance or more on POSIX platforms makes no scene just because you can't have the same on windows.

RX14 commented 6 years ago

There's no point talking hypotheticals with no information, I'm sure we could contact some immix devs and see what their opinion is.

ysbaddaden commented 6 years ago

Having a fast GC that can't support fibers will only lead use so far. We wouldn't be able to bench HTTP::Server for example, or a real world Kemal app. Walking a single stack is one thing, walking hundreds makes a huge difference.

Writing a GC in Crystal is... just as hard/easy (pick one) as writing it in C: it's exactly the same. Language doesn't matter... Expect that you can know about allocated objects (you can know types, class and struct definitions) and thus can write a precise GC instead of a conservative one (that blindly walks allocations and believes everything may be a pointer).

An advantage of ImmixGC is that the algorithm looks quite efficient... or maybe it's mostly because it's a precise GC (as stated in the video), and tailored for Scala.

I support @bararchy that we should concentrate on writing a custom, precise GC, tailored for Crystal. It's not that hard to get something working. It's hard to get something robust and efficient for many different scenarios, though. Immix (paper and scala-native implementation) may a good help to get started.

pnloyd commented 6 years ago

@RX14, scala-native only has a few major contributors. @LukasKellenberger did the entire immix implementation with the guidance of @densh (scale-native primary dev)

densh commented 6 years ago

Hey guys, at the moment our GC implementation is very much hardcoded against Scala Native internals. It's not impossible to decouple it as a separate library and share between multiple projects if there is an interest in that kind of thing. Given the poor state of open source GCs on LLVM, it would be great if we could share efforts and work together towards having at least one high-quality open-source GC implementation.

pnloyd commented 6 years ago

@ysbaddaden, a pure Crystal implementation would be cool, but seeing as @densh has reached out with with a willingness to share their work..

bararchy commented 6 years ago

Can you actually create a robust and precise GC that would cater multiple languages ?

RX14 commented 6 years ago

@bararchy yes, IIRC LLVM has support for generating precise GC information in a number of formats (not for free, the crystal compiler would need a fair bit of work on how to tell LLVM this). You could implement a GC which used one of those formats.

I disagree on the need for a pure-crystal GC any time before 1.0. It's very low down the priority list for me. There's little point spending lots of time on performance optimizations: we're generally considered "fast enough", and we have lots more more important feature work to do.

densh commented 6 years ago

@bararchy The only major thing we need to agree on is the layout of the per-object metadata necessary for precise heap collection (stack and registers are conservative.) Currently our RTTI stores an offset array that lists all GC-able fields in the given object. Additionally we treat arrays specially due to their variable-size nature.

bollu commented 6 years ago

I'd be interested in having an open-source, high quality GC available for all LLVM consumers. I'm implementing simplexhc, a haskell-to-llvm compiler, so I'll need a GC at some point as well.

I'd definitely help with the effort of ripping out the scala-native GC into a reusable library.

pnloyd commented 6 years ago

@densh, I think that with scala-native you guys have arrived on some really amazing ideas that will advance that state of "high performance high level language programming". But maybe you have just discovered another area where you can advance llvm languages as a whole. Thank you for looking into this.

RX14 commented 6 years ago

I'd gladly accept any (good) PRs that add a gc/immix.

bararchy commented 6 years ago

Cool ! Will be interesting to follow on that, if there is such traction and so many interesting parties maybe a standalone project is indeed a good idea

bollu commented 6 years ago

@densh, @pnloyd, @bararchy: do we wish to create a separate empty repo where start pulling out code? How do we organize this? :)

densh commented 6 years ago

Creating a new repo for this is a great starting point. Would github.com/scala-native/immix work for everyone?

EDIT: New repo is now at https://github.com/scala-native/immix now.

densh commented 6 years ago

@ysbaddaden

An advantage of ImmixGC is that the algorithm looks quite efficient... or maybe it's mostly because it's a precise GC (as stated in the video), and tailored for Scala.

We contrast our immix implementation with precise mark-and-sweep on the same benchmarks in our talk from scaladays. Here is MS (with @LukasKellenberger on the right):

screen shot 2017-11-10 at 14 29 53

Compared to Immix (me on the right):

screen shot 2017-11-10 at 14 30 42

As you can see it's not just wins due to precise heap. Immix is way more efficient, especially on GC-heavy benchmarks.

konovod commented 6 years ago

btw Rubinius also uses Immix GC. Some time ago I've tried to check how can it be applied in Crystal but didn't get far.

ghost commented 6 years ago

I guess a separate project/repository for sharing between different projects would be great, if possible.

More leverage power too when multiple projects and multiple eyes look at things.

ylluminate commented 6 years ago

There has been some discussion about GC problems with Amber + websockets. While I don't know the details or where the problem actually lies with the GC here, there was some discussion of perhaps the current GC not being aggressive enough where GC doesn't clear all of the memory if sockets are stored in a data structure.

This discussion led me to think (perhaps erroneously) that perhaps if there are some deep seated problems with the GC causing issues, perhaps Immix could help resolve this. I'm doubtful that this issue goes this deep, but it still crossed my mind and I thought I'd mention it.

I agree with previous remarks that having multi-threading and even Windows support (even the guys at nanobox mentioned how they would consider using crystal for a rewrite due to their fondness of Ruby instead of Go if there was native Windows) are higher priorities right now for the project holistically, but I also agree with @RX14 about getting real tangible input and I'm grateful that @densh chimed in and started the effort on splitting it off for reuse. This is definitely the right move forward here.

pnloyd commented 6 years ago

@ysbaddaden, I had another thought in the "should Crystal's GC be written in Crystal or C debate". Part of the scala-native project involves compiling to yet another IR between scala and the llvm IR. The reason for this? @densh suggests that llvm has biases towards optimizing for C/C++ like code, which makes a lot of sense. So their solution is to have a second IR that they can then "pre optimize" to generate llvm IR that is more like what clang might generate.

The implication of this is not a fun thought to entertain, that Crystal might not be able to generate llvm IR that can reap the full benefits of the llvm optimizations. But this does open up an argument that as a whole, all Crystal programs may be able to achieve their ultimate performance if the GC is written in C, seeing as Crystal may have a fundamental barrier to achieving optimal code generation via llvm.

Considering a GC is something you have running no matter what in every program, it would seem wise not to compromise on it's performance.

akzhan commented 6 years ago

I's not hard to replace good-known boehm with immix, really.

But please show me webapp reasons and I'll do.

ysbaddaden commented 6 years ago

Please let's keep focused: this issue is about ImmixGC.

Thank you @densh! Precise GC is the largest optimization, but Immix closes the gap even more. I looked at the Immix paper, and it looks very clever. The paper lacks details, but they can be found in the ImmixGC implementation.

Some blocking issues to try ImmixGC:

And some features we'll eventually need:

Overall, I think adapting ImmixGC from Scala Native to be a generic GC is a lovely idea, but far from trivial. Different languages have different needs.

In the meantime, I'm learning a lot!

BTW: I found Immix RC that applies Immix to reference counting, to close the gap between Tracing and Reference Counting garbage collectors.

funny-falcon commented 6 years ago

Don't forget: Scala (being Java's spin-off) has no interior pointers (ie, that points into middle of allocation).

If Crystal were just "compilable Ruby" then it had that property too.

But Crystal has real Pointer, and pointerof(@some_property). So, it will be much harder to find start of allocation (than for Scala) (Boehm searches start of allocation), and it will complicate compaction a lot (Scala's immix does compaction?).

So, even without fibers, it will be quite complex project.

pnloyd commented 6 years ago

@funny-falcon, Don't assume that's true. Scala-native doesn't run on the JVM and was designed to have some extremely seamless C interop, it's a different animal than vanilla on JVM scala. pointerof(@some_property) equivalent might or likely exists in scala-native. -- I'd investigate this but scala-native has little to no documentation for the time being. @densh, if you could look at @funny-falcon questions you could probably quickly shed light on the particular concern.

LukasKellenberger commented 6 years ago

Hey guys, our implementation of Immix handles inner pointers only for conservative roots (stack and registers). It does not support it for object fields. And we don't do any compaction yet.

yxhuvud commented 6 years ago

@ysbaddaden I also found a conservative variant of Reference Counting Immix.

But in an case, in Go 1.3 implicit pointers (ie defacto pointers that didn't have the pointer type) was forbidden in the move to make their collector into a precise collector. I guess that is what will need to happen here too.

shayneoAtNorwood commented 6 years ago

Just gonna leave this here as a reference. The algorithm seems relatively straight forward. http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf

faustinoaq commented 6 years ago

Hi, related project here 👇 https://github.com/ysbaddaden/gc

jots commented 6 years ago

inko-lang also uses Immix: https://inko-lang.org/faq/#header-how-does-immix-work

ffwff commented 4 years ago

I'm not sure if this is the right place to put it but boehmgc supports allocation with precise type information through functions specified in gc_typed.h. I am working on a compiler patch which exposes pointer offset information in this branch, which will hopefully be integrated to the gc.

bcardiff commented 4 years ago

@ffwff it seems you are dealing with the nuances of the mixed union types :-)

Something I tried to do in the past was to have a custom mark procedure for arrays in https://github.com/bcardiff/crystal/tree/mt/safe-array and I could not get some answers on how to accomplish it https://github.com/ivmai/bdwgc/issues/290.

Having a custom mark for arrays would be helpful to avoid marking unneeded arrays and to avoid zeroing the memory when removing elements.

Accomplishing the precise type information is good, but I think the array might have a bigger impact. I might be wrong. It's just a hunch.

In case you are aware on how to do the array custom mark algorithm in a successful way I am interested.

ffwff commented 4 years ago

@bcardiff The way I implemented precise marking for the stdlib's array and hashmap in my crystal os was to have them inherit a Markable class and have a mark function which would then be called whenever the GC scans the object. Since the GC is written in crystal, it's trivial to cast that object into a Markable, call the mark function, pass a block where it would then yield pointers internal to the data structure (i.e. objects within the array buffer and the buffer itself), the pointers yielded could then be marked.

I'm not sure if it breaks compatibility if I implemented it here, but it's something to think about.

Something you could do is override Array.allocate that would allocate a pointer with a specific mark callback implemented in a private Array#mark.

As for performance, from my crude performance benchmark of the brainfuck implementation, user time went from 10.192s to 9.202s, and it's fairly consistent too. I haven't gotten the compiler to build yet (something about not being able to generate precise bitmap for Crystal::GenericType) but this looks promising...

edit: it seems that I implemented precise allocations wrongly, so numbers might be a little different...

yxhuvud commented 4 years ago

While this is super cool, perhaps it would be a good idea to create a new issue just for precise collecting? While this probably a requirement before changing (or implementing) a different GC, it is probably worthwhile in itself without mixing it up with Immix.