crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.2k stars 1.61k forks source link

Allocation of non-zeroed memory for performance boost #14687

Open straight-shoota opened 2 weeks ago

straight-shoota commented 2 weeks ago

The default malloc implementation GC.malloc is supposed to zero the allocated memory. This is a sane default choice because it ensures a clean starting field. The GC won't risk incorrectly assuming uninitialized memory to contain live pointers.

But zeroing out memory has a runtime cost. And it's often useless because the memory gets fully initialized and all zeroed bits are overridden right away (e.g: https://github.com/crystal-lang/crystal/pull/14671#discussion_r1632103195).

Performance could be improved if there was a way to avoid zeroing when not necessary.

A potential implementation is proposed in #14679

ysbaddaden commented 2 weeks ago

As stated in #14679:

I'd like to avoid introducing calloc(size) when the C function is calloc(num, size). I understand we don't care about the num arg, but that's introducing a discrepancy over a weighted name, and I'd prefer a self explanatory name for the "memory is set to zero" guarantee (for non C programmers).

I'm all in for a single GC.malloc(size, *, atomic = false, clear = true) method (and an new __crystal_malloc3(size, atomic, clear) fun). I mean, atomic is just a flag in my immix GC and it never sets memory to zero because crystal always does (when it matters).

ysbaddaden commented 2 weeks ago

Anyway, having a mean to say "allocate and set to zero" and "allocate and don't set to zero" is needed for Crystal to tell it's intent to the allocator's interface, since each allocator has different guarantees:

HertzDevil commented 2 weeks ago

If we bypass the CRT and use Win32's HeapAlloc then there is also an optional argument specifying whether the memory should be cleared or not. On Windows LibC.malloc is only used in -Dgc_none and in the wmain entry point though

beta-ziliani commented 2 weeks ago

I'm all in for a single GC.malloc(size, *, atomic = false, clear = true) method (and an new __crystal_malloc3(size, atomic, clear) fun)

We can't really make it just one function, because that will break backwards compatibility. But yes, we can have this generic version and keep the "atomic" version deprecated.

straight-shoota commented 2 weeks ago

@beta-ziliani Backwards compatibility is trivial. The compiler knows which functions (or methods) are defined and can make calls appropriately.

BlobCodes commented 2 weeks ago

I'm all in for a single GC.malloc(size, *, atomic = false, clear = true) method (and an new __crystal_malloc3(size, atomic, clear) fun).

Ideally, as a follow-up PR to #14679, I wanted to further optimize the allocation methods using LLVM's allockind function attribute. Using this attribute, we can for example tell LLVM that the memory returned from an allocation function is always zeroed out - which allows LLVM to optimize most default ivar values away (@x = nil, @y = 0).

If we only have one allocation function with lots of parameters, we can't fully specify the allocation behaviour to LLVM and can't fully optimize the code.


I'd like to avoid introducing calloc(size) when the C function is calloc(num, size).

We could also make the default malloc methods always clean their memory and add new malloc_dirty methods (or something similar).

ysbaddaden commented 2 weeks ago

@BlobCodes oh, allockind looks very nice :+1:

ysbaddaden commented 2 weeks ago

I investigated memory allocations in codegen, and there are only a few calls:

Of these use cases:

So I don't think we have a need to distinguish a "please zero memory" vs "don't zero memory". Having the four Codegen#[array_]malloc[_atomic] methods always return initialized memory feels like it would be enough.

Then a check in #malloc_aggregate to avoid zeroing twice to fix this issue.

BlobCodes commented 2 weeks ago

and there's a bug since #malloc_atomic currently returns uninitialized memory!

https://github.com/crystal-lang/crystal/blob/835b0c7ec5945c48ddd65f856cd81338ccbe7f65/src/compiler/crystal/codegen/codegen.cr#L2174

Then a check in #malloc_aggregate to avoid zeroing twice to fix this issue.

That would fix #14677, but not this issue. This issue is about completely removing the zeroing in situations where all memory is overwritten anyway.

Also, we can't just add a check in malloc_aggregate and array_malloc to avoid the unnecessary zeroing since old stdlibs (<= 1.12.2) have an invalid GC.malloc implementation which doesn't clear the memory when compiling with -Dgc_none. This is also the reason I chose to add completely separate internal allocation functions which actually guarantee the memory is cleared, manually clearing it in codegen if unavailable.

ysbaddaden commented 2 weeks ago

Urgh, sorry. I mixed up the issues. I'll move my comment :bow:

Note: I fail to see the older stdlib problem: a 1.12 or older compiler using the stdlib from 1.13+ (for example to compile Crystal 1.13 itself) will use the __crystal_malloc64 fun and GC.malloc def from the 1.13+ stdlib that would now zero the memory. The worst to happen is the same as introducing new funs: the first time we compile Crystal 1.13 using Crystal 1.12, the compiler itself will keep zeroing the memory twice (but not the programs it compiles), and the compiler should recompile itself.

It would be an issue if using a 1.13 compiler with an older stdlib, but I would be very surprised if we supported that. We'd have the same issue with new funs that would end up calling the fallback LibC functions instead of GC methods :fearful: Definitely not supported.

straight-shoota commented 1 week ago

It would be an issue if using a 1.13 compiler with an older stdlib, but I would be very surprised if we supported that. We'd have the same issue with new funs that would end up calling the fallback LibC functions instead of GC methods :fearful: Definitely not supported.

Yes, this is indeed a relevant use case. I suppose it's mostly for compiler/stdlib development. You can go back and build old versions with the current compiler. Of course it's not super important to ensure production-grade stability. But still worth considering. Currently we can go back and build Crystal 1.8 with a 1.12 compiler. Older versions fail due to incorrectly mangled intrinsics after https://github.com/crystal-lang/crystal/pull/13164