Open straight-shoota opened 2 weeks ago
As stated in #14679:
I'd like to avoid introducing calloc(size)
when the C function is calloc(num, size)
. I understand we don't care about the num
arg, but that's introducing a discrepancy over a weighted name, and I'd prefer a self explanatory name for the "memory is set to zero" guarantee (for non C programmers).
I'm all in for a single GC.malloc(size, *, atomic = false, clear = true)
method (and an new __crystal_malloc3(size, atomic, clear)
fun). I mean, atomic
is just a flag in my immix GC and it never sets memory to zero because crystal always does (when it matters).
Anyway, having a mean to say "allocate and set to zero" and "allocate and don't set to zero" is needed for Crystal to tell it's intent to the allocator's interface, since each allocator has different guarantees:
calloc
);GC_malloc
always zeroes, GC_malloc_atomic
never;If we bypass the CRT and use Win32's HeapAlloc
then there is also an optional argument specifying whether the memory should be cleared or not. On Windows LibC.malloc
is only used in -Dgc_none
and in the wmain
entry point though
I'm all in for a single GC.malloc(size, *, atomic = false, clear = true) method (and an new __crystal_malloc3(size, atomic, clear) fun)
We can't really make it just one function, because that will break backwards compatibility. But yes, we can have this generic version and keep the "atomic" version deprecated.
@beta-ziliani Backwards compatibility is trivial. The compiler knows which functions (or methods) are defined and can make calls appropriately.
I'm all in for a single GC.malloc(size, *, atomic = false, clear = true) method (and an new __crystal_malloc3(size, atomic, clear) fun).
Ideally, as a follow-up PR to #14679, I wanted to further optimize the allocation methods using LLVM's allockind
function attribute.
Using this attribute, we can for example tell LLVM that the memory returned from an allocation function is always zeroed out - which allows LLVM to optimize most default ivar values away (@x = nil
, @y = 0
).
If we only have one allocation function with lots of parameters, we can't fully specify the allocation behaviour to LLVM and can't fully optimize the code.
I'd like to avoid introducing
calloc(size)
when the C function iscalloc(num, size)
.
We could also make the default malloc
methods always clean their memory and add new malloc_dirty
methods (or something similar).
@BlobCodes oh, allockind
looks very nice :+1:
I investigated memory allocations in codegen, and there are only a few calls:
Codegen#malloc
is called by Codegen#malloc_closure
and Codegen#malloc_aggregate
.
Codegen#malloc_atomic
is only called from Codegen#allocate_aggregate
.
Codegen#array_malloc
and Codegen#array_malloc_atomic
are only used to implement Pointer.malloc
.
Of these use cases:
Codegen#malloc_closure
: I don't know if it needs the memory to be initialized to zero (it currently does or doesn't because of #14678) => I believe this is part of this issue to determine this;
Codegen#allocate_aggregate
: it needs the memory to be initialized to zero (and may do it twice => :bug:).
Pointer(T).malloc
: the memory must be initialized to zero (whatever if T
has inner pointers)... and here comes another bug: (nope, not a bug).#malloc_atomic
currently returns uninitialized memory => :bug:
So I don't think we have a need to distinguish a "please zero memory" vs "don't zero memory". Having the four Codegen#[array_]malloc[_atomic]
methods always return initialized memory feels like it would be enough.
Then a check in #malloc_aggregate
to avoid zeroing twice to fix this issue.
and there's a bug since #malloc_atomic currently returns uninitialized memory!
Then a check in #malloc_aggregate to avoid zeroing twice to fix this issue.
That would fix #14677, but not this issue. This issue is about completely removing the zeroing in situations where all memory is overwritten anyway.
Also, we can't just add a check in malloc_aggregate
and array_malloc
to avoid the unnecessary zeroing since old stdlibs (<= 1.12.2) have an invalid GC.malloc
implementation which doesn't clear the memory when compiling with -Dgc_none
.
This is also the reason I chose to add completely separate internal allocation functions which actually guarantee the memory is cleared, manually clearing it in codegen if unavailable.
Urgh, sorry. I mixed up the issues. I'll move my comment :bow:
Note: I fail to see the older stdlib problem: a 1.12 or older compiler using the stdlib from 1.13+ (for example to compile Crystal 1.13 itself) will use the __crystal_malloc64
fun and GC.malloc
def from the 1.13+ stdlib that would now zero the memory. The worst to happen is the same as introducing new funs: the first time we compile Crystal 1.13 using Crystal 1.12, the compiler itself will keep zeroing the memory twice (but not the programs it compiles), and the compiler should recompile itself.
It would be an issue if using a 1.13 compiler with an older stdlib, but I would be very surprised if we supported that. We'd have the same issue with new funs that would end up calling the fallback LibC functions instead of GC methods :fearful: Definitely not supported.
It would be an issue if using a 1.13 compiler with an older stdlib, but I would be very surprised if we supported that. We'd have the same issue with new funs that would end up calling the fallback LibC functions instead of GC methods :fearful: Definitely not supported.
Yes, this is indeed a relevant use case. I suppose it's mostly for compiler/stdlib development. You can go back and build old versions with the current compiler. Of course it's not super important to ensure production-grade stability. But still worth considering. Currently we can go back and build Crystal 1.8 with a 1.12 compiler. Older versions fail due to incorrectly mangled intrinsics after https://github.com/crystal-lang/crystal/pull/13164
The default malloc implementation
GC.malloc
is supposed to zero the allocated memory. This is a sane default choice because it ensures a clean starting field. The GC won't risk incorrectly assuming uninitialized memory to contain live pointers.But zeroing out memory has a runtime cost. And it's often useless because the memory gets fully initialized and all zeroed bits are overridden right away (e.g: https://github.com/crystal-lang/crystal/pull/14671#discussion_r1632103195).
Performance could be improved if there was a way to avoid zeroing when not necessary.
A potential implementation is proposed in #14679