crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.47k stars 1.62k forks source link

Initializing reference objects in non-default storage #13481

Open HertzDevil opened 1 year ago

HertzDevil commented 1 year ago

Reference#allocate, i.e. @[Primitive(:allocate)], does a lot of work behind the scenes before the allocated object's #initialize gets called:

# pseudo-code representation of `Crystal::CodeGenVisitor#allocate_aggregate`
class Reference
  def self.allocate : self
    {% if ... %} # type contains any inner pointers
      obj = __crystal_malloc64(instance_sizeof(self).to_u64!).as(self)
    {% else %}
      obj = __crystal_malloc_atomic64(instance_sizeof(self).to_u64!).as(self)
    {% end %}

    Intrinsics.memset(obj.as(Void*), 0_u8, instance_sizeof(self), false)

    {% for ivar in @type.instance_vars %}
      {% if ivar.has_default_value? %}
        pointerof(obj.@{{ ivar }}).value = {{ ivar.default_value }}
      {% end %}
    {% end %}

    set_crystal_type_id(obj)

    obj
  end
end

This implementation ties object creation to a global allocator, which is either LibC or LibGC depending on whether -Dgc_none was specified during compilation. But there is actually nothing wrong with using storage allocated by something else:

# heapapi.cr
lib LibC
  HEAP_ZERO_MEMORY = 0x00000008

  fun GetProcessHeap : HANDLE
  fun HeapAlloc(hHeap : HANDLE, dwFlags : DWORD, dwBytes : SizeT) : Void*
  fun HeapFree(hHeap : HANDLE, dwFlags : DWORD, lpMem : Void*) : BOOL
end

class Foo
  @x : Int32
  @y = "abc"

  def initialize(@x : Int32)
  end

  def self.new(x : Int32, &)
    obj = LibC.HeapAlloc(LibC.GetProcessHeap, LibC::HEAP_ZERO_MEMORY, instance_sizeof(self)).as(self)

    pointerof(obj.@y).value = "abc" # ???

    set_crystal_type_id(obj)

    begin
      obj.initialize(x)
      yield obj
    ensure
      # manual memory management!
      obj.finalize if obj.responds_to?(:finalize)
      LibC.HeapFree(LibC.GetProcessHeap, 0, obj.as(Void*))
    end
  end
end

# note the different addresses
Foo.new(1) { |foo| p foo } # => #<Foo:0x1d58eb9bc70 @x=1, @y="abc">
Foo.new(2) { |foo| p foo } # => #<Foo:0x1d58eb9c230 @x=2, @y="abc">
p Foo.new(3)               # => #<Foo:0x1d590620f60 @x=3, @y="abc">
p Foo.new(4)               # => #<Foo:0x1d590620e40 @x=4, @y="abc">

Or even on the stack, as some people have always dreamed:

class Foo
  @x : Int32
  @y = "abc"

  def initialize(@x : Int32)
  end

  def self.new(x : Int32, &)
    buf = uninitialized UInt8[instance_sizeof(self)] # alignment not guaranteed
    obj = buf.to_unsafe.as(self)

    buf.fill(0)
    pointerof(obj.@y).value = "abc" # ???
    set_crystal_type_id(obj)

    begin
      obj.initialize(x)
      yield obj
    ensure
      obj.finalize if obj.responds_to?(:finalize)
    end
  end
end

Foo.new(1) { |foo| p foo } # => #<Foo:0x100bffacc @x=1, @y="abc">
Foo.new(2) { |foo| p foo } # => #<Foo:0x100bffaa4 @x=2, @y="abc">
p Foo.new(3)               # => #<Foo:0x10bbb840f60 @x=3, @y="abc">
p Foo.new(4)               # => #<Foo:0x10bbb840e40 @x=4, @y="abc">

In these cases we have manually expanded the instance variable initializers and the set_crystal_type_id call, but all of this could have been done by a compiler primitive, because it is really just @[Primitive(:allocate)] without the allocation. So I would like to have a new primitive class method for this, say @[Primitive(:pre_initialize)]:

class Foo
  def self.new(x : Int32, &)
    obj = LibC.HeapAlloc(LibC.GetProcessHeap, 0, instance_sizeof(self)).as(self)
    Foo.pre_initialize(obj)

    begin
      obj.initialize(x)
      yield obj
    ensure
      obj.finalize if obj.responds_to?(:finalize)
      LibC.HeapFree(LibC.GetProcessHeap, 0, obj.as(Void*))
    end
  end

  def self.new(x : Int32, &)
    buf = uninitialized UInt8[instance_sizeof(self)] # alignment not guaranteed
    obj = buf.to_unsafe.as(self)
    Foo.pre_initialize(obj)

    begin
      obj.initialize(x)
      yield obj
    ensure
      obj.finalize if obj.responds_to?(:finalize)
    end
  end
end

Although the discussion is based on reference types, this would also work for value types except that set_crystal_type_id isn't called. (Non-abstract value types are leaf types and therefore do not carry the type ID in their layout. Abstract value types are taken care of via upcasting.)

I believe this will make custom allocators easier to write in Crystal if we ever go by that route, especially when targetting embedded devices.

straight-shoota commented 1 year ago

I suppose an alternative could be to allow passing a block to allocate for custom allocation. I don't think it has any noteworthy benefit, though.

straight-shoota commented 1 year ago

This is similar to #13368 and I figure the proposed .pre_initialize method could also make a good solution for that problem.

In this context I realized that it might be a good idea for .pre_initialize to receive a memory pointer and return self. I think this makes a lot of sense because it means that this method turns a blob of memory into an object.

The usage example would be come this:

  def self.new(x : Int32, &)
    mem = LibC.HeapAlloc(LibC.GetProcessHeap, 0, instance_sizeof(self))
    obj = Foo.pre_initialize(mem)

    begin
      obj.initialize(x)
      yield obj
    ensure
      obj.finalize if obj.responds_to?(:finalize)
      LibC.HeapFree(LibC.GetProcessHeap, 0, mem)
    end
  end

I reckon we might consider a different, better fitting name. Perhaps it could be an overload of .allocate with a parameter for the memory location?

HertzDevil commented 1 year ago

Indeed one of the goals of having .pre_initialize is to remove the manual calls to Object.set_crystal_type_id. Having the method return the object directly might also be useful because the object does not have to be located at the root of the allocated memory, e.g. to satisfy a stronger alignment.

straight-shoota commented 1 year ago

the object does not have to be located at the root of the allocated memory, e.g. to satisfy a stronger alignment.

Not sure I understand this aspect. Do you mean for .pre_initialize to advance the pointer if it's not aligned? That would be problematic with regards to the size of allocated memory.

HertzDevil commented 1 year ago

Well, can you think of any other instance where obj is not equal to mem.as(Foo)?

straight-shoota commented 1 year ago

My point is about including the cast into the method so it happens in the same space as setting the type id, and represent the conversion from blob of memory to object.

straight-shoota commented 1 year ago

But yeah if we want to offer the alignment route, I suppose the pointer needs to be accompanied by a size then? Otherwise the initializer would not know whether there is room for shifting the object? I suppose this would also allow some flexibility for types with variable size.

straight-shoota commented 1 year ago

Thinking this a bit further, I'm wondering if it wouldn't be more practical to pass an allocator object to the initializer. This encapsulates the decision how much memory even needs to be allocated, and separates it from the mechanism of how it's allocated.

In practice, this could just mean that allocate can receive an optional allocator parameter:

module Allocator
  abstract def self.malloc(size : UInt64)
  abstract def self.malloc_atomic(size : UInt64)
end

class Reference
  module DEFAULT_ALLOCATOR # Maybe this could even point directly to `GC`
    extend Allocator
    def self.malloc(size : UInt64)
      __crystal_malloc64(size)
    end
    def self.malloc_atomic(size : UInt64)
      __crystal_malloc_atomic64(size)
    end
  end

  def self.allocate(allocator : Allocator = DEFAULT_ALLOCATOR) : self
    {% if ... %} # type contains any inner pointers
      obj = allocator.malloc(instance_sizeof(self).to_u64!).as(self)
    {% else %}
      obj = allocator.malloc_atomic(instance_sizeof(self).to_u64!).as(self)
    {% end %}

    Intrinsics.memset(obj.as(Void*), 0_u8, instance_sizeof(self), false)

    {% for ivar in @type.instance_vars %}
      {% if ivar.has_default_value? %}
        pointerof(obj.@{{ ivar }}).value = {{ ivar.default_value }}
      {% end %}
    {% end %}

    set_crystal_type_id(obj)

    obj
  end
end
HertzDevil commented 1 year ago

any other instance where obj is not equal to mem.as(Foo)

An extreme example would be Win32 HGLOBALs; some memory is allocated using GlobalAlloc, whereas pre_initialize calls GlobalLock. Then the two pointers need not be identical if the memory was allocated as movable.

But yeah if we want to offer the alignment route, I suppose the pointer needs to be accompanied by a size then? Otherwise the initializer would not know whether there is room for shifting the object?

You could align a pointer using its address alone. The allocation call is responsible for ensuring enough storage is present after alignment, whether the call happens inside .allocate, or outside it like the HeapAlloc example.

that allocate can receive an optional allocator parameter

This has its own merits but I don't think it affects .pre_initialize's signature, unless the instance_sizeof(self) is passed in as an argument, i.e. we want .pre_initialize to also support String and Log::Metadata. I haven't considered that aspect yet.

straight-shoota commented 1 year ago

I think alignment is deeply connected with the actual memory allocation. So that part of the code should take care of that. .pre_initialize should only worry about internal setup of the allocated memory.

ysbaddaden commented 11 months ago

I'm wondering if that could allow to 'embed' classes, so that a class could be inlined right into another class? Just like a struct is inlined for example. Yet still be able to refer to the embedded object as normal (@obj is still a reference that can be passed around).

This is a feature I'd love to have to make sure some data is allocated next to each other (less CPU cache trashing) and help reduce HEAP allocations by making a single allocation instead of 1 + N. The drawback being that if all references go out of scope but one reference to an inner object the one big allocation won't be collected by the GC.

Yet, there are great use cases when the ivar is meant to be internally. For example a Mutex to protect an inner data-structure, or the Deque object inside Channel.

ysbaddaden commented 11 months ago

If we use LibC::SizeT[] instead of UInt8[] then it should be aligned by the ABI?

HertzDevil commented 11 months ago

I'm wondering if that could allow to 'embed' classes

If you can do it on the stack, you can do it in an instance variable:

class Foo
  getter x = 1
  getter y = "abc"
end

class Bar
  def initialize
    @buf = uninitialized UInt8[16]

    # foo = self.foo
    # @buf.fill(0)
    # pointerof(foo.@x).value = 1
    # pointerof(foo.@y).value = "abc"
    # Foo.set_crystal_type_id(foo)
    Foo.pre_initialize(@buf.to_unsafe.as(Void*))
  end

  def foo
    @buf.to_unsafe.as(Foo)
  end
end

Bar.new.foo # => #<Foo:0x25f3c76c384 @x=1, @y="abc">

But as you might notice, the 16 is the hard-coded value of instance_sizeof(Foo); actually using instance_sizeof there either is disallowed or produces an internal error. So this alone might not be enough for that purpose. (If there are no ivar initializers in Foo you wouldn't even need pre_initialize, but the point still stands.)

If we use LibC::SizeT[] instead of UInt8[] then it should be aligned by the ABI?

That merely changes the buffer's alignment to that of LibC::SizeT, not match it to the struct type's natural alignment.

ysbaddaden commented 11 months ago

Nice!

The only drawback is that it ain't transparent: you can't access @buf directly as Foo, you must cast it to the actual type. That would need builtin support by the compiler.

# pointerof(foo.@x).value = 1
# pointerof(foo.@y).value = "abc"
# Foo.set_crystal_type_id(foo)

Any reason those aren't injected right into the #initialize methods?

ysbaddaden commented 11 months ago

And yeah: not being able to use sizeof or a constant and calculations for a generic size is very limiting.

HertzDevil commented 11 months ago

That's precisely the point of this issue; pre_initialize should expose the compiler-generated part of reference object initialization that precedes #initialize. Surely we could move the ivar initializers into Foo#initialize, and only that step:

class Foo
  getter x : Int32
  getter y : String

  def initialize
    @x = 1
    @y = "abc"
  end

  # needed since the compiler makes `#initialize` a protected method
  def __initialize(*args, **opts)
    initialize(*args, **opts)
  end
end

class Bar
  def initialize
    @buf = uninitialized UInt8[16]

    # prevents spurious GC behavior
    @buf.fill(0)

    # `Foo` has no ivar initializers now

    # necessary for Crystal's own ABI for reference types
    Foo.set_crystal_type_id(foo)

    foo.__initialize
  end

  def foo
    @buf.to_unsafe.as(Foo)
  end
end

The compiler does all this pre-initialization already, so there should never be a need to repeat ourselves inside #initialize.

I think perhaps we don't need to make all #initialize methods protected.

HertzDevil commented 11 months ago

I've been playing with the idea of a built-in type that exposes the pointee type of any reference:

# `T` must be a non-union reference type
struct Instance(T)
  # no way to create `Instance`s directly
  private def initialize
  end

  # minimal definitions for `Object` follow
  def ==(other : Instance(U)) : Bool forall U
    pointerof(@type_id).as(T) == pointerof(other.@type_id).as(U)
  end

  def ==(other) : Bool
    false
  end

  def hash(hasher)
    pointerof(@type_id).as(T).hash(hasher)
  end

  def to_s(io : IO) : Nil
    pointerof(@type_id).as(T).to_s(io)
  end

  def inspect(io : IO) : Nil
    pointerof(@type_id).as(T).inspect(io)
  end
end

such that T and Pointer(Instance(T)) are ABI-compatible, and in particular instance_sizeof(T) == sizeof(Instance(T)). The implementation is pretty simple:

module Crystal
  class Program
    def initialize
      # ...
      types["Instance"] = @instance = instance = GenericInstanceStructType.new self, self, "Instance", value, ["T"]
      instance.declare_instance_var("@type_id", int32)
      instance.can_be_stored = false
      # ...
    end
  end

  class GenericInstanceStructType < GenericClassType
    # ...
  end

  class InstanceStructType < GenericClassInstanceType
    getter reference_type : Type
    # ...
  end

  class LLVMTyper
    private def create_llvm_type(type : InstanceStructType, wants_size)
      llvm_struct_type(type.reference_type, wants_size)
    end
  end
end

This then allows you to do things like:

class Bar
  def initialize
    @foo = uninitialized Instance(Foo)

    # foo = Foo.pre_initialize(pointerof(@foo))
    Slice.new(pointerof(@foo), 1).to_unsafe_bytes.fill(0)
    pointerof(foo.@x).value = 1
    pointerof(@foo.@type_id).value = Foo.crystal_instance_type_id

    # call stuffs like `foo.__initialize` if necessary
  end

  def foo
    pointerof(@foo).as(Foo)
  end
end

# notice how `Foo`'s address is precisely that of `Bar`
# plus the size (and padding) of a type ID
Bar.new # => #<Bar:0x104959f60 @foo=#<Foo:0x104959f68 @y=nil, @x=1>>
straight-shoota commented 11 months ago

Related issue: https://github.com/crystal-lang/crystal/issues/7989

HertzDevil commented 11 months ago

We could also define a different auxiliary method here:

class Reference
  def self.unsafe_construct(address : Pointer, *args, **opts) : self
    obj = pre_initialize(address.as(Void*))
    obj.initialize(*args, **opts)
    obj
  end
end

The advantage is #initialize can remain being protected:

class Bar
  def initialize
    @foo = uninitialized Instance(Foo)
    foo = Foo.unsafe_construct(pointerof(@foo))
  end
end

Also .new now becomes equivalent to:

class Foo
  def self.new(*args, **opts)
    buf = {% if ... %} # type contains any inner pointers
            __crystal_malloc64(instance_sizeof(self).to_u64!)
          {% else %}
            __crystal_malloc_atomic64(instance_sizeof(self).to_u64!)
          {% end %}
    obj = unsafe_construct(buf, *args, **opts)
    ::GC.add_finalizer(obj) if obj.responds_to?(:finalizer)
    obj
  end
end

If for whatever reasons Foo's fields need to be uninitialized, its instance variables can simply say that, the same as normal object creation.

HertzDevil commented 11 months ago

Some alternative names for Instance: {Class,Reference,Instance}Storage

straight-shoota commented 11 months ago

Maybe ReferenceAllocation?

yxhuvud commented 11 months ago

The advantage is #initialize doesn't have to be protected:

Isn't it protected mostly to protect newbies from themselves? EDIT: text commented on clarified.

HertzDevil commented 11 months ago

that's a typo, .unsafe_construct is public and #initialize is protected

HertzDevil commented 11 months ago

.pre_initialize would make sense for struct types too. On a non-abstract struct, or an abstract struct that has only one non-abstract subclass, it would zero the memory and then run the inline ivar initializers, without touching any type ID. On an abstract struct with multiple subclasses, the type ID is also written since the struct behaves like a mixed union. In that case it might be important for .pre_initialize not to return a value:

struct Foo
  @x = "abc"
  @y = uninitialized UInt8[4096]
end

foo = uninitialized Foo

# this returns a large object by value! don't allow this
# foo = Foo.pre_initialize(pointerof(foo))

Foo.pre_initialize(pointerof(foo))
foo.@x # => "abc"
x = uninitialized UInt8[4096]
StaticArray(UInt8, 4096).pre_initialize(pointerof(x))

I don't think non-struct Values would get a lot of benefits from this (in fact .allocate breaks on most of them), so I would go for:

class Reference
  @[Primitive(:pre_initialize)]
  def self.pre_initialize(address : Void*) : self
  end

  def self.unsafe_construct(address : Void*, *args, **opts) : self
  end
end

struct Struct
  @[Primitive(:pre_initialize)]
  def self.pre_initialize(address : self*) : Nil
  end

  def self.unsafe_construct(address : self*, *args, **opts) : Nil
  end
end
HertzDevil commented 2 months ago

Struct pre-initialization depends on both the receiver struct type and the argument's pointee type. If we have the following types:

abstract struct Foo
  @x = 123
end

struct Bar < Foo
  @y = 4.5
end

struct Quux < Foo
  @z = "abc"
end

then we'll need at least the following overloads of .pre_initialize:

# exposition only
struct Foo_CODEGEN
  @type_id : Int32
  @data : Void*[2]
end

def Foo.pre_initialize(x : Foo*) # (1)
  ::raise("Can't pre-initialize abstract struct Foo")
end

def Bar.pre_initialize(x : Foo*) # (2)
  x.clear
  x.as(Foo_CODEGEN*).value.@type_id = Bar.crystal_instance_type_id
  pointerof(x.as(Foo_CODEGEN*).value.@data).as(Bar*).value.@x = 123
  pointerof(x.as(Foo_CODEGEN*).value.@data).as(Bar*).value.@y = 4.5
end

def Bar.pre_initialize(x : Bar*) # (3)
  x.clear
  x.value.@x = 123
  x.value.@y = 4.5
end

def Quux.pre_initialize(x : Foo*) # (4)
  x.clear
  x.as(Foo_CODEGEN*).value.@type_id = Quux.crystal_instance_type_id
  pointerof(x.as(Foo_CODEGEN*).value.@data).as(Quux*).value.@x = 123
  pointerof(x.as(Foo_CODEGEN*).value.@data).as(Quux*).value.@z = "abc"
end

def Quux.pre_initialize(x : Quux*) # (5)
  x.clear
  x.value.@x = 123
  x.value.@z = "abc"
end

buf = uninitialized Foo_CODEGEN
foo = pointerof(buf).as(Foo*)
bar = pointerof(buf).as(Bar*)
quux = pointerof(buf).as(Quux*)

Foo.pre_initialize(foo)  # (1)
Bar.pre_initialize(foo)  # (2)
Quux.pre_initialize(foo) # (4)

Foo.pre_initialize(bar)  # disallowed
Bar.pre_initialize(bar)  # (3)
Quux.pre_initialize(bar) # disallowed

Foo.pre_initialize(quux)  # disallowed
Bar.pre_initialize(quux)  # disallowed
Quux.pre_initialize(quux) # (5)

Bar.as(Foo.class).pre_initialize(foo)  # (2), dynamic dispatch
Bar.as(Foo.class).pre_initialize(bar)  # disallowed
Bar.as(Foo.class).pre_initialize(quux) # disallowed

Quux.as(Foo.class).pre_initialize(foo)  # (4), dynamic dispatch
Quux.as(Foo.class).pre_initialize(bar)  # disallowed
Quux.as(Foo.class).pre_initialize(quux) # disallowed

This most likely requires some kind of compiler support beyond adding a single primitive.

A second problem is what it means to call Bar#initialize on a Foo. Normally, the compiler doesn't do this, and instead downcasts a Bar to Foo after construction:

x = Bar.as(Foo.class).new

# same as:

x = uninitialized Foo
Bar.as(Foo.class).unsafe_construct(pointerof(x))

# same as:

x = uninitialized Foo
case Bar.as(Foo.class)
in Bar.class
  __temp_1 = uninitialized Bar
  Bar.pre_initialize(pointerof(__temp_1))
  __temp_1.initialize
  x = __temp_1 # downcast occurs here
in Quux.class
  __temp_2 = uninitialized Quux
  Quux.pre_initialize(pointerof(__temp_2))
  __temp_2.initialize
  x = __temp_2 # downcast occurs here
in Foo.class
  __temp_3 = uninitialized Foo
  Foo.pre_initialize(pointerof(__temp_3)) # Can't pre-initialize abstract struct Foo
end

What we really want is:

x = uninitialized Foo
case Bar.as(Foo.class)
in Bar.class
  Bar.pre_initialize(pointerof(x))
  x.as(Bar).initialize # ???
  x = x                # no-op
in Quux.class
  Quux.pre_initialize(pointerof(x))
  x.as(Quux).initialize # ???
  x = x                 # no-op
in Foo.class
  # ...
end

Thus .unsafe_construct might have to be a separate primitive for structs too.

straight-shoota commented 4 weeks ago

Wondering about how to reduce boilerplate for initializing ReferenceStorage-backed instances, I cam up with this macro:

macro stack_allocate(assign)
  {% call = assign.value %}
  {% receiver = call.receiver %}

  %storage = uninitialized ReferenceStorage({{ receiver }})
  {{ assign.target }} = {{ receiver }}.unsafe_construct(pointerof(%storage), {% if call.args %} {{ call.args.splat }}, {% end %}{% if call.named_args %}{{ call.named_args.splat }}{% end %})
end

stack_allocate foo = Array(Int32).new(12, value: 123)