OpenCyphal / nunavut

Generate code from DSDL using PyDSDL and Jinja2
https://nunavut.readthedocs.io/

Allow allocator to be provided when generating c++ #98

Open thirtytwobits opened 4 years ago

thirtytwobits commented 4 years ago

Specifically, for vector types (VariableLengthArrayType), allow an allocator type to be generated as part of the template.

pavel-kirienko commented 4 years ago

Array capacity is known at compile time; would it not be sensible to preallocate memory statically? Do you want to generate a dedicated allocator per array type?

thirtytwobits commented 4 years ago

This is only applicable to dynamic arrays. I suppose we could just allocate the maximum, but that seems sub-optimal, especially given how trivial it is to support std::vector for those types.

pavel-kirienko commented 4 years ago

Yes, fixed-size arrays are a no-brainer, pre-allocation is the optimal approach. I am talking about variable-size arrays though. If you are using a pool allocator, how are you going to determine the block size? Or are you planning to use cascaded allocators?

thirtytwobits commented 4 years ago

I'm not sure yet.

pavel-kirienko commented 4 years ago

A random idea occurred to me earlier in the week that needs to be shared. It's based on the concept of the small-object allocator proposed by Alexandrescu, his later CppCon presentation "std::allocator Is to Allocation what std::vector Is to Vexation", and the standard buddy allocator algorithm.

The problem with constant-complexity block allocators is that the size of the allocated memory is fixed. In sufficiently complex scenarios (such as Libuavcan), where the allocation size varies significantly, this may lead to suboptimal memory utilization. Ideally, we should be able to allocate arbitrarily sized blocks to reduce internal fragmentation, while keeping the worst-case external fragmentation bounded, in constant time.

Imagine that we have a free store which serves allocation requests in the sbrk style (or in the moving-garbage-collector style) by advancing the pointer to free memory. Once allocated, the memory is never returned.
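Such an sbrk-style free store amounts to a bump allocator; a minimal sketch of the idea (not Libuavcan code, all names hypothetical) could look like this:

```cpp
#include <cstddef>
#include <cstdint>
#include <cassert>

// Sketch of the sbrk-style arena: memory is handed out by advancing a
// pointer and is never returned. Allocation is always O(1).
class ArenaAllocator {
public:
    ArenaAllocator(void* base, std::size_t size)
        : ptr_(static_cast<std::uint8_t*>(base)), remaining_(size) {}

    // Returns nullptr when the arena is exhausted.
    void* allocate(std::size_t size) {
        if (size > remaining_) { return nullptr; }
        void* const out = ptr_;
        ptr_ += size;
        remaining_ -= size;
        return out;
    }

private:
    std::uint8_t* ptr_;
    std::size_t   remaining_;
};
```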

Imagine that we have a set of standard, fixed-size, constant-complexity block allocators (like in Libuavcan). Let the smallest one serve blocks of size X. Let the next-larger one serve blocks of size 2X. The maximum multiplier of X is bounded by the size of the available allocation arena. For memory alignment purposes, X should be a sufficiently large power of 2; the lower limit on X is the size of a pointer, because free blocks have to be linked into a free list.

Here is a sample arrangement following the above model: having the allocation arena of 8 KiB (typical for Libuavcan applications), and the maximum alignment requirement of 128 bits (although modern SIMD instructions usually require more), the blocks would be of size 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192 bytes.

Initially, no memory is allocated and the allocation pointer is at the beginning of the arena. An allocation request of size S is routed to the allocator whose block size B is the smallest that is greater than or equal to S. Since there is no memory available yet, the B-sized allocator detects that it cannot serve the request and queries the next-larger block allocator (i.e., the 2B-sized one) with the objective of getting a block from it, splitting it into two B-sized parts, returning one part, and adding the second part to its free list. The 2B-sized allocator is unable to serve the request either, so it is propagated to the 4B-sized one, and so on until the uppermost allocator is reached. Then the chain of allocation requests is terminated at the sbrk-style arena allocator, which merely advances the pointer by B * 2^(number of allocators between B and the arena).

When the S-sized memory block is freed, it is added immediately to the free list of the B-sized allocator without any further processing. This differs from the standard buddy-allocation logic, which coalesces free blocks; compaction is not done here because it would be incompatible with the constant-complexity allocation/deallocation requirement. One practical outcome is that the availability of large memory blocks depends on the worst-case utilization of smaller memory blocks.
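Putting the pieces together, the cascade could be sketched as follows. This is an illustrative toy, not the proposed Libuavcan implementation; names and sizes are hypothetical, and the cost of both paths is bounded by the fixed number of levels:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the cascading scheme described above: free-list allocators with
// block sizes X, 2X, 4X, ...; on a miss, a block is requested from the
// next-larger level, split in two, one half served and the other kept in the
// free list. Freed blocks are never coalesced.
class CascadedPoolAllocator {
public:
    static constexpr std::size_t MinBlock = 16;  // X; also the alignment
    static constexpr std::size_t Levels   = 10;  // 16 B ... 8 KiB

    CascadedPoolAllocator(void* arena, std::size_t size)
        : arena_(static_cast<std::uint8_t*>(arena)), remaining_(size) {
        free_.fill(nullptr);
    }

    void* allocate(std::size_t size) { return allocateLevel(levelFor(size)); }

    // O(1): push onto the free list of the original level, no coalescing.
    void deallocate(void* p, std::size_t size) {
        const std::size_t lvl = levelFor(size);
        *static_cast<void**>(p) = free_[lvl];
        free_[lvl] = p;
    }

private:
    // Smallest level whose block size is >= the requested size.
    static std::size_t levelFor(std::size_t size) {
        std::size_t lvl = 0;
        for (std::size_t block = MinBlock; block < size; block *= 2) { ++lvl; }
        return lvl;
    }

    void* allocateLevel(std::size_t lvl) {
        if (lvl >= Levels) { return nullptr; }
        if (free_[lvl] != nullptr) {              // free-list hit: O(1)
            void* const out = free_[lvl];
            free_[lvl] = *static_cast<void**>(out);
            return out;
        }
        const std::size_t block = MinBlock << lvl;
        if (lvl + 1 == Levels) {                  // top level: bump the arena
            if (block > remaining_) { return nullptr; }
            void* const out = arena_;
            arena_ += block;
            remaining_ -= block;
            return out;
        }
        void* const big = allocateLevel(lvl + 1); // cascade to the 2B level
        if (big == nullptr) { return nullptr; }
        // Split the 2B block: serve one half, free-list the other.
        deallocate(static_cast<std::uint8_t*>(big) + block, block);
        return big;
    }

    std::array<void*, Levels> free_;
    std::uint8_t* arena_;
    std::size_t   remaining_;
};
```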

For non-embedded platforms, the top-level sbrk-style arena allocator could be replaced with the standard malloc().

Considering that we are currently promoting the block allocator in libuavcan for use with STL containers, it might make sense to have it extended with this constant-complexity buddy allocation algorithm.

thirtytwobits commented 4 years ago

Copied your input to a new libuavcan issue (https://github.com/UAVCAN/libuavcan/issues/269). I don't want to provide this as part of Nunavut if I can help it. I'd rather generate templates that accept an allocator in lieu of the default stl allocator and then use a super-great, constant-complexity allocator in libuavcan.

asmfreak commented 2 years ago

@thirtytwobits

I'd rather generate templates that accept an allocator in lieu of the default stl allocator and then use a super-great, constant-complexity allocator in libuavcan.

Should we make generating code with allocator support a language option, like the VLA type or cast expressions? Then in the (de)serialization Jinja templates it would look something like this:

{%- if options.with_allocator_support %}
template<typename Allocator>
{% endif %}
{{ composite_type | definition_begin }} final

// constructors and assignment operators with allocator support

And all of the contained composite and array data fields would be instantiated with the same Allocator as the enclosing type?
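For illustration, the generated type under such an option might take a single top-level allocator and rebind it per field. This is only a sketch of the shape; the type, field, and option names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical shape of a generated type with allocator support enabled:
// the whole type is templated on one Allocator, and each variable-length
// field rebinds it to its element type via std::allocator_traits.
template <typename Allocator = std::allocator<std::uint8_t>>
struct SomeComposite_1_0 final {
    using allocator_type = Allocator;
    using u32_allocator =
        typename std::allocator_traits<Allocator>::template rebind_alloc<std::uint32_t>;

    // The allocator instance is forwarded to every allocator-aware field.
    explicit SomeComposite_1_0(const allocator_type& alloc = allocator_type{})
        : data(u32_allocator(alloc)) {}

    std::vector<std::uint32_t, u32_allocator> data;  // hypothetical VLA field
};
```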

thirtytwobits commented 2 years ago

I think this sounds good. Hard to know for sure until you try to implement it.

asmfreak commented 2 years ago

I gave it a bit more thought. All C++ types can be divided into two categories:

  1. Variable-sized types - any type which is, or contains, a VLA or any other variable-sized type. These types require an allocator.
  2. Const-sized types - any type (primitive or not) which is stack-allocated ONLY and does NOT contain any variable-sized types. This includes unions and const-sized arrays. These types do not require an allocator.

Also, if I understand correctly, a variable-sized type should have as many allocator type parameters as it contains variable-sized fields.

For example, regulated.basics.PrimitiveArrayVariable.0.1, should be translated into something like this:

template<
    typename a_u64_allocator_t = std::allocator<uint64_t>,
    typename a_u32_allocator_t = std::allocator<uint32_t>,
    typename a_u16_allocator_t = std::allocator<uint16_t>,
    typename a_u8_allocator_t = std::allocator<uint8_t>,
    typename a_u7_allocator_t = std::allocator<uint8_t>,
    typename n_u64_allocator_t = std::allocator<uint64_t>,
    typename n_u32_allocator_t = std::allocator<uint32_t>,
    typename n_u16_allocator_t = std::allocator<uint16_t>,
    typename n_u8_allocator_t = std::allocator<uint8_t>,
    typename n_u7_allocator_t = std::allocator<uint8_t>,
    typename a_i64_allocator_t = std::allocator<int64_t>,
    typename a_i32_allocator_t = std::allocator<int32_t>,
    typename a_i16_allocator_t = std::allocator<int16_t>,
    typename a_i8_allocator_t = std::allocator<int8_t>,
    typename a_i7_allocator_t = std::allocator<int8_t>,
    typename n_i64_allocator_t = std::allocator<int64_t>,
    typename n_i32_allocator_t = std::allocator<int32_t>,
    typename n_i16_allocator_t = std::allocator<int16_t>,
    typename n_i8_allocator_t = std::allocator<int8_t>,
    typename n_i7_allocator_t = std::allocator<int8_t>,
    typename a_f64_allocator_t = std::allocator<float64_t>,
    typename a_f32_allocator_t = std::allocator<float32_t>,
    typename a_f16_allocator_t = std::allocator<float16_t>,
    typename a_bool_allocator_t = std::allocator<bool_t>,
    typename n_bool_allocator_t = std::allocator<bool_t>,
    typename n_f64_allocator_t = std::allocator<float64_t>,
    typename n_f32_allocator_t = std::allocator<float32_t>,
    typename n_f16_allocator_t = std::allocator<float16_t>
>
struct PrimitiveArrayVariable_0_1 final {

    VLA_t<uint64_t, a_u64_allocator_t, CAPACITY> a_u64;
    VLA_t<uint32_t, a_u32_allocator_t, CAPACITY> a_u32;
    VLA_t<uint16_t, a_u16_allocator_t, CAPACITY> a_u16;
    VLA_t<uint8_t, a_u8_allocator_t, CAPACITY>  a_u8;
    VLA_t<uint8_t, a_u7_allocator_t, CAPACITY>  a_u7;

    VLA_t<uint64_t, n_u64_allocator_t, CAPACITY> n_u64;
    VLA_t<uint32_t, n_u32_allocator_t, CAPACITY> n_u32;
    VLA_t<uint16_t, n_u16_allocator_t, CAPACITY> n_u16;
    VLA_t<uint8_t, n_u8_allocator_t, CAPACITY>  n_u8;
    VLA_t<uint8_t, n_u7_allocator_t, CAPACITY>  n_u7;

    uavcan::primitive::Empty_1_0 restore_alignment_1_;

    VLA_t<int64_t, a_i64_allocator_t, CAPACITY> a_i64;
    VLA_t<int32_t, a_i32_allocator_t, CAPACITY> a_i32;
    VLA_t<int16_t, a_i16_allocator_t, CAPACITY> a_i16;
    VLA_t<int8_t, a_i8_allocator_t, CAPACITY>  a_i8;
    VLA_t<int8_t, a_i7_allocator_t, CAPACITY>  a_i7;

    VLA_t<int64_t, n_i64_allocator_t, CAPACITY> n_i64;
    VLA_t<int32_t, n_i32_allocator_t, CAPACITY> n_i32;
    VLA_t<int16_t, n_i16_allocator_t, CAPACITY> n_i16;
    VLA_t<int8_t, n_i8_allocator_t, CAPACITY>  n_i8;
    VLA_t<int8_t, n_i7_allocator_t, CAPACITY>  n_i7;

    uavcan::primitive::Empty_1_0 restore_alignment_2_;

    VLA_t<float64_t, a_f64_allocator_t, CAPACITY> a_f64;
    VLA_t<float32_t, a_f32_allocator_t, CAPACITY> a_f32;
    VLA_t<float16_t, a_f16_allocator_t, CAPACITY> a_f16;

    VLA_t<bool_t, a_bool_allocator_t, CAPACITY> a_bool;
    VLA_t<bool_t, n_bool_allocator_t, CAPACITY> n_bool;

    VLA_t<float64_t, n_f64_allocator_t, CAPACITY> n_f64;
    VLA_t<float32_t, n_f32_allocator_t, CAPACITY> n_f32;
    VLA_t<float16_t, n_f16_allocator_t, CAPACITY> n_f16;
};

I'm not sure if this is a good idea.

pavel-kirienko commented 2 years ago

Guys, can you please remind me why we are not statically pre-allocating storage for variable-length arrays as we already do for @unions (which are also variable-size entities)? One of the benefits of DSDL, by design, is that the maximum memory footprint is always known statically for any type, which allows one to avoid dynamic allocation trivially.

thirtytwobits commented 2 years ago

Because it uses a lot of RAM for some message definition sets. We should support static allocation but should also support deterministic-dynamic allocation to allow trading CPU time for memory.

thirtytwobits commented 2 years ago

Sorry @ASMfreaK, I wasn't following this thread, but it looks like you are waiting on my input. Let me read through this today (it's Saturday morning here) and give it some thought. I'll respond soon.

thirtytwobits commented 2 years ago

I'm not sure if this is a good idea.

My thinking is that we should just generate the allocators as parameters to any collections that require them. Because we are generating concrete types, not defining collections, we should avoid generating templated types except where the template is something we've defined in the Nunavut support library.

So

struct PrimitiveArrayVariable_0_1 final {

    VLA_t<uint64_t, std::allocator<uint64_t>, CAPACITY> a_u64;
    VLA_t<uint32_t, std::allocator<uint32_t>, CAPACITY> a_u32;
    VLA_t<uint16_t, std::allocator<uint16_t>, CAPACITY> a_u16;
    VLA_t<uint8_t, std::allocator<uint8_t>, CAPACITY>  a_u8;
    VLA_t<uint8_t, std::allocator<uint8_t>, CAPACITY>  a_u7;

...
asmfreak commented 2 years ago

So

struct PrimitiveArrayVariable_0_1 final {

    VLA_t<uint64_t, std::allocator<uint64_t>, CAPACITY> a_u64;
    VLA_t<uint32_t, std::allocator<uint32_t>, CAPACITY> a_u32;
    VLA_t<uint16_t, std::allocator<uint16_t>, CAPACITY> a_u16;
    VLA_t<uint8_t, std::allocator<uint8_t>, CAPACITY>  a_u8;
    VLA_t<uint8_t, std::allocator<uint8_t>, CAPACITY>  a_u7;

...

But this code defeats the purpose of this issue: user code cannot change or specify the allocator type of the fields.

This issue can be implemented via variable_array_type, but only globally - for each and every VLA. I thought that you (@thirtytwobits) proposed per-field allocator support.

Correct me if I'm wrong; I can see 4 classes of VLAs, which are:

asmfreak commented 2 years ago

SBO logic, I think, should be baked into the Nunavut-specific VariableLengthArrayType via a specialization based on, for example, a special kind of allocator.

I was thinking about creating something like a traits-style class along the lines of:

template <typename T, std::size_t N>
struct VLA_trait {
    static constexpr bool UsesSBO = false;
    using allocator_type = std::allocator<T>;
};

And specializing it for some types like this:

template <std::size_t N>  // would be constrained to small N, e.g. N < 10
struct VLA_trait<uint8_t, N> {
    static constexpr bool UsesSBO = true;
    using allocator_type = void;  // no allocator: storage is inline
};

But these external trait specializations (maybe even user-defined) must be squeezed between the original template definition and the generated code. I think this approach is fragile, as we would need to force the user to add a configuration header (akin to libcanard's CANARD_CONFIG_HEADER, but for each type), something like UAVCAN_NODE_HEARTBEAT_1_0_CONFIG_HEADER.
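For completeness, here is a self-contained sketch of how a VLA's storage could be selected from such a trait at compile time (it repeats the trait from above; all names, sizes, and the specialization are hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>
#include <vector>

// Primary trait: no SBO, allocator-backed storage (mirrors the VLA_trait
// idea discussed above).
template <typename T, std::size_t N>
struct VLA_trait {
    static constexpr bool UsesSBO = false;
    using allocator_type = std::allocator<T>;
};

// Hypothetical specialization: small uint8_t arrays are stored inline.
template <>
struct VLA_trait<std::uint8_t, 8> {
    static constexpr bool UsesSBO = true;
    using allocator_type = void;  // unused on the SBO path
};

// Storage selector: the default argument reads the trait, and the partial
// specialization takes the inline-array path when UsesSBO is true, so
// std::vector<T, void> is never instantiated.
template <typename T, std::size_t N, bool SBO = VLA_trait<T, N>::UsesSBO>
struct VLA_storage {
    using type = std::vector<T, typename VLA_trait<T, N>::allocator_type>;
};

template <typename T, std::size_t N>
struct VLA_storage<T, N, true> {
    using type = std::array<T, N>;  // SBO: fixed inline buffer, no allocator
};
```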