SanderMertens / flecs

A fast entity component system (ECS) for C & C++
https://www.flecs.dev
Other
6.46k stars 454 forks source link

Incorrect memory alignment of components. #478

Open RobertBiehl opened 3 years ago

RobertBiehl commented 3 years ago

Describe the bug Component structs passed to system do not adhere to the defined memory alignment correctly.

To Reproduce

  1. enable avx

  2. Create component with aligned struct, e.g.

    typedef struct EcsTransform3 {
    __attribute((aligned(32))) mat4 value;
    } EcsTransform3;
  3. Inside system check address of pointer to struct

void TestSystem(ecs_iter_t* it) {
    EcsTransform3 *transform = ecs_term(it, EcsTransform3, 1);
    printf("(EcsTransform3 *) addr = %#010x", transform);
}

Sometimes prints addresses not aligned to 32 bytes as expected from CGLM_ALIGN_MAT. (In my case this was 0x000000011af54ef0, which is only divisible by 16, not by 32.) This causes crashes in calls to AVX intrinsics.

Expected behavior Address of component should be aligned correctly, in this case to 32 bytes.

Additional context Add any other context about the problem here (operating system, hardware, ...).

SanderMertens commented 3 years ago

Hm, flecs should be able to (and has been verified to) handle alignment correctly. It does rely on alignof/ECS_ALIGNOF reporting the correct alignment, could you verify that if you do ECS_ALIGNOF(EcsTransform3) it returns 32?

Additionally you could run the following code to check if the correct alignment has been registered with flecs:

ECS_COMPONENT(world, EcsTransform3);

EcsComponent *ptr = ecs_get(world, ecs_id(EcsTransform3), EcsComponent);
printf("%d\n", ptr->alignment);
RobertBiehl commented 3 years ago

Hi @SanderMertens, thanks for the reply and letting me know about how to inspect alignment.

printf("%d\n", ptr->alignment); does indeed print 32 and ECS_ALIGNOF(EcsTransform3) is also 32.

I guess I'll now need to find out how I could have gotten a pointer to an unaligned address in the system.

Currently the only thing that solves is is to locally copy the mat4 to aligned memory on the stack before continuing. CGLM_ALIGN_MAT mat4 box_to_world; glm_mat4_copy(transform[i].value, box_to_world);

chengts95 commented 3 years ago

Yes, I have the same problem, I was thinking it was my fault, but the alignment of components is always 16byte even I changed the malloc function to use default 32byte alignment.

SanderMertens commented 3 years ago

@RobertBiehl @chengts95 if you're interested in debugging this further, the part of the code that should take care of the alignment is ecs_vector_t.

A vector is a contiguous block of memory with a header in front of it (containing the number of elements + allocated size). At an offset from the header the actual array begins, and it's important that this offset is aligned to the alignment of the component type.

The alignment is computed by macro's and passed into the vector functions (to allow for compile-time optimization where the vector type is known). If anything's wrong with how the component alignment is used, I would expect that to be in this code.

I'd like to look into this myself as well but it may take a bit of time before I can get to it.

chengts95 commented 3 years ago

If I am right, the alignment attribute can only determine the padding of the struct to fulfill the address alignment requirement, like expanding a 14byte struct to 16byte so its array is aligned.
However, the heap memory address is determined by the malloc function, the malloc function is default to 16byte and I didn't see the vector is created by aligned_alloc anywhere, so it will be 16byte aligned. So @RobertBiehl's struct cannot ensure a 32byte aligned address.

My purpose was to make components with double float-point numbers aligned to 32byte addresses so aligned AVX can be used, which means the first double number's address should be aligned to 32byte. The key problem is to have the starting address aligned to 32byte even the struct itself doesn't have an alignment attribute.

Example:

    int *p1 = (int*)malloc(16*sizeof(int) );
 //16*4=64 bytes can be aligned to 32byte address, however the alignment is 16byte
    free(p1);

    int *p2 =  (int*)aligned_alloc(32, 16*sizeof (int)); //same amount of data but aligned to 32byte address
    free(p2);
chengts95 commented 3 years ago

@RobertBiehl @chengts95 if you're interested in debugging this further, the part of the code that should take care of the alignment is ecs_vector_t.

A vector is a contiguous block of memory with a header in front of it (containing the number of elements + allocated size). At an offset from the header the actual array begins, and it's important that this offset is aligned to the alignment of the component type.

The alignment is computed by macro's and passed into the vector functions (to allow for compile-time optimization where the vector type is known). If anything's wrong with how the component alignment is used, I would expect that to be in this code.

I'd like to look into this myself as well but it may take a bit of time before I can get to it.

Where is the code to compute the offset? The offset should be different under different malloc situations.

SanderMertens commented 3 years ago

@chengts95 this is the code that computes the offset: https://github.com/SanderMertens/flecs/blob/master/include/flecs/private/vector.h#L50

Now that I'm looking at the code I think I also see the issue 💥 . The offset calculation simply does max(header_size, alignment) which works as long as the alignment is not larger than the alignment of the malloc, but in this case (32 bits alignment vs. 16 bits malloc alignment) this won't work.

If malloc returns address 16 then the offset calculation would add 32 which ends up being 48 which is unaligned. The code must be changed to take the max of header_size and the starting address aligned to the component alignment.

chengts95 commented 3 years ago

@chengts95 this is the code that computes the offset: https://github.com/SanderMertens/flecs/blob/master/include/flecs/private/vector.h#L50

Now that I'm looking at the code I think I also see the issue 💥 . The offset calculation simply does max(header_size, alignment) which works as long as the alignment is not larger than the alignment of the malloc, but in this case (32 bits alignment vs. 16 bits malloc alignment) this won't work.

If malloc returns address 16 then the offset calculation would add 32 which ends up being 48 which is unaligned. The code must be changed to take the max of header_size and the starting address aligned to the component alignment.

That's it! I also want to have a 32byte aligned double array, if we determine the alignment only by the struct size, you cannot get an AVX array with a plain double number component, if we can manually choose the alignment just like malloc and aligned_alloc, it will be better! Normally people can manually peel the loop to fit the AVX requirement, but it is more convenient if the starting address is aligned.

SanderMertens commented 3 years ago

Sorry for the late reply! I've been looking at ways to address this, but there isn't a straightforward way to solve this with the current vector without changing lots of code.

I'll soon start working again on more storage related stuff, and one thing I've been wanting to do is change the component storage from a regular vector to a paged vector. I'll make sure that the new storage is able to deal with >16bits alignment.

SanderMertens commented 2 years ago

I just modified the component storage to no longer use ecs_vector_t. Component arrays now use a regular malloc. This doesn't give you correct memory aligned addresses yet, but you can now register a malloc with the OS API that only returns 128bit aligned (or more) addresses.

Not a full solution, but at least there's a workaround :)