bitwiseworks / qtwebengine-chromium-os2

Port of Chromium and related tools to OS/2
9 stars 2 forks source link

Fix memory alignment errors #20

Closed dmik closed 4 years ago

dmik commented 4 years ago

Chromium complains about memory alignment in some cases and crashes the application in both debug and release builds (see https://github.com/bitwiseworks/qtwebengine-os2/issues/6#issuecomment-680286782). Here are the relative console messages:

[57201:12:0826/013109.391000:FATAL:page_allocator.cc(204)] Check failed: !(reinterpret_cast<uintptr_t>(address) & kPageAllocationGranularityOffsetMask). 
...
[57159:12:0826/012542.065000:FATAL:options_validation.h(35)] Check failed: options && IsAligned<MOJO_ALIGNOF(Options)>(options). 

The first assertion is triggered because base::FreePages is supplied a memory address which is not on a 64K (memory object) boundary, e.g. something like 0x2131d000. I wonder how this is possible because the page allocator API in base uses DosAllocMem and that should always returns objects aligned on a 64K boundary. Deeper debugging is needed.

As for the second one, it's a bit more tricky. The code requires some C++ struct to be allocated (on a C heap I suppose) with its alignment in mind (as returned by GCC's __alignof__). I have no idea why it's not the case. Also, more debugging is needed.

dmik commented 4 years ago

Tests of the alignas problem lead us to SharedBufferDispatcher::ValidateCreateOptions in chromium/mojo/core/shared_buffer_dispatcher.cc. It gets an address of a alignas(8) MojoCreateSharedBufferOptions variable allocated on the stack in Core::CreateSharedBuffer (chromium/mojo/core/core.cc). Sometimes, the address of this variable doesn't end with 0x0 or 0x8 as it should be because of alignas(8) but is something like 0x43ee034 or 0x43ede0c (i.e. unaligned). I created a separate GCC ticket for that issue as it might be a GCC problem.

Further tests demonstrated that using alignas(16) makes it work (i.e. GCC doesn't seem to break alignment in this case) but unfortunately this changes the size of some structs. GCC seems to round the structure size up to the closest aligned value from the sizeof POV and Mojo (which packs data when transferring it between processes) is very picky about the exact struct sizes by using static assertions (which prevent compilation).

Currently I see no solution to that other than disable this alignment check completely in Mojo (the only alternative is to fix GCC but this might be a long road). I don't see any direct consequences of disabling this align check ATM (like using the variable address in some math or so) but I may be unaware of something, I don't know this code by heart. Anyway, no other choice for now, as already mentioned.

dmik commented 4 years ago

Regarding the first problem, I added some extra logging, caught it again and now I have a proof that DosAllocMem(OBJ_ANY | PAG_READ) returns an address which is NOT on a 64K boundary (e.g. 0x22409000). I've never seen that before. And to my understanding this contradicts with what docs say:

Note: DosAllocMem and DosAllocSharedMem both allocate a block of memory of the size requested rounded up to the nearest page. On OS/2 Warp, the system allocates a 64K object without attributes on every allocation.

The docs don't explicitly mention that 64K objects start on a 64K boundary but that's kinda assumed... Sigh. I'm really confused here. If I won't find any detail on that, I will just set page alloc granularity to 4K on OS/2 (i.e. equal to the page size).

StevenLevine commented 4 years ago

I've read the 64K object size comment too. However, based on your findings I have to suspect the docs are not entirely correct. It appears that the 64K object size may apply only to tiled memory.

Looking at the kernel code, I can see where the input size is rounded up to 4KB. There is no explicit rounding to 64KB, so it is most likely triggered by one of the allocation flags.

One thing I have learned over the years is that the IBM docs are pretty good, but at the same time they have not been updated on almost 30 years so there is bound to be errata. When behavior does not seem to match the docs, the only option is a testcase or two.

dmik commented 4 years ago

Steven, thanks for the feedback. Note that there are no special flags besides OBJ_ANY (the code does never use OBJ_TILE). So it's always OBJ_ANY | PAG_READ [ | PAG_WRITE etc ] and in a series of previous allocations (around 50) these calls return addresses ending with 0000. Anyway, I will just change the granularity to 4K, given what you say.

And yes, I generally used to not trusting docs but doing some test cases :)

StevenLevine commented 4 years ago

The behavior has nothing specific to do with the OBJ_ flags you are using. For the testcase, I am using PAG_WRITE | PAG_COMMIT | OBJ_ANY.

The testcase indicates that the alignment is 4KB. Since the runtime heaps, typically allocate 64KB chunks. the first DosAllocMem is aligned on a 64KB boundary, at least in the testcase. As long as subsequent allocations are 64KB multiples the blocks are 64KB aligned. After an allocation for less than 62KB+1 occurs, the alignment will be 4KB and will continue this way even when 64KB blocks are allocated.

At some point, I may test how OBJ_TILE affects the alignment just so I more fully understand the behavior.

dmik commented 4 years ago

I got a thought that the behavior differs for high memory and for low memory (and that for low memory it matches what the docs say). So I had to make my own small test case to check that. And yes, this assumption is right.

When allocating shared memory (DosAllocSharedMem), the picture is just the same with the exception that sequential allocations go down in terms of address values (which is expected given that shared memory starts on top and grows down).

Another interesting observation is that if you call DosQueryMem on a free (unallocated) low memory address, it returns error 487 (ERROR_INVALID_ADDRESS). However, for free high memory it succeeds and returns 0x4000 in flags (PAG_FREE). Might save someone some hair.

Ok, case closed then. We have to use 4 KB as allocation granularity and expect returned addresses have a 4 KB alignment. Note that with this applied to the source code, I have Chromium run much further, it could even start playing some YouTube video (but crashed due to some missing audio interface assertions, will create a ticket for that later).