Boolean bitfields writes treated as 32/64-bit rather than 8-bit; broken on big-endian

rversteegen commented 3 years ago

When fbc writes to a boolean bitfield it treats the bitfield address as an integer ptr (32 or 64 bit). So it reads 3/7 bytes past the bitfield and then writes them back as they were. Reading from the bitfield doesn't seem to be affected. This is usually invisible and I don't think there's any unittest that can test for this bug on a little-endian platform, but it breaks big-endian platforms because the bitmask used is wrong. This bug was discovered by running on PowerPC (discussion in #290).

Testcase:

type UDT
        a : 1 as boolean
end type
dim y as integer
dim x as UDT
x.a = 0
y = x.a

32-bit -gen gcc x86 result:

    // dim x as UDT
    struct $3UDT X$0;
    __builtin_memset( &X$0, 0, 1 );
    // x.a = 0
    *(uint32*)&X$0 = *(uint32*)&X$0 & 4294967294u;
    // y = x.a
    Y$0 = -(boolean)((uint32)(int32)*(int8*)&X$0 & 1u);

64-bit -gen gcc is the same except it treats the address as a uint64* and uses the bitmask 0xfffffffffffffffe instead.

-gen gas output:

##dim x as UDT
    mov byte ptr [ebp-8], 0
##x.a = 0
    and dword ptr [ebp-8], 4294967294
##y = x.a
    movsx eax, byte ptr [ebp-8]
    and eax, 1
    mov ebx, eax
    movzx eax, bl
    neg eax
    mov dword ptr [ebp-4], eax

lenoil98 commented 2 years ago

Has there been any progress on this issue? Is this related to the issue reading files with little endian BOM?

rversteegen commented 2 years ago

I haven't looked further into this. I wasn't planning to attempt a fix myself.

It's completely unrelated to the BOM and file I/O bugs. I've just dusted off that branch and did a little on it. I could submit it as a draft PR.

lenoil98 commented 2 years ago

That would be great. Looking forward to it.

lenoil98 commented 2 years ago

Is it possible to get access to the code fix for the BOM and file I/O bugs? FBC is having BOM issues when building VisualFBEditor on big endian.

rversteegen commented 2 years ago

OK, I've picked it up again. Sorry for forgetting about it.

lenoil98 commented 1 year ago

Any further progress on this?

rversteegen commented 1 year ago

Sorry... I forgot that you'd previously asked about the big endian file BOM bugs in this issue; that's what you're asking about rather than this bitfield bug? I did indeed resume work on that branch for a while (I just pushed some more to https://github.com/rversteegen/fbc/commits/ppc) but looking at it I see I mostly got really sidetracked with tests and fixes for obscure putback buffer bugs. There are some BOM fixes but apparently not including OPEN ENCODING nor fbc's BOM detection, which are likely the two fixes you most need. But those shouldn't be too hard to fix. I'm a bit busy at the moment though.

lenoil98 commented 1 year ago

Thanks for all your work. I understood that you were probably busy, just wanted to see if you’d done any more on the big endian BOM issues. I’ll checkout your ppc branch. At any rate I’ve installed fbsd 13.2 for little endian and may move everything to that platform.

Again, thanks in advance for your help.

jayrm commented 1 year ago

Maybe not the best code generation, but looks like we should be limiting memory reads and writes to one byte only.

32-bit gas, x86:

    ##dim y as integer
        mov dword ptr [ebp-8], 0
    ##dim x as UDT
        mov byte ptr [ebp-12], 0
    ##x.a = 0
        movzx eax, byte ptr [ebp-12]
        and eax, 4294967294
        mov byte ptr [ebp-12], al
    ##y = x.a
        movsx eax, byte ptr [ebp-12]
        and eax, 1
        mov ebx, eax
        movzx eax, bl
        neg eax
        mov eax, eax
        mov dword ptr [ebp-8], eax

64-bit gas, x86_64:

    ##dim y as integer
       mov QWORD PTR -72[rbp], 0
    ##dim x as UDT
       mov BYTE PTR -76[rbp], 0
    ##x.a = 0
       movzx r11, BYTE PTR -76[rbp]
       and r11, -2
       mov -76[rbp], r11b
    ##y = x.a
       movsx r11, BYTE PTR -76[rbp]
       and r11, 1
       cmp r11b, 0
       setne al
       neg al
       movsx r10, al
       mov -72[rbp], r10

32-bit gcc, x86:

    int32 Y$0;
    __builtin_memset( &Y$0, 0, 4 );
    struct $3UDT X$0;
    __builtin_memset( &X$0, 0, 1 );
    *(uint8*)&X$0 = (uint8)((uint32)(int32)*(uint8*)&X$0 & 4294967294u);
    Y$0 = -(boolean)((uint32)(int32)*(int8*)&X$0 & 1u);

64-bit gcc, x86_64

    int64 Y$0;
    __builtin_memset( &Y$0, 0, 8ll );
    struct $3UDT X$0;
    __builtin_memset( &X$0, 0, 1ll );
    *(uint8*)&X$0 = (uint8)((uint64)(int64)*(uint8*)&X$0 & 18446744073709551614ull);
    Y$0 = (int64)-(boolean)((uint64)(int64)*(int8*)&X$0 & 1ull);

(FYI, the boolean-bitfield branch that was pushed, failed CI due to depletion of build credits on my account, however, CI checks on the PR passed.)

jayrm commented 1 year ago

The first priority was to at least get some changes added so that memory was not overwritten and try to add a few tests for it. Then follow up to explore the code generation.

But at: https://github.com/freebasic/fbc/blob/fef1dfaf592b3b80716e7c15f0fc0c7de7badf30/src/compiler/ast-node-misc.bas#L428

I believe I had intended: return astNewBOP( AST_OP_SHL, hMakeUintMask( bits, dtype ), astNewCONSTi( bitpos ) )

Because, with usage in source linked above, overloaded hMakeUintMask() ultimately resolves to FB_DATATYPE_UINT.

freebasic / fbc

Boolean bitfields writes treated as 32/64-bit rather than 8-bit; broken on big-endian #292