Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Stack is misaligned for an SSE instruction #47984

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR49015
Status NEW
Importance P normal
Reported by Fatih Bakir (mfatihbakir@gmail.com)
Reported on 2021-02-03 00:56:32 -0800
Last modified on 2021-02-03 09:45:08 -0800
Version 11.0
Hardware PC Linux
CC craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

Hello,

LLVM seems to emit a movaps %xmm0, (%rsp) promptly after it pushes 40 bytes to a 16 bytes aligned stack, which causes a general protection fault. movaps requires the memory operands to be 16 bytes aligned.

I tried to isolate this as much as possible and put the source code, the LLVM IR and the emitted code to this gist and explained a bit more in a comment: https://gist.github.com/FatihBAKIR/8bc6529c5bd801af1be3edcbdcbdabb3

At the time the instruction is executed, RSP is at 0x20fb18, which is misaligned in the entry to this function.

The code is compiled with the following flags: -target x86_64-none-elf -mno-red-zone -fno-stack-protector -fomit-frame-pointer -mno-avx -ffunction-sections -fdata-sections -ffreestanding -flto -fno-rtti -fno-exceptions -fno-unwind-tables -fno-threadsafe-statics -Os -nostdlib -nostdinc -std=gnu++2a

Apologies if I'm missing something obvious.

How to reproduce:

It's difficult to deliver the exact environment to try the code as is, but I tried to simplify it as much as possible to this:

#include <cstdint>
#include <vector>

class network_device {
    struct buffer;
    std::vector<buffer> m_buffers;

    void queue_rx_buf(buffer&& buf);

    void isr(void* f, int num);
};
struct virtio_net_hdr {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;

};

struct network_device::buffer {
    virtio_net_hdr* header;
    void* data;
};

void network_device::isr(void* f, int num) {
    auto isr_status = 1;
    if (isr_status & 1) {
        auto buf = std::move(m_buffers.front());
        *buf.header = {};
        m_buffers.erase(m_buffers.begin());
        queue_rx_buf(std::move(buf));
    }
}

(Godbolt: https://godbolt.org/z/GY8o55)

Compiling this with -mno-red-zone -fno-stack-protector -fomit-frame-pointer -mno-avx -ffunction-sections -fdata-sections -fno-rtti -fno-exceptions -fno-unwind-tables -fno-threadsafe-statics -Os -std=gnu++2a emits code that starts by pushing 24 bytes to the stack, again breaking the 16 bytes alignment of %RSP:

tos::virtio::network_device::isr(void*, int): # @tos::virtio::network_device::isr(void*, int)
        push    rbx
        sub     rsp, 16
        mov     rbx, rdi
        mov     rax, qword ptr [rdi]
        movups  xmm0, xmmword ptr [rax]
        movaps  xmmword ptr [rsp], xmm0
Quuxplusone commented 3 years ago
Hi Fatih, what's the OS you are using? AFAIK, some OS will make the RSP be
aligned to 16 bytes. So when we enter a function, the RSP is always ****8,
since the return address was pushed. So the code here makes the RSP aligned
again.
But if the OS doesn't align the RSP at begining, you need to use option "-
mstackrealign" to tell compiler to do so.
Quuxplusone commented 3 years ago

This is a custom OS and I wrote everything that leads to entering this function, ensuring 16 bytes alignment at every step. I can confirm the stack is 16 bytes aligned at the entry to this function, and the push + sub at the entry break the alignment.

In fact, as a hacky work around, I just added asm volatile("push %rax"); and asm volatile("pop %rax"); to the beginning and end of the function and it seems to have fixed it temporarily.

Also, -mstackrealign also works (it emits a and $0xfffffffffffffff0,%rsp) with the following entry:

   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   41 57                   push   %r15
   6:   41 56                   push   %r14
   8:   53                      push   %rbx
   9:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   d:   48 83 ec 30             sub    $0x30,%rsp

While this works, it's not ideal since I already ensure the stack is well-aligned. It also seems to undo -fomit-frame-pointer.

Quuxplusone commented 3 years ago
The x86-64 psABI has this to say

"The end of the input argument area shall be aligned on a 16 (32 or 64, if
__m256 or __m512 is passed on stack) byte boundary. In other words, the value
(RSP + 8) is always a multiple of 16
(32 or 64) when
control is transferred to the function entry point. The stack pointer, RSP,
always points to the end of the latest allocated
stack frame."

So LLVM assumes RSP+8 is 16 byte aligned.