QuarkContainer / Quark

A secure container runtime with CRI/OCI interface
Apache License 2.0
313 stars 47 forks source link

Issue with gvisor tests when running on ARM #1330

Closed mhomidi closed 1 week ago

mhomidi commented 1 month ago

Hi,

I tried to build the binary files of test of gvisor to run them on the Quark when our system is arm. The instruction is that you just need to follow this instruction to build binary files of each test you want. For instance I tried to run for write syscall. So I just run

bazel test //test/syscalls:write_test_native

Then, by bazel info bazel-bin you can get the directory of binary files which is something like /home/[USERNAME]/.cache/bazel/_bazel_[USERNAME]/c8d627c8e0d3a3fd49664c9d76222cd6/execroot/__main__/bazel-out/aarch64-fastbuild/bin Then, you just need to run ./test/syscalls/write_test_native

Now if you do these steps on x86 and run on Quark, everything is okay and the output is:

+ [[ -n /htmp ]]
+ mkdir -p /htmp
+ chmod a+rwx /htmp
+ exec test/runner/runner_/runner --platform=native --platform-support=32BIT:TRUE,ALIGNMENT_CHECK:TRUE,INT3:TRUE,MULTIPROCESS:TRUE,VSYSCALL:TRUE --network=none --use-tmpfs=False --fusefs=False --file-access=exclusive --overlay=False --add-host-uds=False --add-host-connector=False --add-host-fifo=False --strace=True --debug=True --container=False --one-sandbox=True --iouring=False --directfs=False --leak-check=False --save=False --save-resume=False test/syscalls/linux/write_test
I0718 21:09:22.197586       1 main.go:170] Executing: [/gtest/test/syscalls/linux/write_test --gtest_filter=WriteTest.WriteNoExceedsRLimit]
Note: Google Test filter = WriteTest.WriteNoExceedsRLimit
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from WriteTest
[ RUN      ] WriteTest.WriteNoExceedsRLimit
[       OK ] WriteTest.WriteNoExceedsRLimit (1 ms)
[----------] 1 test from WriteTest (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (1 ms total)
[  PASSED  ] 1 test.
Failed to match any benchmarks against regex: .
PASS

But when you do it on an ARM processor, the output will be some thing like this:

+ [[ -n /htmp ]]
+ mkdir -p /htmp
+ chmod a+rwx /htmp
+ exec test/runner/runner_/runner --platform=native --platform-support=32BIT:TRUE,ALIGNMENT_CHECK:TRUE,INT3:TRUE,MULTIPROCESS:TRUE,VSYSCALL:TRUE --network=none --use-tmpfs=False --fusefs=False --file-access=exclusive --overlay=False --add-host-uds=False --add-host-connector=False --add-host-fifo=False --strace=True --debug=True --container=False --one-sandbox=True --iouring=False --directfs=False --leak-check=False --save=False --save-resume=False test/syscalls/linux/write_test
SIGILL: illegal instruction
PC=0x20000000 m=0 sigcode=0

Then it gets stuck and you need to kill quark process. (At least mine)

After some checking it seems that the issue happens when the script wants to run the runner in test/runner/runner_/runner (from this file in gvisor test). I tried to debug it but I understood it does not even enter the main function.

Does anyone has any idea what the problem is?

shrik3 commented 1 month ago

what does the binary look like? could you run that natively on aarch64 machine (not in quark)? and/or could you isolate and run the test binary instead of invoking it from a script?

mhomidi commented 1 month ago

what does the binary look like? could you run that natively on aarch64 machine (not in quark)?

Yes, I can run it natively and on gvisor too, but when I use Quark, this happens.

and/or could you isolate and run the test binary instead of invoking it from a script?

I do not think so. Maybe I need to change the test code to run them.

chl337 commented 1 month ago

@mhomidi could you please retry by excluding first this test: https://github.com/google/gvisor/blob/84f1146368c821fb760d7c497663a332e8dee9f8/test/syscalls/BUILD#L11.

mhomidi commented 1 month ago

I have a same issue. The problem is from runner not tests.

chl337 commented 1 month ago

I have a same issue. The problem is from runner not tests.

Sorry, I didn't read your post carefully enough the first time. What the readelf -h of the binary looks like? Have you tried to produce a dump of the binary?

mhomidi commented 1 month ago

What the readelf -h of the binary looks like? Have you tried to produce a dump of the binary? Hi @chl337 , Sorry for the late. The readelf -h of runner is:

ELF Header:
Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class:                             ELF64
Data:                              2's complement, little endian
Version:                           1 (current)
OS/ABI:                            UNIX - System V
ABI Version:                       0
Type:                              EXEC (Executable file)
Machine:                           AArch64
Version:                           0x1
Entry point address:               0x88620
Start of program headers:          64 (bytes into file)
Start of section headers:          400 (bytes into file)
Flags:                             0x0
Size of this header:               64 (bytes)
Size of program headers:           56 (bytes)
Number of program headers:         6
Size of section headers:           64 (bytes)
Number of section headers:         14
Section header string table index: 13

I tried to objdump the runner binary and see the what happen in that pc but it does not make any sense for me. Unfortunatly the size of file is too high and I can not upload it.


I also figure out one of the problem that our application get stuck in runner and does not terminate is that the size of some data structure in signal handling is wrong. For UContext in qlib/kernel/SignalDef.rs:376, I did this changes:

pub struct UContext {
    pub Flags: u64,
    pub Link: u64,
    pub Stack: SignalStack,
    pub Sigset: u64,
    __pad: [u8; (1024 / 8) - core::mem::size_of::<u64>()],
    __pad2: [u8; 8], // Added this 8 byte
    pub MContext: SigContext,
}

and in SigContext in qlib/kernel/SignalDef.rs:458, I did this change:

pub struct SigContext {
    pub fault_address: u64,
    /* AArch64 registers */
    pub regs: [u64; 31],
    pub sp: u64,
    pub pc: u64,
    pub pstate: u64,
    pub oldmask: u64, /* should we do it here? */
    /* 4K reserved for FP/SIMD state and future expansion */
    pub __reserved: [u8; 528], // Change from 4096 to 528. 
}
chl337 commented 1 month ago

I also figure out one of the problem that our application get stuck in runner and does not terminate is that the size of some data structure in signal handling is wrong. For UContext in qlib/kernel/SignalDef.rs:376, I did this changes:

pub struct UContext {
    pub Flags: u64,
    pub Link: u64,
    pub Stack: SignalStack,
    pub Sigset: u64,
    __pad: [u8; (1024 / 8) - core::mem::size_of::<u64>()],
    __pad2: [u8; 8], // Added this 8 byte
    pub MContext: SigContext,
}

and in SigContext in qlib/kernel/SignalDef.rs:458, I did this change:

pub struct SigContext {
  pub fault_address: u64,
  /* AArch64 registers */
  pub regs: [u64; 31],
  pub sp: u64,
  pub pc: u64,
  pub pstate: u64,
    pub oldmask: u64, /* should we do it here? */
  /* 4K reserved for FP/SIMD state and future expansion */
  pub __reserved: [u8; 528], // Change from 4096 to 528. 
}
  • Can you explain how you came up with the changes?
  • ... or point your source of reference?
  • Have you also checked if these changes don't cause issues with other executables?

Plus sidenote related to #1326, Go uses X28 to preserve g, so I experimented with that by preserving the X28 before the frame is cleared:

    *regs = PtRegs::default();

and setting it up again. Redis did work. So should we take that as indication to preserve the non-volatile registers even in the signal frame?

P.S.-1: Is the UContext definition based on gvisor? P.S.-2: Our definition of SigContext is compatible with the one from Linux, so I would refrain from changing it (unless you know what you are doing).

mhomidi commented 1 month ago
  • Can you explain how you came up with the changes?
  • ... or point your source of reference?

Mostly I just compare gvisor and our project and see this difference. About UContext if I remove the padding, the runner of gvisor test crash and we need to kill quark.

  • Have you also checked if these changes don't cause issues with other executables? I checked with other (performance tests)[] and it does not have any effect on them.

So should we take that as indication to preserve the non-volatile registers even in the signal frame?

In that case it may be okay to just remain that register.

P.S.-1: Is the UContext definition based on gvisor? Yes

P.S.-2: Our definition of SigContext is compatible with the one from Linux, so I would refrain from changing it (unless you know what you are doing).

I checked this and you are right.

chl337 commented 1 month ago

On Thu Jul 25, 2024 at 7:24 PM CEST, Hadi Omidi wrote:

  • Can you explain how you came up with the changes?
  • ... or point your source of reference?

Mostly I just compare gvisor and our project and see this difference. About UContext if I remove the padding, the runner of gvisor test crash and we need to kill quark. Indded it was an error from out side.

  • Have you also checked if these changes don't cause issues with other executables? I checked with other (performance tests)[] and it does not have any effect on them.

So should we take that as indication to preserve the non-volatile registers even in the signal frame?

In that case it may be okay to just remain that register. Actually it is a difference in Linux ABI: x86_64 vs Aarch64. On Aarch64 the GPR are preserved.

P.S.-1: Is the UContext definition based on gvisor? Yes

P.S.-2: Our definition of SigContext is compatible with the one from Linux, so I would refrain from changing it (unless you know what you are doing).

I checked this and you are right.

shrik3 commented 3 weeks ago

on linux: arch/arm64/include/uapi/asm/ucontext.h

struct ucontext {
    unsigned long     uc_flags;
    struct ucontext  *uc_link;
    stack_t       uc_stack;
    sigset_t      uc_sigmask;
    /* glibc uses a 1024-bit sigset_t */
    __u8          __unused[1024 / 8 - sizeof(sigset_t)];
    /* last for future expansion */
    struct sigcontext uc_mcontext;
};

I don't see why we should add another 8 bytes padding between sigset and MContext.

the sigcontext on the other hand, expands to

struct sigcontext {
    __u64 fault_address;
    /* AArch64 registers */
    __u64 regs[31];
    __u64 sp;
    __u64 pc;
    __u64 pstate;
    /* 4K reserved for FP/SIMD state and future expansion */
    __u8 __reserved[4096] __attribute__((__aligned__(16)));
};

This seems to match with ours (except for the oldmask, which doesn't matter because it logically merges into the padding.

where exactly is the 128 bytes padding in SigContext coming from?