asmjit / cult

CPU Ultimate Latency Test.

Segfault on Older CPUs #4

Closed billyauhk closed 5 years ago

billyauhk commented 7 years ago

When I run cult on an older CPU (specifically a U7300: family 6, model 23, stepping 10), it segfaults. GDB attributes the fault to the "mov %cr0, %rax" instruction, which appears as "mov rax, cr0" in the Intel-syntax dump below:

push rbx                                ; 53
push rbp                                ; 55
push r12                                ; 4154
push r13                                ; 4155
push r14                                ; 4156
push r15                                ; 4157
sub rsp, 1032                           ; 4881EC08040000
mov rbx, rsi                            ; 488BDE
mov ebp, edi                            ; 8BEF
mov [rsp], rbx                          ; 48891C24
cpuid                                   ; 0FA2
rdtsc                                   ; 0F31
mov [rsp+8], eax                        ; 89442408
mov [rsp+12], edx                       ; 8954240C
test ebp, ebp                           ; 85ED
jz L1                                   ; 0F84........
.align 16
L0:
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
sub ebp, 1                              ; 83ED01
jnz L0                                  ; 0F8546FFFFFF
L1:
mov rax, cr0                            ; 0F20C0
mov cr0, rax                            ; 0F22C0
rdtsc                                   ; 0F31
mov esi, eax                            ; 8BF0
mov edi, edx                            ; 8BFA
mov rbx, [rsp]                          ; 488B1C24
sub esi, [rsp+8]                        ; 2B742408
sbb edi, [rsp+12]                       ; 1B7C240C
mov [rbx], esi                          ; 8933
mov [rbx+4], edi                        ; 897B04
add rsp, 1032                           ; 4881C408040000
pop r15                                 ; 415F
pop r14                                 ; 415E
pop r13                                 ; 415D
pop r12                                 ; 415C
pop rbp                                 ; 5D
pop rbx                                 ; 5B
ret                                     ; C3

(Also attached here the CPUID dump)

CPUID:
  In:00000000 Sub:00000000 -> EAX:0000000D EBX:756E6547 ECX:6C65746E EDX:49656E69
  In:00000001 Sub:00000000 -> EAX:0001067A EBX:00020800 ECX:0C08E3BD EDX:BFEBFBFF
  In:00000002 Sub:00000000 -> EAX:05B0B101 EBX:005657F0 ECX:00000000 EDX:2CB43048
  In:00000003 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:00000004 Sub:00000000 -> EAX:04000121 EBX:01C0003F ECX:0000003F EDX:00000001
  In:00000004 Sub:00000001 -> EAX:04000122 EBX:01C0003F ECX:0000003F EDX:00000001
  In:00000004 Sub:00000002 -> EAX:04004143 EBX:02C0003F ECX:00000FFF EDX:00000001
  In:00000005 Sub:00000000 -> EAX:00000040 EBX:00000040 ECX:00000003 EDX:03122220
  In:00000006 Sub:00000000 -> EAX:00000001 EBX:00000002 ECX:00000003 EDX:00000000
  In:00000007 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:00000008 Sub:00000000 -> EAX:00000400 EBX:00000000 ECX:00000000 EDX:00000000
  In:00000009 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:0000000A Sub:00000000 -> EAX:07280202 EBX:00000000 ECX:00000000 EDX:00000503
  In:0000000C Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:0000000D Sub:00000000 -> EAX:00000003 EBX:00000240 ECX:00000240 EDX:00000000
  In:80000000 Sub:00000000 -> EAX:80000008 EBX:00000000 ECX:00000000 EDX:00000000
  In:80000001 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000001 EDX:20100800
  In:80000002 Sub:00000000 -> EAX:756E6547 EBX:20656E69 ECX:65746E49 EDX:2952286C
  In:80000003 Sub:00000000 -> EAX:55504320 EBX:20202020 ECX:20202020 EDX:55202020
  In:80000004 Sub:00000000 -> EAX:30303337 EBX:20402020 ECX:30332E31 EDX:007A4847
  In:80000005 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:80000006 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:0C006040 EDX:00000000
  In:80000007 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:80000008 Sub:00000000 -> EAX:00003024 EBX:00000000 ECX:00000000 EDX:00000000
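
As a cross-check on the dump above, the family/model/stepping triple quoted at the top of the report can be recovered from leaf 1 EAX (0001067A). The sketch below follows the field layout given in the Intel manual for the CPUID processor signature; the helper names (`baseFamily`, `displayFamily`, `displayModel`, `stepping`) are hypothetical and are not part of cult or asmjit.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; the bit positions are those
// documented for CPUID leaf 1, where EAX holds the processor signature.

// Family ID, bits 11:8.
constexpr std::uint32_t baseFamily(std::uint32_t eax) { return (eax >> 8) & 0xF; }

// Displayed family: the extended family (bits 27:20) is added only when the
// base family is 0xF.
constexpr std::uint32_t displayFamily(std::uint32_t eax) {
  return baseFamily(eax) == 0xF ? baseFamily(eax) + ((eax >> 20) & 0xFF)
                                : baseFamily(eax);
}

// Displayed model: the extended model (bits 19:16) supplies the high nibble
// only when the base family is 6 or 0xF.
constexpr std::uint32_t displayModel(std::uint32_t eax) {
  return (baseFamily(eax) == 0x6 || baseFamily(eax) == 0xF)
             ? ((((eax >> 16) & 0xF) << 4) | ((eax >> 4) & 0xF))
             : ((eax >> 4) & 0xF);
}

// Stepping ID, bits 3:0.
constexpr std::uint32_t stepping(std::uint32_t eax) { return eax & 0xF; }

// Leaf 1 EAX taken from the CPUID dump in this report.
static_assert(displayFamily(0x0001067Au) == 6, "family 6");
static_assert(displayModel(0x0001067Au) == 23, "model 23 (0x17)");
static_assert(stepping(0x0001067Au) == 10, "stepping 10 (0xA)");
```

The compile-time checks pass for the value in the dump, which agrees with the "family 6 model 23 stepping 10" description given in the report.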

kobalicek commented 6 years ago

Hello, thanks for the report. It seems your CPU doesn't have the RDTSCP instruction, so cult falls back to another code path that unfortunately doesn't work in user space. My fault: I implemented it based on an Intel paper without checking that it has to run in kernel space.

I will fix this tonight; however, I'm not sure about the max error.

billyauhk commented 6 years ago

Thanks for the explanation. I looked this up in the Intel manual and confirmed that RDTSCP is not available on this CPU (the manual says "Support for RDTSCP is indicated by CPUID.80000001H:EDX[27]").
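
For reference, a minimal sketch of that check applied to the value from the CPUID dump (leaf 80000001, EDX = 20100800). The function name `hasRdtscp` is hypothetical and only illustrates the bit test quoted from the manual; it is not how cult itself detects the feature.

```cpp
#include <cstdint>

// Hypothetical helper: tests bit 27 of CPUID.80000001H:EDX, which the Intel
// manual names as the RDTSCP support flag.
constexpr bool hasRdtscp(std::uint32_t edx80000001) {
  return ((edx80000001 >> 27) & 1u) != 0;
}

// EDX of leaf 80000001 from the dump in this report: bit 27 is clear.
static_assert(!hasRdtscp(0x20100800u), "RDTSCP not reported on this CPU");

// The same value with bit 27 set artificially, to show the positive case.
static_assert(hasRdtscp(0x20100800u | (1u << 27)), "bit 27 set means RDTSCP");
```

In 20100800 only bits 29, 20 and 11 are set, so bit 27 is clear and the check reports no RDTSCP support.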

So "the other code path" uses RDTSC+CPUID to achieve the same effect, but I still cannot understand why "mov %cr0, %rax" is generated (or why "mov %rax, %cr0" follows it). Any pointers to the relevant portion of the code in the repository?

kobalicek commented 5 years ago

Closing as this should already be fixed.