attestantio / dirk

Apache License 2.0
crash with 1.0.2. Fine with 1.0.1 #9

Closed sloppycoffee closed 3 years ago

sloppycoffee commented 3 years ago

Here is the tail of the log:

{"level":"info","version":"1.0.2","time":"2021-03-30T17:44:04Z","message":"Starting dirk"}
SIGILL: illegal instruction
PC=0x107f8a4 m=0 sigcode=2

goroutine 0 [idle]:
runtime: unknown pc 0x107f8a4
stack: frame={sp:0x7fff3c084c80, fp:0x0} stack=[0x7fff3b886528,0x7fff3c085560)
00007fff3c084b80: e7a278e38f6be830 dc54b77391b667d2
00007fff3c084b90: 0000000000000000 0000000000000000
00007fff3c084ba0: 0000000000000000 0000000000000000
00007fff3c084bb0: 0000000000000000 0000000000000000
00007fff3c084bc0: 0000000000000000 0000000000000004
00007fff3c084bd0: 0000000000004000 0000000000000000
00007fff3c084be0: 0000000000000007 0000000000000000
00007fff3c084bf0: 0000000000000072 00000000023cd008
00007fff3c084c00: 0000000000003fff 0000000000004030
00007fff3c084c10: ffffffffffffff88 0000000000000100
00007fff3c084c20: 0000013000000401 0000000000000020
00007fff3c084c30: 0000000000000004 000000720000007b
00007fff3c084c40: 000000770000007c 0000000000000000
00007fff3c084c50: 0000000000000005 00007fbc59493c40
00007fff3c084c60: 0000000000004000 00000000000003ff
00007fff3c084c70: ffffffffffffff88 0000000000000200
00007fff3c084c80: <000000000107f882 00000000023e1570
00007fff3c084c90: 0000000001f50208 00007fff3c084ce8
00007fff3c084ca0: 0000000000000004 000000000107f501
00007fff3c084cb0: 00000000023dd590 000000000105ad0e
00007fff3c084cc0: 00000000000001ff 00000000023dd590
00007fff3c084cd0: 0000000000000800 0000000000000000
00007fff3c084ce0: 0000000000000002 0000000000000000
00007fff3c084cf0: 0000000000000000 0000000000000000
00007fff3c084d00: 0000000000000000 0000000000000000
00007fff3c084d10: 0000000002f112ed 00007fff3c084e10
00007fff3c084d20: 00007fff3c084d68 0000000000479816 <runtime.pcdatavalue+134>
00007fff3c084d30: 000000000176e2f0 0000000001ed58c0
00007fff3c084d40: 000000000009320b 0000000000000004
00007fff3c084d50: 00007fff3c085200 0000000001f50210
00007fff3c084d60: 0000000000000008 54b165b8181ac726
00007fff3c084d70: 1ad4fdb87d39fbb4 55c5243a9a0aca61
runtime: unknown pc 0x107f8a4
stack: frame={sp:0x7fff3c084c80, fp:0x0} stack=[0x7fff3b886528,0x7fff3c085560)
00007fff3c084b80: e7a278e38f6be830 dc54b77391b667d2
00007fff3c084b90: 0000000000000000 0000000000000000
00007fff3c084ba0: 0000000000000000 0000000000000000
00007fff3c084bb0: 0000000000000000 0000000000000000
00007fff3c084bc0: 0000000000000000 0000000000000004
00007fff3c084bd0: 0000000000004000 0000000000000000
00007fff3c084be0: 0000000000000007 0000000000000000
00007fff3c084bf0: 0000000000000072 00000000023cd008
00007fff3c084c00: 0000000000003fff 0000000000004030
00007fff3c084c10: ffffffffffffff88 0000000000000100
00007fff3c084c20: 0000013000000401 0000000000000020
00007fff3c084c30: 0000000000000004 000000720000007b
00007fff3c084c40: 000000770000007c 0000000000000000
00007fff3c084c50: 0000000000000005 00007fbc59493c40
00007fff3c084c60: 0000000000004000 00000000000003ff
00007fff3c084c70: ffffffffffffff88 0000000000000200
00007fff3c084c80: <000000000107f882 00000000023e1570
00007fff3c084c90: 0000000001f50208 00007fff3c084ce8
00007fff3c084ca0: 0000000000000004 000000000107f501
00007fff3c084cb0: 00000000023dd590 000000000105ad0e
00007fff3c084cc0: 00000000000001ff 00000000023dd590
00007fff3c084cd0: 0000000000000800 0000000000000000
00007fff3c084ce0: 0000000000000002 0000000000000000
00007fff3c084cf0: 0000000000000000 0000000000000000
00007fff3c084d00: 0000000000000000 0000000000000000
00007fff3c084d10: 0000000002f112ed 00007fff3c084e10
00007fff3c084d20: 00007fff3c084d68 0000000000479816 <runtime.pcdatavalue+134>
00007fff3c084d30: 000000000176e2f0 0000000001ed58c0
00007fff3c084d40: 000000000009320b 0000000000000004
00007fff3c084d50: 00007fff3c085200 0000000001f50210
00007fff3c084d60: 0000000000000008 54b165b8181ac726
00007fff3c084d70: 1ad4fdb87d39fbb4 55c5243a9a0aca61

goroutine 1 [syscall]:
runtime.cgocall(0xf6d740, 0xc0008bfda8, 0xc0008bfdc8)
    /usr/local/go/src/runtime/cgocall.go:133 +0x5b fp=0xc0008bfd78 sp=0xc0008bfd40 pc=0x4283cb
github.com/herumi/bls-eth-go-binary/bls._Cfunc_blsInit(0xf600000005, 0x0)
    _cgo_gotypes.go:417 +0x4d fp=0xc0008bfda8 sp=0xc0008bfd78 pc=0x8ee42d
github.com/herumi/bls-eth-go-binary/bls.Init(0x5, 0xa, 0x40)
    /go/pkg/mod/github.com/herumi/bls-eth-go-binary@v0.0.0-20210130185500-57372fb27371/bls/bls.go:63 +0x40 fp=0xc0008bfe00 sp=0xc0008bfda8 pc=0x8f0750
github.com/wealdtech/go-eth2-types/v2.InitBLS(0x200, 0x40)
    /go/pkg/mod/github.com/wealdtech/go-eth2-types/v2@v2.5.2/bls.go:22 +0x2e fp=0xc0008bfe28 sp=0xc0008bfe00 pc=0x8fd84e
main.main()
    /app/main.go:124 +0x246 fp=0xc0008bff88 sp=0xc0008bfe28 pc=0xf62f56
runtime.main()
    /usr/local/go/src/runtime/proc.go:203 +0x1fa fp=0xc0008bffe0 sp=0xc0008bff88 pc=0x45c5ea
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0008bffe8 sp=0xc0008bffe0 pc=0x48ce51

goroutine 102 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00017a180)
    /go/pkg/mod/go.opencensus.io@v0.22.5/stats/view/worker.go:276 +0x100
created by go.opencensus.io/stats/view.init.0
    /go/pkg/mod/go.opencensus.io@v0.22.5/stats/view/worker.go:34 +0x68

rax 0x7fc
rbx 0x23dd590
rcx 0x1f50210
rdx 0xc999e990f3f29c6d
rdi 0x107f4b0
rsi 0x7fff3c084ce0
rbp 0x107f501
rsp 0x7fff3c084c80
r8 0x3
r9 0x1f50208
r10 0x23cb010
r11 0xc999e990f3f29c6d
r12 0x4
r13 0x7fff3c084ce8
r14 0x1f50208
r15 0x23e1570
rip 0x107f8a4
rflags 0x10206
cs 0x33
fs 0x0
gs 0x0

Operating System: Linux Mint 20.1
Kernel: Linux 5.4.0-67-generic
Architecture: x86-64

mcdee commented 3 years ago

Thanks for reporting this. It sounds as though it may be the same issue as https://github.com/prysmaticlabs/prysm/issues/8410

What is the output of cat /proc/cpuinfo?

sloppycoffee commented 3 years ago

processor : 63
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD Opteron(tm) Processor 6378
stepping : 0
microcode : 0x6000852
cpu MHz : 3235.596
cache size : 2048 KB
physical id : 3
siblings : 16
core id : 7
cpu cores : 8
apicid : 143
initial apicid : 111
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5763.17
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

mcdee commented 3 years ago

Given the age of the CPU, I suspect the issue is due to changes made in the BLS library we use; however, I'm unable to reproduce it locally. Can I send you a patched binary to test, to see whether it fixes the issue?
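
For anyone hitting a similar SIGILL, a quick way to see which instruction-set extensions a machine reports is to parse the `flags` line of /proc/cpuinfo. The flags posted above include avx and bmi1 but not avx2, bmi2, or adx. Whether any of those is what the BLS library actually needs is an assumption here, not something confirmed in this thread, so the `wanted` list below is purely illustrative, and the helper names `parseCPUFlags` and `missingFlags` are hypothetical. A minimal sketch in Go:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// parseCPUFlags returns the set of CPU flags found on the first "flags"
// line of cpuinfo-formatted text (the format of /proc/cpuinfo on Linux x86).
func parseCPUFlags(cpuinfo string) map[string]bool {
	flags := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(cpuinfo))
	// The flags line can be long, so allow lines of up to 1 MiB.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ":", 2)
		if len(parts) != 2 || strings.TrimSpace(parts[0]) != "flags" {
			continue
		}
		for _, f := range strings.Fields(parts[1]) {
			flags[f] = true
		}
		break // every logical CPU normally reports the same flags
	}
	return flags
}

// missingFlags returns, in sorted order, the wanted flags absent from flags.
func missingFlags(flags map[string]bool, wanted []string) []string {
	var missing []string
	for _, w := range wanted {
		if !flags[w] {
			missing = append(missing, w)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/cpuinfo:", err)
		os.Exit(1)
	}
	// Assumption: an illustrative list of newer extensions, not a
	// confirmed requirement of any particular BLS library build.
	wanted := []string{"adx", "avx2", "bmi2"}
	fmt.Println("missing flags:", missingFlags(parseCPUFlags(string(data)), wanted))
}
```

An empty result only means the listed flags are present; it does not rule out other causes of an illegal-instruction crash.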

sloppycoffee commented 3 years ago

I rebuilt the machine and went back to prysm only. No Dirk. Sorry, won't be able to test the fix.