Issue closed by kernelogic.
Did you overclock your RAM or enable XMP? If so, could you try disabling it and see whether that helps?
I got this error on the calibration network while the miner machine was committing:
2020-07-27T21:24:23.395 INFO filcrypto::proofs::api > verify_seal: finish
2020-07-27T21:24:23.395+0800 ERROR sectors storage-fsm@v0.0.0-20200720190000-2cfe2fe3c334/fsm.go:26 unhandled sector error (4): checkCommit sanity check error:
github.com/filecoin-project/storage-fsm.(*Sealing).handleCommitFailed
/root/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200720190000-2cfe2fe3c334/states_failed.go:184
Could you also share the output of these commands?
sudo lshw -C memory
sudo lshw -C cpu
sudo dmidecode -t 2
These show the model numbers and other hardware information for your CPU, motherboard, and RAM.
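The commands above print a lot of output. As a minimal sketch, assuming the plain-text `lshw -C memory` layout posted in this thread (each DIMM appears as a `*-bank` node followed by `key: value` lines), the hypothetical script `lshw_banks.py` condenses the memory banks into one line each so vendor, size, and clock can be compared side by side. It is not part of lshw or Lotus, and the function names are illustrative only.

```python
"""Hypothetical helper (lshw_banks.py): condense `lshw -C memory` output into
one line per DIMM so the banks can be compared side by side."""
import sys
from typing import Dict, List, Optional, Set

# Trimmed sample in the layout posted in this thread (two banks only).
SAMPLE = """\
*-bank:2
description: DIMM DDR4 Synchronous 2933 MHz (0.3 ns)
product: 36ASF4G72PZ-2G9E2
vendor: Micron
slot: P1-DIMMB1
size: 32GiB
clock: 2933MHz (0.3ns)
*-bank:3
description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns)
product: M393A4K40BB1-CRC
vendor: Samsung
slot: P1-DIMMB2
size: 32GiB
clock: 2400MHz (0.4ns)
"""


def parse_banks(text: str) -> List[Dict[str, str]]:
    """Return the `key: value` fields of every `*-bank` node, in order."""
    banks: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()  # real lshw output is indented by nesting depth
        if line.startswith("*-"):
            # A new lshw node begins; track it only if it is a memory bank.
            current = {"node": line[2:]} if line.startswith("*-bank") else None
            if current is not None:
                banks.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return banks


def distinct_clocks(banks: List[Dict[str, str]]) -> Set[str]:
    """Collect the reported clock speeds; more than one means mixed DIMM speeds."""
    return {b["clock"] for b in banks if "clock" in b}


def summarize(banks: List[Dict[str, str]]) -> List[str]:
    """Build one human-readable line per bank."""
    return [
        f"{b.get('slot', '?')}: {b.get('vendor', '?')} {b.get('product', '?')} "
        f"{b.get('size', '?')} @ {b.get('clock', '?')}"
        for b in banks
    ]


if __name__ == "__main__":
    # Pass "-" to read real lshw output from stdin; otherwise use the sample.
    text = sys.stdin.read() if sys.argv[1:] == ["-"] else SAMPLE
    banks = parse_banks(text)
    for row in summarize(banks):
        print(row)
    clocks = distinct_clocks(banks)
    if len(clocks) > 1:
        print("note: banks report different clocks:", ", ".join(sorted(clocks)))
```

For example, `sudo lshw -C memory | python3 lshw_banks.py -` prints one line per bank. The script only reformats what lshw reports; it does not test memory stability by itself.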
sudo lshw -C cpu
*-cpu
description: CPU
product: AMD Ryzen 9 3950X 16-Core Processor
vendor: Advanced Micro Devices [AMD]
physical id: 34
bus info: cpu@0
version: AMD Ryzen 9 3950X 16-Core Processor
serial: Unknown
slot: AM4
size: 2028MHz
capacity: 3500MHz
width: 64 bits
clock: 100MHz
sudo dmidecode -t 2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: PRIME X570-P
Version: Rev X.0x
Serial Number: 200468461001035
Asset Tag: Default string
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
sudo lshw -C memory
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: 2407
date: 07/01/2020
size: 64KiB
capacity: 16MiB
capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: 2e
slot: System board or motherboard
size: 128GiB
Will disable XMP and try again.
My machine uses an ASUS motherboard, and D.O.C.P is disabled, but the error still appears.
Turning off XMP solved this issue. Still, it is odd that XMP has to be off. Isn't it designed to be stable?
@kernelogic That would imply your overclock is not stable. What processor do you have? AMD CPUs are notoriously picky about memory.
@whyrusleeping Hi, we (the ones who just reported this bug on Slack) are still getting this error.
Our setup is 1 miner + 10 P1 workers + 30 P2C2 workers.
The miner and the P2C2 workers use the same hardware, except that the miner has 512GB of RAM.
More detailed information on this setup:
lshw -C memory
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: 2.0b
date: 07/26/2017
size: 64KiB
capacity: 15MiB
capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-memory:0
description: System Memory
physical id: 2b
slot: System board or motherboard
capabilities: ecc
configuration: errordetection=multi-bit-ecc
*-bank:0
description: DIMM DDR4 Synchronous 2933 MHz (0.3 ns)
product: 36ASF4G72PZ-2G9E2
vendor: Micron
physical id: 0
serial: 26C90311
slot: P1-DIMMA1
size: 32GiB
width: 64 bits
clock: 2933MHz (0.3ns)
*-bank:1
description: DIMM DDR4 Synchronous 2933 MHz (0.3 ns)
product: 36ASF4G72PZ-2G9E2
vendor: Micron
physical id: 1
serial: 26C9031A
slot: P1-DIMMA2
size: 32GiB
width: 64 bits
clock: 2933MHz (0.3ns)
*-bank:2
description: DIMM DDR4 Synchronous 2933 MHz (0.3 ns)
product: 36ASF4G72PZ-2G9E2
vendor: Micron
physical id: 2
serial: 26C8E1E0
slot: P1-DIMMB1
size: 32GiB
width: 64 bits
clock: 2933MHz (0.3ns)
*-bank:3
description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns)
product: M393A4K40BB1-CRC
vendor: Samsung
physical id: 3
serial: 365F159F
slot: P1-DIMMB2
size: 32GiB
width: 64 bits
clock: 2400MHz (0.4ns)
lshw -C cpu
*-cpu:0
description: CPU
product: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
vendor: Intel Corp.
physical id: 57
bus info: cpu@0
version: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
slot: CPU1
size: 1200MHz
capacity: 4GHz
width: 64 bits
clock: 100MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d cpufreq
configuration: cores=16 enabledcores=16 threads=32
sudo dmidecode -t 2
Base Board Information
Manufacturer: Powerleader
Product Name: X10DRG-Q
Version: 1.10
Serial Number: VM17BS018252
Asset Tag: Default string
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
P1 worker setup
More detailed information for the AMD P1 workers:
lshw -C memory
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: V3.00
date: 03/05/2020
size: 64KiB
capacity: 15MiB
capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int5printscreen int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: 1f
slot: System board or motherboard
size: 512GiB
capacity: 2TiB
capabilities: ecc
configuration: errordetection=multi-bit-ecc
*-bank:0
description: DIMM DDR4 Synchronous LRDIMM 2667 MHz (0.4 ns)
product: 72ASS8G72LZ-2G6D2
vendor: Micron
physical id: 0
serial: 1FDA97D9
slot: P0_UMC0_CH_A0
size: 64GiB
width: 64 bits
clock: 2667MHz (0.4ns)
lshw -C cpu
*-cpu
description: CPU
product: AMD EPYC 7262 8-Core Processor
vendor: Advanced Micro Devices [AMD]
physical id: 25
bus info: cpu@0
version: AMD EPYC 7262 8-Core Processor
serial: Unknown
slot: P0
size: 1496MHz
capacity: 3400MHz
width: 64 bits
clock: 100MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca cpufreq
configuration: cores=8 enabledcores=8 threads=16
dmidecode -t 2
Base Board Information
Manufacturer: TYAN
Product Name: S8030GM2NE
Version: empty
Serial Number: CXZE2CK1701G
Asset Tag: empty
Features:
Board is a hosting board
Board is removable
Board is replaceable
Location In Chassis: empty
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
We've tried more than 5 times and still hit this error frequently. As a result, more than 100 of our machines are unable to join the calibration network.
I've been getting these errors since the new calibration network started. The same machine worked fine on the old testnet.
2020-07-25T13:40:11.225 INFO filcrypto::proofs::api > verify_seal: start
2020-07-25T13:40:11.230 INFO filecoin_proofs::api::seal > verify_seal:start
2020-07-25T13:40:11.233 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[34359738368]-verifying-key
2020-07-25T13:40:11.233 INFO filecoin_proofs::caches > no params in memory cache for STACKED[34359738368]-verifying-key
2020-07-25T13:40:11.236 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 1073741824; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 11, max_count: 18 }, tree: merkletree-poseidon_hasher-8-8-0 }
2020-07-25T13:40:11.237 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk" exist
2020-07-25T13:40:11.237 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk" for verifying key
2020-07-25T13:40:11.307 INFO storage_proofs_core::parameter_cache > read verifying key from cache "/var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk"
2020-07-25T13:40:11.308 INFO filecoin_proofs::api::seal > got verifying key (34359738368) while verifying seal
2020-07-25T13:40:11.341 INFO filecoin_proofs::api::seal > verify_seal:finish
2020-07-25T13:40:11.341 INFO filcrypto::proofs::api > verify_seal: finish
2020-07-25T13:40:11.352-0700 WARN sectors storage-fsm@v0.0.0-20200720190000-2cfe2fe3c334/checks.go:145 on-chain sealed CID doesn't match!
2020-07-25T13:40:11.353 INFO filcrypto::proofs::api > verify_seal: start
2020-07-25T13:40:11.353 INFO filecoin_proofs::api::seal > verify_seal:start
2020-07-25T13:40:11.353 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[34359738368]-verifying-key
2020-07-25T13:40:11.353 INFO filecoin_proofs::caches > found params in memory cache for STACKED[34359738368]-verifying-key
2020-07-25T13:40:11.353 INFO filecoin_proofs::api::seal > got verifying key (34359738368) while verifying seal
2020-07-25T13:40:11.353 INFO filcrypto::proofs::api > verify_seal: finish
2020-07-25T13:40:11.354-0700 ERROR sectors storage-fsm@v0.0.0-20200720190000-2cfe2fe3c334/fsm.go:26 unhandled sector error (0): checkCommit sanity check error:
github.com/filecoin-project/storage-fsm.(*Sealing).handleCommitFailed
/home/dev/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200720190000-2cfe2fe3c334/states_failed.go:184