filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

wdpost panic : thread '<unnamed>' panicked #8724

Closed llifezou closed 2 years ago

llifezou commented 2 years ago

Lotus Version

lotus-miner version 1.14.0

Describe the Bug

An unknown condition triggers a panic in wdpost, and retries fail as well. We can rule out a hardware problem because we schedule PoSt across machines: when PoSt fails on one machine, the partition is rescheduled to another. Different machines executed this partition and all of them hit the same error.

The error cannot be reproduced on demand; it occurs sporadically.

There may be a problem with some sector data. Strangely, the sector is not recognized as faulty, i.e. no "faulty sector: SectorId(xxx)" message is reported.

Logging Information

May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.666 DEBUG merkletree::merkle > generated partial_tree of row_count 4 and len 585 with 8 branches for proof at 35576531
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.667 DEBUG merkletree::merkle > leafs 134217728, branches 8, total size 153391689, total row_count 10, cache_size 299593, rows_to_discard 2, partial_row_count 4, cached_leafs 262144, segment_width 512, segment range 7478272-7478784 for 7478451
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.670 DEBUG merkletree::merkle > generated partial_tree of row_count 4 and len 585 with 8 branches for proof at 31897896
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.671 DEBUG merkletree::merkle > leafs 134217728, branches 8, total size 153391689, total row_count 10, cache_size 299593, rows_to_discard 2, partial_row_count 4, cached_leafs 262144, segment_width 512, segment range 104225280-104225792 for 104225516
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.673 DEBUG merkletree::merkle > generated partial_tree of row_count 4 and len 585 with 8 branches for proof at 79821397
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: thread '<unnamed>' panicked at 'from_repr failure at 4', /home/ubuntu/src/rust-fil-proofs/filecoin-hashers/src/poseidon.rs:351:29
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: stack backtrace:
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    0: std::panicking::begin_panic
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    1: std::panic::panic_any
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    2: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    3: <filecoin_hashers::poseidon::PoseidonFunction as merkletree::hash::Algorithm<filecoin_hashers::poseidon::PoseidonDomain>>::multi_node
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    4: storage_proofs_core::merkle::proof::InclusionPath<H,Arity>::root
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    5: <storage_proofs_core::merkle::proof::MerkleProof<H,Arity,SubTreeArity,TopTreeArity> as storage_proofs_core::merkle::proof::MerkleProofTrait>::verify
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    6: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    7: <rayon::iter::fold::FoldFolder<C,ID,F> as rayon::iter::plumbing::Folder<T>>::consume_iter
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    8: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:    9: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   10: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   11: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   12: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   13: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   14: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   15: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   16: <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   17: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   18: <rayon::iter::fold::FoldFolder<C,ID,F> as rayon::iter::plumbing::Folder<T>>::consume_iter
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   19: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   20: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   21: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   22: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   23: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   24: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   25: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   26: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   27: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   28: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   29: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   30: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   31: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   32: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   33: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   34: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   35: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   36: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   37: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   38: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   39: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   40: rayon_core::registry::WorkerThread::wait_until_cold
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   41: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   42: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   43: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   44: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   45: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   46: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   47: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   48: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   49: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   50: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   51: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   52: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   53: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   54: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   55: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   56: rayon_core::registry::in_worker
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   57: rayon::iter::plumbing::bridge_producer_consumer::helper
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   58: std::panicking::try
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   59: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   60: rayon_core::registry::WorkerThread::wait_until_cold
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]:   61: rayon_core::registry::ThreadBuilder::run
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.676 DEBUG merkletree::merkle > leafs 134217728, branches 8, total size 153391689, total row_count 10, cache_size 299593, rows_to_discard 2, partial_row_count 4, cached_leafs 262144, segment_width 512, segment range 71349760-71350272 for 71350202
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.682 DEBUG merkletree::merkle > generated partial_tree of row_count 4 and len 585 with 8 branches for proof at 49602531
May 25 08:06:21 Hongkong-amd8-94 lotus-wdpost[1163066]: 2022-05-25T08:06:21.684 DEBUG merkletree::merkle > leafs 134217728, branches 8, total size 153391689, total row_count 10, cache_size 299593, rows_to_discard 2, partial_row_count 4, cached_leafs 262144, segment_width 512, segment range 6090752-6091264 for 6091196

Repro Steps

There is no way to reproduce it. It is a low-probability event, but we have hit it three times in the past two days and it needs to be resolved.

rjan90 commented 2 years ago

It looks like a problem with the data. I speculated that it is due to a hardware problem, but it could also be something else. As for the immediate cause of the panic: it is triggered during computation, when the code tries to parse the bytes and they are not what was expected. That points towards bad data.

From the comments in https://github.com/filecoin-project/rust-fil-proofs/issues/1604
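
To illustrate what "the bytes are not what was expected" can mean here, the following is a minimal sketch, not the rust-fil-proofs code. It assumes that each Merkle node is 32 bytes, that the Poseidon hasher interprets a node as a little-endian integer in the BLS12-381 scalar field, and that parsing fails when the integer is not below the field modulus. It also assumes that the number in "from_repr failure at 4" is the position of the offending node among the hashed inputs. The helper names `is_canonical_node` and `first_bad_node` are hypothetical.

```python
# Order of the BLS12-381 scalar field.
BLS12_381_SCALAR_MODULUS = (
    0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001
)

NODE_SIZE = 32  # assumed size of one Merkle tree node, in bytes


def is_canonical_node(node: bytes) -> bool:
    """Return True if the node decodes to an integer below the field modulus.

    Assumption: the node is read as a little-endian integer.
    """
    if len(node) != NODE_SIZE:
        raise ValueError(f"expected {NODE_SIZE} bytes, got {len(node)}")
    return int.from_bytes(node, "little") < BLS12_381_SCALAR_MODULUS


def first_bad_node(nodes):
    """Return the index of the first node that would fail to parse, or None."""
    for index, node in enumerate(nodes):
        if not is_canonical_node(node):
            return index
    return None


# Eight sibling nodes (the log shows 8 branches); the fifth one is corrupted.
good = bytes(NODE_SIZE)
corrupt = b"\xff" * NODE_SIZE
siblings = [good] * 4 + [corrupt] + [good] * 3
print(first_bad_node(siblings))
```

Under these assumptions, valid tree data always parses, so a failure of this kind would be consistent with node bytes that were damaged on disk or in transit rather than with a logic error in the proof itself.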

jennijuju commented 2 years ago

Closing here, as the issue has been properly raised and is being tracked in the proofs repository.