filecoin-project / rust-fil-proofs

Proofs for Filecoin in Rust

SectorComputeProofFailed: the element is not part of an r-order subgroup #1185

Closed hsienfu closed 4 years ago

hsienfu commented 4 years ago

Describe the problem

computing seal proof failed(2): the element is not part of an r-order subgroup

Hardware:

CPU: AMD Ryzen Threadripper 3970X 32-Core Processor
MEM: 256G 2666MHZ
GPU: GeForce GTX 1080 Ti
SWAP: 256G NVMe
OS: Ubuntu 18.04 server
Disk: 14TiB  logical volume
Make lotus:

env RUSTFLAGS="-C target-cpu=native -g" FFI_BUILD_FROM_SOURCE=1 make clean all

Sectors status

0: PreCommit1 sSet: NO pSet: NO tktH: 0 seedH: 0 deals: [0]

The output of ./lotus-storage-miner sectors status --log <sectorId> for the failed sector(s).

SectorID:       0
Status: CommitFailed
CommD:          6261666b3463687a6161353766377872767975666a676135666b61653667736d6b6a32376e37343434696b3372626e7a71336477687672357075793761
CommR:          6261666b3465687a617261666e6863357a7932657175676237797567686961776c78657866686e356c617765767671336f6a767a6465636b6971753561
Ticket:         dd3b5f627fc160d4d7846a403178b6cb7ac22b7be4b74db7c0541f3d74c3f760
TicketH:                1294
Seed:           c312643f60bbeca2a05abcfaa070e09e22ea60eedb83bd0db0d57a72964b27b6
SeedH:          3055
Proof:
Deals:          [0]
Retries:                1
--------
Event Log:
0.      2020-06-19 23:14:16 +0800 CST:  [event;sealing.SectorStart]     {"User":{"ID":0,"SectorType":3,"Pieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}},"DealInfo":null}]}}
1.      2020-06-19 23:14:16 +0800 CST:  [event;sealing.SectorPacked]    {"User":{"FillerPieces":null}}
2.      2020-06-20 03:54:42 +0800 CST:  [event;sealing.SectorPreCommit1]        {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTA2NTQxLTAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEwNjU0MS0wIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMDY1NDEtMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTA2NTQxLTAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEwNjU0MS0wIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMDY1NDEtMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTA2NTQxLTAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEwNjU0MS0wIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMDY1NDEtMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTA2NTQxLTAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMDY1NDEtMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMDY1NDEtMCIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsOD
AsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"3TtfYn/BYNTXhGpAMXi2y3rCK3vkt023wFQfPXTD92A=","TicketEpoch":1294}}
3.      2020-06-20 05:08:17 +0800 CST:  [event;sealing.SectorPreCommit2]        {"User":{"Sealed":{"/":"bafk4ehzarafnhc5zy2equgb7yughiawlxexfhn5lawevvq3ojvzdeckiqu5a"},"Unsealed":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}}}
4.      2020-06-20 05:08:18 +0800 CST:  [event;sealing.SectorPreCommitted]      {"User":{"Message":{"/":"bafy2bzacebixexqe2vnzb7ohsfvu24fkdwdwhufdyqcw33bygjn75mchfl7c2"}}}
5.      2020-06-20 05:11:16 +0800 CST:  [event;sealing.SectorPreCommitLanded]   {"User":{"TipSet":"AXGg5AIgEesY8sFKJwaKH7GwRrqzCfYBwf3/KHMacQv39uDjYjgBcaDkAiBkbFrDnV79vGIs+4WqpM0QgTONRby0ekns2/9eL8BVvw=="}}
6.      2020-06-20 05:15:26 +0800 CST:  [event;sealing.SectorSeedReady] {"User":{"SeedValue":"wxJkP2C77KKgWrz6oHDgniLqYO7bg70NsNV6cpZLJ7Y=","SeedEpoch":3055}}
7.      2020-06-20 06:29:37 +0800 CST:  [event;sealing.SectorComputeProofFailed]        {"User":{}}
        computing seal proof failed(2): the element is not part of an r-order subgroup
8.      2020-06-20 06:30:37 +0800 CST:  [event;sealing.SectorRetryComputeProof] {"User":{}}
9.      2020-06-20 07:45:37 +0800 CST:  [event;sealing.SectorCommitFailed]      {"User":{}}
        commit check error: invalid proof (compute error?)


Version

The output of ./lotus --version.

lotus version 0.4.0+git.ffa7be86
arajasek commented 4 years ago

Thanks for the issue!

We should transfer this to rust-fil-proofs.

moonlight233 commented 4 years ago

The same error.

CPU: AMD Ryzen Threadripper 3970X 32-Core Processor
MEM: 256G 2133MHZ
GPU: GeForce RTX 2080 Ti
SWAP: 128G NVMe
OS: Ubuntu 18.04 server

moonlight233 commented 4 years ago

I ran into the same error as you. Could we talk it over on WeChat? 18221352583 @hsienfu

porcuquine commented 4 years ago

After a lot of debugging over Slack (https://filecoinproject.slack.com/archives/CPFTWMY7N/p1593680700430800) I have concluded that this error probably means that either the groth parameters or verifying key are corrupted. The first thing you should do to check this is rerun the paramfetch program and ensure you have the verified-correct versions of these files. If the problem recurs, take note to see if they are being corrupted and need to be replaced again. It seems that @moonlight233 (for example) is seeing repeated corruption for some unknown reason.

As far as I can tell, this is not an issue with the proofs code but rather with whatever underlying system problem is leading to corruption of these parameters/keys.

hsienfu commented 4 years ago

I have checked the proof-parameter files and still get commit check error: invalid proof (compute error?). How can I verify whether the verifying key is corrupted?

lotus fetch-params 32GiB

2020-07-08T16:34:11.889+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk is ok
2020-07-08T16:34:11.889+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk is ok
2020-07-08T16:34:11.889+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk is ok
2020-07-08T16:34:11.889+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk is ok
2020-07-08T16:34:11.890+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk is ok
2020-07-08T16:34:11.891+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk is ok
2020-07-08T16:34:11.891+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk is ok
2020-07-08T16:34:11.911+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk is ok
2020-07-08T16:34:11.911+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk is ok
2020-07-08T16:34:12.713+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.params is ok
2020-07-08T16:35:04.519+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.params is ok
2020-07-08T16:36:03.370+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.params is ok
2020-07-08T16:36:03.370+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:162    parameter and key-fetching complete
SectorID:       0
Status: CommitFailed
CommD:          6261666b3463687a6161353766377872767975666a676135666b61653667736d6b6a32376e37343434696b3372626e7a71336477687672357075793761
CommR:          6261666b3465687a616f32737165776c746f716e34756337686a70706676756775656173646579677167637a6c6a746c657568797a766d796771347971
Ticket:         0d58342afa4e75c3b0ee31c39596605fbb8493bb2965a01d5144eb02737dffeb
TicketH:                63333
Seed:           fbf49b18192f957b40f3c6a527a743eef11e5d4fbda59b14ae35c529a6840efd
SeedH:          65051
Proof:
Deals:          [0]
Retries:                0
--------
Event Log:
0.      2020-07-07 18:27:31 +0800 CST:  [event;sealing.SectorStart]     {"User":{"ID":0,"SectorType":3,"Pieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}},"DealInfo":null}]}}
1.      2020-07-07 18:27:31 +0800 CST:  [event;sealing.SectorPacked]    {"User":{"FillerPieces":null}}
2.      2020-07-07 22:04:04 +0800 CST:  [event;sealing.SectorRestart]   {"User":{}}
3.      2020-07-08 02:27:08 +0800 CST:  [event;sealing.SectorPreCommit1]        {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDQwMi0wIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDQwMi0wIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDQwMi0wIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsOD
AsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"DVg0KvpOdcOw7jHDlZZgX7uEk7spZaAdUUTrAnN9/+s=","TicketEpoch":63333}}
4.      2020-07-08 03:40:16 +0800 CST:  [event;sealing.SectorPreCommit2]        {"User":{"Sealed":{"/":"bafk4ehzaiytfpzcmm3ilbyitsg4dkwj4ebaytbbp7bu4gq7lgp77f4kbbq4a"},"Unsealed":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}}}
5.      2020-07-08 03:40:17 +0800 CST:  [event;sealing.SectorPreCommitted]      {"User":{"Message":{"/":"bafy2bzacecrbog7rxi5tlgn7g7lfpedn2cb53sqj47ivkcbefchdxck3kpcsi"}}}
6.      2020-07-08 03:42:56 +0800 CST:  [event;sealing.SectorPreCommitLanded]   {"User":{"TipSet":"AXGg5AIgqlEFIPXkSXOKDWXAHQwZYuQSqeknYe8tFnnJ1CWhwgUBcaDkAiD4c3WbHga+jOXrtO7opzCbgjZ2QjGqsT4rRzSWNho/OQFxoOQCINlod2mnqRF9jCAr3i3yeYHjMw39BkKPtaC6cD4T+JLI"}}
7.      2020-07-08 03:47:06 +0800 CST:  [event;sealing.SectorSeedReady] {"User":{"SeedValue":"+/SbGBkvlXtA88alJ6dD7vEeXU+9pZsUrjXFKaaEDv0=","SeedEpoch":65051}}
8.      2020-07-08 04:58:13 +0800 CST:  [event;sealing.SectorComputeProofFailed]        {"User":{}}
        computing seal proof failed(2): coordinate(s) do not lie on the curve
9.      2020-07-08 04:59:13 +0800 CST:  [event;sealing.SectorRetryComputeProof] {"User":{}}
10.     2020-07-08 06:09:29 +0800 CST:  [event;sealing.SectorComputeProofFailed]        {"User":{}}
        computing seal proof failed(2): coordinate(s) do not lie on the curve
11.     2020-07-08 06:10:29 +0800 CST:  [event;sealing.SectorRetryComputeProof] {"User":{}}
12.     2020-07-08 07:20:33 +0800 CST:  [event;sealing.SectorComputeProofFailed]        {"User":{}}
        computing seal proof failed(2): coordinate(s) do not lie on the curve
13.     2020-07-08 07:21:33 +0800 CST:  [event;sealing.SectorSealPreCommit1Failed]      {"User":{}}
        consecutive compute fails
14.     2020-07-08 07:22:33 +0800 CST:  [event;sealing.SectorRetrySealPreCommit1]       {"User":{}}
15.     2020-07-08 11:46:12 +0800 CST:  [event;sealing.SectorPreCommit1]        {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDQwMi0wIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDQwMi0wIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tbnQvc2RiLy5sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDQwMi0wIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21udC9zZGIvLmxvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwNDAyLTAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvbW50L3NkYi8ubG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA0MDItMCIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsOD
AsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"DVg0KvpOdcOw7jHDlZZgX7uEk7spZaAdUUTrAnN9/+s=","TicketEpoch":63333}}
16.     2020-07-08 12:57:17 +0800 CST:  [event;sealing.SectorPreCommit2]        {"User":{"Sealed":{"/":"bafk4ehzao2sqewltoqn4uc7hjppfvugueasdeygqgczljtleuhyzvmygq4yq"},"Unsealed":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}}}
17.     2020-07-08 12:57:18 +0800 CST:  [event;sealing.SectorPreCommitLanded]   {"User":{"TipSet":"AXGg5AIg5Xza7o+dkPsJNXwbvx8fUgkoLQNA0BAlCno7IWzS/i4BcaDkAiDz9VNI0k+dhdsCjHMzFIpGDj4Ha5wpNZIb4ZtRCtDbyw=="}}
18.     2020-07-08 12:57:18 +0800 CST:  [event;sealing.SectorSeedReady] {"User":{"SeedValue":"+/SbGBkvlXtA88alJ6dD7vEeXU+9pZsUrjXFKaaEDv0=","SeedEpoch":65051}}
19.     2020-07-08 14:08:37 +0800 CST:  [event;sealing.SectorCommitFailed]      {"User":{}}
porcuquine commented 4 years ago

If you run paramfetch, it will check your params and keys and download new ones if needed. I think it will also log what it does so you should be able to tell whether any bad files were detected. If you want to be extra sure, you could copy the current files and compare them later. Or just record digests of your current files. Or check them into a git repo to accomplish both…
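
The digest-recording idea above could be sketched as follows in Python. This is only an illustration and not part of paramfetch: the helper names `sha256_of`, `snapshot`, and `changed_files` are made up for this example, and it assumes the files sit in one directory (the logs earlier in this thread show /var/tmp/filecoin-proof-parameters) and end in .params or .vk.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory):
    """Map each .params/.vk file name in `directory` to its current digest."""
    return {
        entry.name: sha256_of(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file() and entry.suffix in {".params", ".vk"}
    }


def changed_files(old, new):
    """Return names whose digest differs, or that appear in only one snapshot."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Saving the result of `snapshot(...)` right after paramfetch reports every file as ok, then passing it to `changed_files` together with a fresh snapshot taken after a failed proof, would show which files, if any, were modified in between.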

moonlight233 commented 4 years ago

I may know the cause of the error: I have a few suspects. Let's narrow it down by comparing the unusual things our two setups have in common. @hsienfu

moonlight233 commented 4 years ago
  1. I have upgraded the BIOS.
  2. I edited GRUB to change quiet splash to quiet splash nomodeset.
  3. I use a RAID card, model 9133-8i.
  4. My memory sticks and NVMe drive are both Asgard brand.

Please list which of the above apply to you so that I can find the cause. @hsienfu
hsienfu commented 4 years ago

Did you resolve it? @moonlight233

moonlight233 commented 4 years ago

Did you resolve it? @moonlight233

I have a few suspects. Please list which of the points above apply to you so that I can find the cause.

moonlight233 commented 4 years ago

env LOTUS_STORAGE_PATH=/media/filtech/DDD/lotusstorage lotus-storage-miner sectors status --log 0

SectorID:       0
Status: Committing
CommD:          6261666b3463687a6161353766377872767975666a676135666b61653667736d6b6a32376e37343434696b3372626e7a71336477687672357075793761
CommR:          6261666b3465687a61776c6c35666b6b343473366765626e6a6e7036657a6e79366d6a67667475797568776f7732376b3264337a74716561776a357961
Ticket:         03c38a9db824d9b76cfde22ea71f40bc36a47ef48907555605d873223030f932
TicketH:                70363
Seed:           7942c18f81e9f493786eee4a9dc6ee01f8e2a6533c01bd697814d1b1b58fdc68
SeedH:          71991
Proof:
Deals:          [0]
Retries:                2

Event Log:

  1. 2020-07-09 22:53:11 +0800 CST: [event;sealing.SectorStart] {"User":{"ID":0,"SectorType":3,"Pieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}},"DealInfo":null}]}}
  2. 2020-07-09 22:53:11 +0800 CST: [event;sealing.SectorPacked] {"User":{"FillerPieces":null}}
  3. 2020-07-10 02:57:24 +0800 CST: [event;sealing.SectorPreCommit1] {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctM
CIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsODAsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"A8OKnbgk2bds/eIupx9AvDakfvSJB1VWBdhzIjAw+TI=","TicketEpoch":70363}}
  4. 2020-07-10 03:51:38 +0800 CST: [event;sealing.SectorPreCommit2] {"User":{"Sealed":{"/":"bafk4ehzawll5fkk44s6gebnjnp6ezny6mjgftuyuhwow27k2d3ztqeawj5ya"},"Unsealed":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}}}
  5. 2020-07-10 03:51:38 +0800 CST: [event;sealing.SectorPreCommitted] {"User":{"Message":{"/":"bafy2bzacecixpberibz2gqtapfjee4a4tgo6rcfwvzrj4zlmh5t63illlpouk"}}}
  6. 2020-07-10 03:54:35 +0800 CST: [event;sealing.SectorPreCommitLanded] {"User":{"TipSet":"AXGg5AIgyDV0egub7LRWLCbLcjoIXbXYc5oZ6FtJM5aI7DmwP58BcaDkAiCWr3NYD4HHgNNg2MqioyyLMRhgym8iD3fhsCu2MducuAFxoOQCIEpPdgtdGKufkWzR0WR+WjizjoqJXTYV+6T7kj4r8/ssAXGg5AIgJEXzo0IioDDIfK1MBBBG/1xiguKUBk7posYRlJ3hmOk="}}
  7. 2020-07-10 03:58:45 +0800 CST: [event;sealing.SectorSeedReady] {"User":{"SeedValue":"eULBj4Hp9JN4bu5KncbuAfjiplM8Ab1peBTRsbWP3Gg=","SeedEpoch":71991}}
  8. 2020-07-10 04:44:53 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}}
     computing seal proof failed(2): the element is not part of an r-order subgroup
  9. 2020-07-10 04:45:53 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  10. 2020-07-10 05:07:24 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}}
     computing seal proof failed(2): encountered an I/O error: encoding has unexpected information
     Caused by: encoding has unexpected information
  11. 2020-07-10 05:08:24 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
moonlight233 commented 4 years ago

2020-07-10T05:19:06.966 INFO bellperson::gpu::locks > GPU is available for FFT!
2020-07-10T05:19:07.071 INFO bellperson::gpu::fft > FFT: 1 working device(s) selected.
2020-07-10T05:19:07.071 INFO bellperson::gpu::fft > FFT: Device 0: GeForce RTX 2080 Ti
2020-07-10T05:19:07.071 INFO bellperson::domain > GPU FFT kernel instantiated!

It retries after this step every time.

moonlight233 commented 4 years ago

hsienfu and I have ruled out a hardware problem by comparing our setups. I also ran the software a dozen times in different ways, and every run reported the same error. This looks like a bug on your side. @porcuquine

porcuquine commented 4 years ago

I don't think you can eliminate hardware problems by comparing your hardware, but we can keep investigating the software.

The next step if we are to make progress would be for you to isolate exactly where in the process the corruption occurs. It would also be useful to discover whether it is deterministic. That is, do you end up with the same corrupted contents whenever this happens, or do they vary?
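
One way to approach the determinism question, sketched in Python. It assumes you kept a known-good copy of the affected file and set each corrupted copy aside before re-fetching. `differing_ranges` is a hypothetical helper written for this illustration, not part of any Filecoin tooling; it reports the byte ranges in which two files disagree.

```python
def differing_ranges(good_path, suspect_path, chunk_size=1 << 20):
    """Return (start, end) byte ranges, end exclusive, where the files differ.

    A length mismatch is reported as a differing range covering the extra bytes.
    """
    ranges = []
    offset = 0      # absolute offset of the current chunk
    start = None    # start of the currently open differing range, if any
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        while True:
            a = good.read(chunk_size)
            b = suspect.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: close any range left open by the previous one.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
                offset += len(a)
                continue
            longest = max(len(a), len(b))
            for i in range(longest):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and start is None:
                    start = offset + i
                elif same and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += longest
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Running it against the corrupted copies from two separate failures, and comparing the returned ranges and the bytes inside them, would indicate whether the damage is the same each time or varies.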

moonlight233 commented 4 years ago

Every error or retry comes right after this step. Does that pin down where in the process the corruption occurs?

2020-07-10T05:19:06.966 INFO bellperson::gpu::locks > GPU is available for FFT!
2020-07-10T05:19:07.071 INFO bellperson::gpu::fft > FFT: 1 working device(s) selected.
2020-07-10T05:19:07.071 INFO bellperson::gpu::fft > FFT: Device 0: GeForce RTX 2080 Ti
2020-07-10T05:19:07.071 INFO bellperson::domain > GPU FFT kernel instantiated!

@porcuquine

porcuquine commented 4 years ago

I thought I had written it here, but I guess it was in Slack. Here is what I wrote there:

I guess the next logical step is to determine exactly when this corruption happens. One way I can think of: write a script to continually hash the files and check against the saved (correct) value. If a change is ever detected, log it. Then by inspecting the logs and comparing times, you can hopefully figure out exactly which step causes the corruption. This may still not be fine-grained enough, but it will give you a lot more information than you have now.
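
A minimal Python sketch of such a monitoring script follows. It is an illustration under stated assumptions, not an official tool: the names `file_digest`, `check_once`, and `watch` are invented here, and the files are assumed to be known-good at the moment `watch` starts (for example, right after paramfetch reported them as ok). Hashing the large .params files is slow (the fetch-params log above spends roughly a minute on each), so the time resolution is correspondingly coarse.

```python
import hashlib
import time
from datetime import datetime


def file_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def check_once(baseline, log):
    """Re-hash every file in `baseline`; log and return (path, digest) changes."""
    changes = []
    for path, expected in baseline.items():
        try:
            actual = file_digest(path)
        except OSError as err:
            # A missing or unreadable file is also worth a log entry.
            actual = "unreadable ({})".format(err.__class__.__name__)
        if actual != expected:
            stamp = datetime.now().isoformat(timespec="seconds")
            log("{} CHANGED {}: expected {}, got {}".format(
                stamp, path, expected, actual))
            changes.append((path, actual))
    return changes


def watch(paths, interval_seconds=60, log=print, max_rounds=None):
    """Hash `paths` once as the baseline, then poll and log every change."""
    baseline = {str(p): file_digest(p) for p in paths}
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for path, actual in check_once(baseline, log):
            # Track the new state so each change is logged once, not every round.
            baseline[path] = actual
        time.sleep(interval_seconds)
        rounds += 1
```

The timestamps in the resulting log can then be lined up against the miner log to see which sealing step was running when a file changed.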

moonlight233 commented 4 years ago

I found two places that might cause errors:

storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:203 precommit message landed on chain: 0
2020-07-10T03:56:15.481+0800 WARN sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:236 revert in interactive commit sector step
2020-07-10T03:56:15.962+0800 WARN sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:236 revert in interactive commit sector step
2020-07-10T03:56:18.743+0800 WARN sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:236 revert in interactive commit sector step
2020-07-10T03:56:40.180+0800 WARN sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:236 revert in interactive commit sector step
2020-07-10T03:56:40.310+0800 WARN sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:236 revert in interactive commit sector step
2020-07-10T03:58:45.702+0800 INFO sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_sealing.go:248 scheduling seal proof computation...

moonlight233 commented 4 years ago

2020-07-10T05:23:05.004 INFO bellperson::multiexp > GPU Multiexp kernel instantiated!
2020-07-10T05:25:25.374+0800 INFO rpc go-jsonrpc@v0.1.1-0.20200602181149-522144ab4e24/client.go:204 rpc output message buffer {"n": 2}
2020-07-10T05:25:25.375+0800 INFO rpc go-jsonrpc@v0.1.1-0.20200602181149-522144ab4e24/client.go:204 rpc output message buffer {"n": 2}
thread '' panicked at 'slice index starts at 24595123628 but ends at 24595123596', src/libcore/slice/mod.rs:2731:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
# Here!!!!!!!!!!!
2020-07-10T05:31:38.788+0800 INFO sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/states_failed.go:19 ComputeProofFailed(0), waiting 59.211471427s before retrying
2020-07-10T05:32:18.126+0800 INFO dht/RtRefreshManager rtrefresh/rt_refresh_manager.go:265 starting refreshing cpl 0 with key  ~� (routing table size was 0)

moonlight233 commented 4 years ago

env LOTUS_STORAGE_PATH=/media/filtech/DDD/lotusstorage lotus-storage-miner sectors status --log 0

SectorID:       0
Status: PreCommit1
CommD:          6261666b3463687a6161353766377872767975666a676135666b61653667736d6b6a32376e37343434696b3372626e7a71336477687672357075793761
CommR:          6261666b3465687a6136676a3334796c793333776a6d74786a6c71786f3673346f363778786b76743776767a6476343563646a6d6669766f3769356b71
Ticket:         03c38a9db824d9b76cfde22ea71f40bc36a47ef48907555605d873223030f932
TicketH:                70363
Seed:           7942c18f81e9f493786eee4a9dc6ee01f8e2a6533c01bd697814d1b1b58fdc68
SeedH:          71991
Proof:
Deals:          [0]
Retries:                0

Event Log:

  1. 2020-07-09 22:53:11 +0800 CST: [event;sealing.SectorStart] {"User":{"ID":0,"SectorType":3,"Pieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}},"DealInfo":null}]}}
  2. 2020-07-09 22:53:11 +0800 CST: [event;sealing.SectorPacked] {"User":{"FillerPieces":null}}
  3. 2020-07-10 02:57:24 +0800 CST: [event;sealing.SectorPreCommit1] {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctM
CIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsODAsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"A8OKnbgk2bds/eIupx9AvDakfvSJB1VWBdhzIjAw+TI=","TicketEpoch":70363}}
  4. 2020-07-10 03:51:38 +0800 CST: [event;sealing.SectorPreCommit2] {"User":{"Sealed":{"/":"bafk4ehzawll5fkk44s6gebnjnp6ezny6mjgftuyuhwow27k2d3ztqeawj5ya"},"Unsealed":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}}}
  5. 2020-07-10 03:51:38 +0800 CST: [event;sealing.SectorPreCommitted] {"User":{"Message":{"/":"bafy2bzacecixpberibz2gqtapfjee4a4tgo6rcfwvzrj4zlmh5t63illlpouk"}}}
  6. 2020-07-10 03:54:35 +0800 CST: [event;sealing.SectorPreCommitLanded] {"User":{"TipSet":"AXGg5AIgyDV0egub7LRWLCbLcjoIXbXYc5oZ6FtJM5aI7DmwP58BcaDkAiCWr3NYD4HHgNNg2MqioyyLMRhgym8iD3fhsCu2MducuAFxoOQCIEpPdgtdGKufkWzR0WR+WjizjoqJXTYV+6T7kj4r8/ssAXGg5AIgJEXzo0IioDDIfK1MBBBG/1xiguKUBk7posYRlJ3hmOk="}}
  7. 2020-07-10 03:58:45 +0800 CST: [event;sealing.SectorSeedReady] {"User":{"SeedValue":"eULBj4Hp9JN4bu5KncbuAfjiplM8Ab1peBTRsbWP3Gg=","SeedEpoch":71991}}
  8. 2020-07-10 04:44:53 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}} computing seal proof failed(2): the element is not part of an r-order subgroup
  9. 2020-07-10 04:45:53 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  10. 2020-07-10 05:07:24 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}} computing seal proof failed(2): encountered an I/O error: encoding has unexpected information

Caused by: encoding has unexpected information

  1. 2020-07-10 05:08:24 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  2. 2020-07-10 05:31:38 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}} computing seal proof failed(2): Rust panic: no unwind information
  3. 2020-07-10 05:32:38 +0800 CST: [event;sealing.SectorSealPreCommit1Failed] {"User":{}} consecutive compute fails
  4. 2020-07-10 05:33:38 +0800 CST: [event;sealing.SectorRetrySealPreCommit1] {"User":{}}
  5. 2020-07-10 09:39:57 +0800 CST: [event;sealing.SectorPreCommit1] {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9tZWRpYS9maWx0ZWNoL0RERC9sb3R1c3N0b3JhZ2UvY2FjaGUvcy10MDEyMDg3Ny0wIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL21lZGlhL2ZpbHRlY2gvREREL2xvdHVzc3RvcmFnZS9jYWNoZS9zLXQwMTIwODc3LTAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvbWVkaWEvZmlsdGVjaC9EREQvbG90dXNzdG9yYWdlL2NhY2hlL3MtdDAxMjA4NzctM
CIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsODAsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"A8OKnbgk2bds/eIupx9AvDakfvSJB1VWBdhzIjAw+TI=","TicketEpoch":70363}}
  6. 2020-07-10 10:34:34 +0800 CST: [event;sealing.SectorPreCommit2] {"User":{"Sealed":{"/":"bafk4ehza6gj34yly33wjmtxjlqxo6s4o67xxkvt7vvzdv45cdjmfivo7i5kq"},"Unsealed":{"/":"bafk4chzaa57f7xrvyufjga5fkae6gsmkj27n7444ik3rbnzq3dwhvr5puy7a"}}}
  7. 2020-07-10 10:34:34 +0800 CST: [event;sealing.SectorPreCommitLanded] {"User":{"TipSet":"AXGg5AIg0Zd8rIxqXBHvyVG38ibd+4pKQXLMes/HPA2LAAmLEnkBcaDkAiAPXO8Xxv1+YQCyPprpFum9hgI2fUq5m0zTZW3FX68eFQFxoOQCILbf3NfBeORsWtgzoRxrcoXxZQdC0a6wkD26iBMsEcaS"}}
  8. 2020-07-10 10:34:34 +0800 CST: [event;sealing.SectorSeedReady] {"User":{"SeedValue":"eULBj4Hp9JN4bu5KncbuAfjiplM8Ab1peBTRsbWP3Gg=","SeedEpoch":71991}}
  9. 2020-07-10 10:58:07 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}} computing seal proof failed(2): Rust panic: no unwind information
  10. 2020-07-10 10:59:07 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  11. 2020-07-10 11:20:43 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}} computing seal proof failed(2): Rust panic: no unwind information
  12. 2020-07-10 11:21:43 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  13. 2020-07-10 11:43:08 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}} computing seal proof failed(2): Rust panic: no unwind information
  14. 2020-07-10 11:44:08 +0800 CST: [event;sealing.SectorSealPreCommit1Failed] {"User":{}} consecutive compute fails
  15. 2020-07-10 11:45:08 +0800 CST: [event;sealing.SectorRetrySealPreCommit1] {"User":{}}
moonlight233 commented 4 years ago

@porcuquine

hsienfu commented 4 years ago

I found the message "the element is not part of an r-order subgroup" at https://github.com/zkcrypto/group/blob/master/src/lib.rs:171. Is it invoked by rust-ffi-proofs?

moonlight233 commented 4 years ago

2020-07-13T09:15:30.982 INFO filecoin_proofs::api > generate_piece_commitment:start
2020-07-13T09:15:31.041 INFO filecoin_proofs::api > generate_piece_commitment:finish
2020-07-13T09:15:31.046 INFO filecoin_proofs::api > generate_piece_commitment:start
2020-07-13T09:15:31.094+0800 INFO miner miner/miner.go:304 Time delta between now and our mining base: 6s (nulls: 0)
2020-07-13T09:15:31.107 INFO filecoin_proofs::api > generate_piece_commitment:finish
2020-07-13T09:15:31.114 INFO filcrypto::proofs::api > generate_data_commitment: start
2020-07-13T09:15:31.114 INFO filecoin_proofs::api::seal > compute_comm_d:start
2020-07-13T09:15:31.114 INFO filecoin_proofs::pieces > verifying 8192 pieces
2020-07-13T09:15:31.115 INFO filecoin_proofs::api::seal > compute_comm_d:finish
2020-07-13T09:15:31.115 INFO filcrypto::proofs::api > generate_data_commitment: finish
2020-07-13T09:15:31.115+0800 INFO sectors storage-fsm@v0.0.0-20200707194229-bc5e298e2b4c/sealing.go:240 Creating CC sector 2
2020-07-13T09:15:33.376+0800 INFO sectors storage-fsm@v0.0.0-20200707194229-bc5e298e2b4c/states_sealing.go:21 performing filling up rest of the sector... {"sector": "2"}
2020-07-13T09:15:33.402+0800 ERROR sectors storage-fsm@v0.0.0-20200707194229-bc5e298e2b4c/fsm.go:26 unhandled sector error (2): checkPieces sanity check error:
    github.com/filecoin-project/storage-fsm.(*Sealing).handlePreCommit1
        /home/filtech/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200707194229-bc5e298e2b4c/states_sealing.go:92

env LOTUS_STORAGE_PATH=/media/filtech/DDD/lotusstorage lotus-storage-miner info
Miner: t01152
Sector Size: 32 GiB
Byte Power: 32 GiB / 2.952 TiB (1.0587%)
Actual Power: 32 Gi / 3.02 Ti (1.0360%)
Committed: 32 GiB
Proving: 32 GiB
Expected block win rate: 179.0208/day (every 8m2s)

Miner Balance: 15006.106409068606536657
PreCommit: 7331.75321792595719542
Locked: 7454.399919416314163392
Available: 219.953271726335177845
Worker Balance: 10239.252589836194784327
Market (Escrow): 0
Market (Locked): 0

Sectors:
Total: 3
Proving: 1
PreCommit1: 1
FailedUnrecoverable: 1

moonlight233 commented 4 years ago

On butterfly, the first sector sealed successfully, but the second and third sectors failed. The miner was not stopped in between, and no other parameters were added. Why is this happening? @porcuquine

moonlight233 commented 4 years ago

32GB

2020-07-07T00:43:11.983 INFO filecoin_proofs::caches > found params in memory cache for STACKED[34359738368]-verifying-key
2020-07-07T00:43:11.983 INFO filecoin_proofs::api::seal > got verifying key (34359738368) while verifying seal
2020-07-07T00:43:11.984 INFO filcrypto::proofs::api > verify_seal: finish
2020-07-07T00:43:11.985+0800    ERROR  sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/fsm.go:26     unhandled sector error (0): checkCommit sanity check error:
    github.com/filecoin-project/storage-fsm.(Sealing).handleCommitFailed
        /home/filtech/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200625160832-379a4655b044/states_failed.go:184
  - verify seal:
    github.com/filecoin-project/storage-fsm.(Sealing).checkCommit
        /home/filtech/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200625160832-379a4655b044/checks.go:158
  - failed to fill whole buffer

512MB

2020-07-15T08:08:08.490 INFO filecoin_proofs::api::seal > snark_proof:finish
2020-07-15T08:08:08.490 INFO filecoin_proofs::api::seal > verify_seal:start
2020-07-15T08:08:08.490 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2020-07-15T08:08:08.490 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]-verifying-key
2020-07-15T08:08:08.490 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2020-07-15T08:08:08.490 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/media/filtech/CCC/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" exist
2020-07-15T08:08:08.490 INFO storage_proofs_core::parameter_cache > checking cache_path: "/media/filtech/CCC/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" for verifying key
2020-07-15T08:08:08.505 INFO storage_proofs_core::parameter_cache > read verifying key from cache "/media/filtech/CCC/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk"
2020-07-15T08:08:08.505 INFO filecoin_proofs::api::seal > got verifying key (536870912) while verifying seal
2020-07-15T08:08:08.517 INFO filecoin_proofs::api::seal > verify_seal:finish
2020-07-15T08:08:08.517 INFO filecoin_proofs::api::seal > seal_commit_phase2:finish
2020-07-15T08:08:08.517 INFO filcrypto::proofs::api > seal_commit_phase2: finish
2020-07-15T08:08:08.572 INFO filcrypto::proofs::api > verify_seal: start
2020-07-15T08:08:08.572 INFO filecoin_proofs::api::seal > verify_seal:start
2020-07-15T08:08:08.572 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2020-07-15T08:08:08.572 INFO filecoin_proofs::caches > found params in memory cache for STACKED[536870912]-verifying-key
2020-07-15T08:08:08.572 INFO filecoin_proofs::api::seal > got verifying key (536870912) while verifying seal
2020-07-15T08:08:08.587 INFO filecoin_proofs::api::seal > verify_seal:finish
2020-07-15T08:08:08.587 INFO filcrypto::proofs::api > verify_seal: finish

porcuquine commented 4 years ago

I found the message "the element is not part of an r-order subgroup" at https://github.com/zkcrypto/group/blob/master/src/lib.rs:171. Is it invoked by rust-ffi-proofs?

Yes, that looks like the source.
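
For readers unfamiliar with the message: an error like this generally means that a decoded group element failed a subgroup membership check, that is, multiplying it by the subgroup order r did not give the identity. The sketch below is only a toy analogy using integers modulo a small prime, not the elliptic-curve arithmetic the library performs. The function name and the values `P` and `R` are made up for illustration.

```python
# Toy analogy only: the multiplicative group modulo a small prime,
# not the elliptic-curve groups used by the real proving code.
P = 23  # hypothetical small prime modulus
R = 11  # prime order of the subgroup; R divides P - 1 = 22


def in_r_order_subgroup(x: int) -> bool:
    """Return True if x lies in the order-R subgroup of the integers mod P."""
    if x % P == 0:
        return False  # 0 is not an element of the multiplicative group
    # An element belongs to the order-R subgroup exactly when x**R == 1 (mod P).
    return pow(x, R, P) == 1


print(in_r_order_subgroup(2))  # 2**11 = 2048 = 89 * 23 + 1, so this is True
print(in_r_order_subgroup(5))  # 5**11 is congruent to 22 mod 23, so this is False
```

If the bytes being decoded are corrupted or truncated, the decoded value would very likely fail a check of this kind, which fits the file-corruption theory discussed in this thread.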

moonlight233 commented 4 years ago

512MB succeeds and 32GB fails. Can you help me find the reason by comparing the two logs?

moonlight233 commented 4 years ago

Four users in China have already run into the same problem, and some of them are on Intel. We have formed a group and have been studying how to solve it. Please help us. @porcuquine

porcuquine commented 4 years ago

On butterfly, the first sector sealed successfully, but the second and third sectors failed. The miner was not stopped in between, and no other parameters were added. Why is this happening? @porcuquine

I don't understand what you mean by 'butterfly' here.

moonlight233 commented 4 years ago

Butterfly is branch ntwk-butterfly. https://stats.butterfly.fildev.network/d/z6FtI92Zz/chain?orgId=1&refresh=25s&from=now-30m&to=now&kiosk

porcuquine commented 4 years ago

32GB

2020-07-07T00:43:11.983 INFO filecoin_proofs::caches > found params in memory cache for STACKED[34359738368]-verifying-key
2020-07-07T00:43:11.983 INFO filecoin_proofs::api::seal > got verifying key (34359738368) while verifying seal
2020-07-07T00:43:11.984 INFO filcrypto::proofs::api > verify_seal: finish
2020-07-07T00:43:11.985+0800    ERROR  sectors storage-fsm@v0.0.0-20200625160832-379a4655b044/fsm.go:26     unhandled sector error (0): checkCommit sanity check error:
    github.com/filecoin-project/storage-fsm.(Sealing).handleCommitFailed
        /home/filtech/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200625160832-379a4655b044/states_failed.go:184
  - verify seal:
    github.com/filecoin-project/storage-fsm.(Sealing).checkCommit
        /home/filtech/go/pkg/mod/github.com/filecoin-project/storage-fsm@v0.0.0-20200625160832-379a4655b044/checks.go:158
  - failed to fill whole buffer

Well, this seems to be the same corruption as before.

I need you to:

  1. Isolate exactly what point in the process the corruption happens. Please try to do this by running a separate process to continually hash the file which becomes corrupted and see at what point the value changes.
  2. Discover whether this failure is deterministic or not. Is it always corrupted in the exact same way or not?
  3. If not, does it ever succeed?
  4. Does the file change in size, or do its contents just change?
  5. Identify which file is being corrupted, the verifying key or the groth params (or both).
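
One way to carry out steps 1, 2 and 4 is sketched below: a small script that re-hashes a file at a fixed interval and prints a line whenever its size or digest changes. This is a minimal sketch, not part of lotus or rust-fil-proofs. The function names, the choice of SHA-256, and the 60-second interval are assumptions made for illustration.

```python
import hashlib
import time


def file_digest(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte parameter files fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size, h.hexdigest()


def watch(path, interval_seconds=60.0, rounds=None):
    """Re-hash path repeatedly and print a timestamped line on every change.

    rounds=None means run until interrupted; the last (size, digest) seen
    is returned when a finite number of rounds is requested.
    """
    previous = None
    done = 0
    while rounds is None or done < rounds:
        current = file_digest(path)
        if current != previous:
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(f"{stamp} size={current[0]} sha256={current[1]}", flush=True)
            previous = current
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_seconds)
    return previous
```

Run `watch("/path/to/suspect-file")` in a separate terminal while sealing, once for the verifying key and once for the groth params, and compare the timestamps of any change with the miner log. Hashing a very large file takes a while, so the effective interval will be longer than the one requested.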
moonlight233 commented 4 years ago

I tested 32GB sealing more than 20 times, on both master and butterfly. Only one attempt succeeded. After that one sector succeeded, the following sectors failed, even though the miner was not stopped in between and the environment did not change.

moonlight233 commented 4 years ago

As in the logs above, both P1 and P2 succeed, and the failure is in C2 or verify. We both use a GPU, so the GPU may be the cause. This is as far as we can narrow it down with our equipment. You know this better than we do, so please help us find the reason based on this information.

moonlight233 commented 4 years ago

After each failure, pledging another sector reports an error immediately. The only way to clear it is to create a new miner and start again, and each attempt takes several hours. Since June 18 I have been testing for more than 12 hours every day.

moonlight233 commented 4 years ago

The 32GB log is missing these lines:

2020-07-15T08:08:08.490 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2020-07-15T08:08:08.490 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]-verifying-key
2020-07-15T08:08:08.490 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2020-07-15T08:08:08.490 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/media/filtech/CCC/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" exist
2020-07-15T08:08:08.490 INFO storage_proofs_core::parameter_cache > checking cache_path: "/media/filtech/CCC/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" for verifying key
2020-07-15T08:08:08.505 INFO storage_proofs_core::parameter_cache > read verifying key from cache "/media/filtech/CCC/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk"

@porcuquine

porcuquine commented 4 years ago

As in the logs above, both P1 and P2 succeed, and the failure is in C2 or verify. We both use a GPU, so the GPU may be the cause. This is as far as we can narrow it down with our equipment. You know this better than we do, so please help us find the reason based on this information.

Okay, please try without GPU but with correct parameters fetched. Let’s see if we can rule GPU in or out as the problem.

moonlight233 commented 4 years ago

2020-07-15T19:05:25.735 INFO bellperson::gpu::locks > GPU is available for FFT!
2020-07-15T19:05:25.843 INFO bellperson::gpu::fft > FFT: 1 working device(s) selected.
2020-07-15T19:05:25.843 INFO bellperson::gpu::fft > FFT: Device 0: GeForce RTX 2080 Ti
2020-07-15T19:05:25.843 INFO bellperson::domain > GPU FFT kernel instantiated!
thread '' panicked at 'slice index starts at 11941486238 but ends at 11941486184', src/libcore/slice/mod.rs:2731:5
2020-07-15T19:09:20.004+0800 INFO sectors storage-fsm@v0.0.0-20200707194229-bc5e298e2b4c/states_failed.go:19 ComputeProofFailed(0), waiting 58.995059898s before retrying

moonlight233 commented 4 years ago

hsienfu is testing without a GPU, and I tested with the GPU again. This time I got a new clue: a slice error. What caused the slice error? @porcuquine

hsienfu commented 4 years ago

Run ./bench sealing --sector-size 32GiB --no-gpu --storage-dir path/to/.lotus-bench

2020-07-15T10:33:52.203+0800    INFO    lotus-bench lotus-bench/main.go:75  Starting lotus-bench
2020-07-15T10:33:52.204+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk is ok
2020-07-15T10:33:52.205+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk is ok
2020-07-15T10:33:52.205+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk is ok
2020-07-15T10:33:52.206+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk is ok
2020-07-15T10:33:52.207+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk is ok
2020-07-15T10:33:52.206+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk is ok
2020-07-15T10:33:52.207+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk is ok
2020-07-15T10:33:52.206+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk is ok
2020-07-15T10:33:52.206+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk is ok
2020-07-15T10:33:52.207+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk is ok
2020-07-15T10:33:52.207+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk is ok
2020-07-15T10:33:52.206+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk is ok
2020-07-15T10:33:52.207+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk is ok
2020-07-15T10:33:52.211+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk is ok
2020-07-15T10:33:52.212+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk is ok
2020-07-15T10:33:52.432+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.params is ok
2020-07-15T10:34:44.309+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.params is ok
2020-07-15T10:34:58.735+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:138    Parameter file /var/tmp/filecoin-proof-parameters/v27-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.params is ok
2020-07-15T10:34:58.735+0800    INFO    build   go-paramfetch@v0.0.2-0.20200701152213-3e0f0afdc261/paramfetch.go:162    parameter and key-fetching complete
2020-07-15T10:34:58.735+0800    INFO    lotus-bench lotus-bench/main.go:484 [1] Writing piece into sector...
2020-07-15T10:34:58.984 INFO filecoin_proofs::api > generate_piece_commitment:start
2020-07-15T10:34:59.171 INFO filecoin_proofs::api > generate_piece_commitment:finish
2020-07-15T10:34:59.185 INFO filecoin_proofs::api > generate_piece_commitment:start

....

2020-07-15T20:27:37.607 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2020-07-15T20:27:37.608 WARN bellperson::multiexp > Cannot instantiate GPU Multiexp kernel! Error: GPUError: No working GPUs found!
2020-07-15T20:28:17.495 INFO filcrypto::proofs::api > seal_commit_phase2: finish
2020-07-15T20:28:30.925+0800    WARN    lotus-bench     lotus-bench/main.go:91  failed to run seals:
    main.glob..func3
        /usr/local/services/lotus/cmd/lotus-bench/main.go:249
  - coordinate(s) do not lie on the curve
    github.com/filecoin-project/filecoin-ffi.SealCommitPhase2
        /usr/local/services/lotus/extern/filecoin-ffi/proofs.go:382
    github.com/filecoin-project/sector-storage/ffiwrapper.(*Sealer).SealCommit2
        /root/go/pkg/mod/github.com/filecoin-project/sector-storage@v0.0.0-20200630180318-4c1968f62a8f/ffiwrapper/sealer_cgo.go:500

@porcuquine

porcuquine commented 4 years ago

hsienfu is testing without a GPU, and I tested with the GPU again. This time I got a new clue: a slice error. What caused the slice error? @porcuquine

These tests seem to show the problem is not GPU-related (assuming you had fresh, uncorrupted parameters/keys before you started).

The slice error seems to be the same kind of thing you've been seeing before: trying to read data that's not long enough. The simplest explanation (not saying it is this) would be that your parameter files have been truncated. I am still waiting for more detail about the timing and nature of the corruption you are experiencing. Without that, I don't think I'll be able to make useful guesses.

I wrote the things I think you need to do and collect a few messages up.

porcuquine commented 4 years ago

On further consideration, the slice error looks weird. These logs give very little information about what's actually happening, though.

porcuquine commented 4 years ago

My best current guess is that something weird may be happening with the mmapping of groth parameters. I talked to @cryptonemo about it briefly and am hoping he can investigate a little. Maybe he will have some ideas or discover something.

I still think the information I've requested will be helpful. It's pretty likely that without more information about the details of failure (exactly when, in what way, with what consistency, and which files are corrupted), we won't be able to narrow this down enough.

moonlight233 commented 4 years ago

This type of error always occurs on single-machine setups. It happens on both Intel and AMD CPUs. The error has nothing to do with the GPU. Intel reports this kind of error only with the CPU, and only rarely, unlike AMD, which almost always reports an error. The failure has nothing to do with the Ubuntu version. 512MB always succeeds; 32GB and 64GB almost always fail. I tried more than 30 times on AMD and sealed a 32GB sector successfully only once. The subsequent sectors failed, even though the miner was not stopped and the environment parameters did not change. @porcuquine

porcuquine commented 4 years ago

This is what I'm looking for: https://github.com/filecoin-project/rust-fil-proofs/issues/1185#issuecomment-658482719

moonlight233 commented 4 years ago

  1. Where can I find the file which becomes corrupted, so that I can hash it? Is it in unseal?
  2. Most cases are damaged in the same form: commit check error: invalid proof (compute error?)
  3. Using the GPU occasionally succeeds, roughly once in more than 30 attempts.
  4. The V27 file does not appear to be damaged, because it can be used next time, and no mismatch appears

porcuquine commented 4 years ago

  1. Where can I find the file which becomes corrupted, so that I can hash it? Is it in unseal?

You told me before that either your groth params or your verifying key is becoming corrupted and has to be fetched again. That is the file I want you to hash in order to find out exactly when this is happening. Use a CLI digest program like md5sum or sha1sum, etc.

The V27 file does not appear to be damaged, because it can be used next time, and no mismatch appears

If none of the files are corrupted, then this line of inquiry is wrong, but I'm pretty sure that's not what we concluded previously (either here or on Slack, I can't keep track).

Please confirm that the verifying key is never corrupted. If it is then please try to discover exactly when and how.

moonlight233 commented 4 years ago

  1. Where can I find the file which becomes corrupted, so that I can hash it? Is it in unseal?

You told me before that either your groth params or your verifying key is becoming corrupted and has to be fetched again. That is the file I want you to hash in order to find out exactly when this is happening. Use a CLI digest program like md5sum or sha1sum, etc.

Initially we did get a parameter mismatch response, but in our later tests the mismatch no longer appears.

The V27 file does not appear to be damaged, because it can be used next time, and no mismatch appears

If none of the files are corrupted, then this line of inquiry is wrong, but I'm pretty sure that's not what we concluded previously (either here or on Slack, I can't keep track).

Please confirm that the verifying key is never corrupted. If it is then please try to discover exactly when and how.

In our later tests the vk was never corrupted, and it can be reused by a new miner.

moonlight233 commented 4 years ago

On every single-machine setup, this warning starts repeating after the lotus daemon is started, and it interrupts the download of the proof parameters. Although the download eventually succeeds, we think this shows that the parameters are cut off. Our reasoning is that the commit stage generates a zero-knowledge proof from the V27 proof file after it is hashed, and then verifies it, and the zero-knowledge proof only takes a piece of that file. The one time we succeeded was exactly the time we got the complete, uninterrupted file. In most cases the interrupted file is obtained, which would explain the failures. This is our guess; please give us guidance.

moonlight233 commented 4 years ago

Or the core error is that the generated zero-knowledge proof is itself wrong, or that the verification of the zero-knowledge proof fails.

porcuquine commented 4 years ago

I'm sorry. I am having a very hard time understanding your sentences. This is probably a language issue, and I know it's not your fault.

What file is 'interrupted'?

If you do not believe your groth params or verifying key are corrupted, then you shouldn't need to download anything.

If something is being corrupted, that is what I am trying to understand more about.

If this is happening and you are able to get the correct files once, you should just make a local copy so you don't have to keep downloading them again. This is especially true if you think the download process may be failing in some way and causing a problem. Of course you need to make sure the 'good copy' you have really is good before relying on it repeatedly though.
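
A minimal sketch of that idea follows: copy a parameter file to a backup location only if it matches a digest you already trust, then confirm the copy matches too. The function names are hypothetical, SHA-256 is an arbitrary choice, and `expected_sha256` is assumed to come from a source you trust (for example, a digest recorded right after a download that is known to have worked).

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_verified(src, dst, expected_sha256):
    """Copy src to dst only if src matches the trusted digest, then re-check dst."""
    if sha256_of(src) != expected_sha256:
        # Refuse to make a 'good copy' of a file that is already bad.
        raise ValueError(f"{src} does not match the expected digest")
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected_sha256:
        raise IOError(f"the copy at {dst} does not match the expected digest")
    return dst
```

Restoring works the same way in the other direction: check the backup against the trusted digest before copying it over a file that has gone bad.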

moonlight233 commented 4 years ago

I'm sorry. I am having a very hard time understanding your sentences. This is probably a language issue, and I know it's not your fault.

What file is 'interrupted'?

If you do not believe your groth params or verifying key are corrupted, then you shouldn't need to download anything.

If something is being corrupted, that is what I am trying to understand more about.

If this is happening and you are able to get the correct files once, you should just make a local copy so you don't have to keep downloading them again. This is especially true if you think the download process may be failing in some way and causing a problem. Of course you need to make sure the 'good copy' you have really is good before relying on it repeatedly though.

When the miner starts and downloads the V27 parameters, the download is interrupted by WARN dht/RtRefreshManager rtrefresh/rt_refresh_manager.go: 191 failed when refreshing routing tab and then resumes automatically.