Closed Elhorses closed 1 year ago
Could you please provide some logs with log level `debug`, or even better `trace` (by setting `RUST_LOG=trace`)?
Do you have some way to reproduce the problem?
2022-11-24T23:07:17.607 INFO storage_proofs_core::compound_proof > snark_proof:start
2022-11-24T23:07:17.750 INFO bellperson::groth16::prover > Bellperson 0.22.0 is being used!
2022-11-24T23:07:40.287Z INFO miner miner/miner.go:548 completed mineOne {"tookMilliseconds": 12, "forRound": 2367496, "baseEpoch": 2367495, "baseDeltaSeconds": 10, "nullRounds": 0, "lateStart": false, "beaconEpoch": 2463341, "lookbackEpochs": 900, "networkPowerAtLookback": "21798573920164216832", "minerPowerAtLookback": "11159012229775360", "isEligible": true, "isWinner": false, "error": null}
2022-11-24T23:07:55.019 INFO bellperson::groth16::prover > synthesis time: 37.268746346s
2022-11-24T23:07:55.019 INFO bellperson::groth16::prover > starting proof timer
2022-11-24T23:07:59.294 INFO bellperson::gpu::locks > GPU is available for FFT!
2022-11-24T23:07:59.295 INFO ec_gpu_gen::program > Using kernel on CUDA.
2022-11-24T23:07:59.317 INFO ec_gpu_gen::fft > FFT: 1 working device(s) selected.
2022-11-24T23:07:59.318 INFO ec_gpu_gen::fft > FFT: Device 0: GeForce RTX 3090
2022-11-24T23:07:59.318 INFO bellperson::gpu::locks > GPU FFT kernel instantiated!
2022-11-24T23:08:10.074Z INFO miner miner/miner.go:548 completed mineOne {"tookMilliseconds": 51, "forRound": 2367497, "baseEpoch": 2367495, "baseDeltaSeconds": 40, "nullRounds": 1, "lateStart": false, "beaconEpoch": 2463342, "lookbackEpochs": 900, "networkPowerAtLookback": "21798574221660848128", "minerPowerAtLookback": "11159012229775360", "isEligible": true, "isWinner": false, "error": null}
2022-11-24T23:08:17.474Z ERROR storagemarket_impl impl/provider.go:205 failed to connect index provider host with the full node: failed to call NetProtectAdd on the full node, err: missing permission to invoke 'NetProtectAdd' (need 'admin')
2022-11-24T23:08:27.245 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2022-11-24T23:08:27.245 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2022-11-24T23:08:27.246 INFO ec_gpu_gen::program > Using kernel on CUDA.
2022-11-24T23:08:27.248 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2022-11-24T23:08:27.248 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: GeForce RTX 3090 (Chunk-size: 18061702)
2022-11-24T23:08:27.248 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2022-11-24T23:08:40.246Z INFO miner miner/miner.go:548 completed mineOne {"tookMilliseconds": 17, "forRound": 2367498, "baseEpoch": 2367497, "baseDeltaSeconds": 10, "nullRounds": 0, "lateStart": false, "beaconEpoch": 2463343, "lookbackEpochs": 900, "networkPowerAtLookback": "21798572294495338496", "minerPowerAtLookback": "11159012229775360", "isEligible": true, "isWinner": false, "error": null}
2022-11-24T23:09:10.044Z INFO miner miner/miner.go:590 round winner, will mine new block, for {"height": "2367499"}
2022-11-24T23:09:10.045Z INFO storageminer storage/winning_prover.go:70 Computing WinningPoSt ;[{SealProof:9 SectorNumber:152313 SectorKey:
From the log messages it's hard to tell which lines come from which process/thread. It could well be that the WinningPoSt one got priority. Why are you sure it didn't?
Are you able to reproduce the issue? Are you compiling the Rust parts from source? I'm asking because, if you are, I might be able to provide you with a version that also logs the thread ID, so that we can distinguish them.
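To illustrate what logging the thread ID could look like, here is a minimal sketch using only the Rust standard library. The helper name `tagged_line` and the line layout are assumptions made for illustration; this is not the patched bellperson build mentioned above, and it does not use bellperson's actual logger.

```rust
use std::thread;

/// Hypothetical helper (not part of bellperson): builds a log line that is
/// prefixed with the ID and name of the thread that emitted it, so that
/// interleaved output from several provers can be told apart.
pub fn tagged_line(level: &str, target: &str, msg: &str) -> String {
    let current = thread::current();
    format!(
        "{:?} {} {} {} > {}",
        current.id(),                        // Debug form, e.g. ThreadId(3)
        current.name().unwrap_or("unnamed"), // spawned threads are unnamed by default
        level,
        target,
        msg
    )
}
```

Calling `tagged_line("INFO", "bellperson::gpu::locks", "GPU is available for Multiexp!")` from two different threads gives lines that differ in their `ThreadId(..)` prefix, which is enough to separate a WinningPoSt prover from a wdpost prover in a shared log.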
You can run `cargo test test_parallel_prover --features "cuda" -- --nocapture` with v0.21.0 and v0.22.0 and then compare the Rust DEBUG logs. With v0.21.0 we get "[2022-11-28T13:26:12Z WARN bellperson::gpu::locks] GPU acquired by a high priority process! Freeing up Multiexp kernels..." whenever a conflict happens, but v0.22.0 never prints this message. As for my lotus-miner: when the wdpost and winningpost computations run at the same time, winningpost still fails to preempt the GPU, even though its priority is true and that of wdpost is false, and the winningpost computation then times out.
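The expected behaviour described above (a low-priority job gives up the GPU as soon as a high-priority job asks for it) can be sketched as follows. This is a simplified in-process model under stated assumptions: `PriorityFlag`, `Outcome` and `run_low_priority` are hypothetical names, and the shared atomic flag only stands in for whatever locking bellperson really uses, which this thread does not show.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for "a high-priority job wants the GPU".
#[derive(Clone, Default)]
pub struct PriorityFlag(Arc<AtomicBool>);

impl PriorityFlag {
    /// Called by the high-priority job (e.g. winningpost) before it needs the GPU.
    pub fn request(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    /// Called by the high-priority job once it is done.
    pub fn release(&self) {
        self.0.store(false, Ordering::SeqCst);
    }
    pub fn is_requested(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// All chunks were processed without interruption.
    FinishedOnGpu,
    /// The job stopped early; the caller would free its kernels and fall back.
    Preempted { chunks_done: usize },
}

/// Low-priority job (e.g. wdpost): checks the flag before every chunk and
/// stops as soon as a high-priority request is visible.
pub fn run_low_priority(
    flag: &PriorityFlag,
    chunks: usize,
    mut on_chunk: impl FnMut(usize),
) -> Outcome {
    for i in 0..chunks {
        if flag.is_requested() {
            return Outcome::Preempted { chunks_done: i };
        }
        on_chunk(i);
    }
    Outcome::FinishedOnGpu
}
```

In this model, the symptom reported for v0.22.0 corresponds to the low-priority loop never observing the request, so it always returns `FinishedOnGpu` while the high-priority job waits and eventually times out.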
Thanks @Elhorses for providing the command to run. I think I can reproduce it, I'm having a look.
OK, thank you! I have solved the problem; you can look at https://github.com/Elhorses/bellperson/tree/v0.22.0, commit: , and now my lotus-miner is working fine.
Thanks, that'll save me a lot of time!
@Elhorses here's my version of a fix: https://github.com/filecoin-project/bellperson/pull/293. It's for the master branch, but it should be easily applicable to older `bellperson` versions as well. The patch I've done for `ec-gpu-gen` that is referenced from my PR isn't needed for correctness; it just makes sure the output doesn't contain any messages about panics.
OK, thanks for your help, I'll use it.
Hello, can we use bellperson on AMD GPUs?
The OpenCL version should run on AMD GPUs. If it doesn't, it's a bug. Please report if you run into problems.
OK, thanks for your help!
When the wdpost and winningpost computations run at the same time, winningpost still fails to preempt the GPU, even though its priority is true and that of wdpost is false, and the winningpost computation then times out.