Closed (cryptonemo closed this 9 months ago)
This sounds like https://github.com/supranational/supra_seal/issues/32. Can you please check your kernel and GCC versions (and possibly try other ones)?
For me this works:
$ RUST_LOG=trace cargo test --release --features cuda-supraseal --test api test_seal_lifecycle_upgrade_2kib_base_8 -- --ignored
…
test test_seal_lifecycle_upgrade_2kib_base_8 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 31 filtered out; finished in 11.33s
I'm on a machine (worker-gpu-7) with:
$ uname -a
Linux worker-gpu-7 5.4.0-94-generic #106-Ubuntu SMP Thu Jan 6 23:58:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
It also works flawlessly on miner-2:
$ uname -a
Linux miner-2 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
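To compare environments across machines, the three checks above can be collected in one pass. This is only a minimal sketch: the function name report_versions is made up for illustration, and it assumes a grep that supports -m (GNU and BSD grep do). It prints the kernel release plus the first line of each tool's version output that contains a dotted version number.

```shell
#!/bin/sh
# report_versions (hypothetical helper): print kernel, gcc and nvcc versions,
# one line each, so the output from two machines can be diffed.
report_versions() {
    echo "kernel: $(uname -sr)"
    for tool in gcc nvcc; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Keep only the first line that contains something like "9.4" or "12.1".
            echo "$tool: $("$tool" --version | grep -E -m1 '[0-9]+\.[0-9]+')"
        else
            echo "$tool: not found"
        fi
    done
}

report_versions
```

Saving the output on each machine and diffing the two files makes a kernel or compiler mismatch easy to spot.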
I think we've attributed this to an environment issue, so I'm closing it. If it still needs more attention, feel free to re-open.
Description
Seeing failures on local hardware during tests when cuda-supraseal is enabled:
Example failure from miner-1:
Separate failure, also on miner-1 (note that a previously spotted failure passes in this run, which suggests this could be a hardware issue):
Example failure from local machine:
Note that on both machines, using cuda works 100% of the time (I cannot get it to fail, even with repeated runs).
Acceptance criteria
Risks + pitfalls
Where to begin
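Since the failures described above look intermittent, one possible starting point is to repeat the same test many times and count the failing runs for each feature flag. The following is a rough sketch, not an established procedure: count_failures is a hypothetical helper, and the run count of 20 in the usage comment is arbitrary.

```shell
#!/bin/sh
# count_failures N CMD...: run CMD N times, discard its output, and print
# how many of the runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example usage (assumption: run from the repository root, 20 runs):
# count_failures 20 cargo test --release --features cuda-supraseal \
#     --test api test_seal_lifecycle_upgrade_2kib_base_8 -- --ignored
```

Running it once with --features cuda-supraseal and once with --features cuda on the same machine gives a rough failure rate for each, which helps separate a flaky code path from a hardware or environment problem.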