AntCPLab / OpenBumbleBee

Apache License 2.0

[Bug]: Encryption parameters error when running microbenchmark for MatMul #4

Closed: grueyg closed this issue 1 month ago

grueyg commented 1 month ago

Issue Type

Others

Modules Involved

SPU runtime

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

db1e442

OS Platform and Distribution

Linux Ubuntu 18.04

Python Version

3.10

Compiler Version

GCC 11.4

Current Behavior?

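When I run the matmul benchmark, it fails with ValueError: encryption parameters are not set correctly.

The gelu and softmax benchmarks run fine.

I'm new to SPU, so I can't tell from this error message alone what went wrong. The full error output is below. I'm not sure whether it contains enough useful information.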

Standalone code to reproduce the issue

bazel run -c opt examples/python/microbench:matmul

Relevant log output

(spu) userA@ubuntu:~/project/sf/OpenBumbleBee$ bazel run -c opt examples/python/microbench:matmul
WARNING: /home/userA/project/sf/OpenBumbleBee/libspu/dialect/pphlo/IR/BUILD.bazel:150:15: in cc_library rule //libspu/dialect/pphlo/IR:dialect: Target '//libspu/dialect/pphlo/IR:dialect' violates visibility of target '//libspu/dialect/utils:utils'. Continuing because --nocheck_visibility is active
WARNING: /home/userA/project/sf/OpenBumbleBee/libspu/device/pphlo/BUILD.bazel:19:15: in cc_library rule //libspu/device/pphlo:pphlo_executor: Target '//libspu/device/pphlo:pphlo_executor' violates visibility of target '//libspu/dialect/utils:utils'. Continuing because --nocheck_visibility is active
WARNING: /home/userA/project/sf/OpenBumbleBee/libspu/device/BUILD.bazel:66:15: in cc_library rule //libspu/device:api: Target '//libspu/device:api' violates visibility of target '//libspu/device/utils:debug_dump_constant'. Continuing because --nocheck_visibility is active
WARNING: /home/userA/project/sf/OpenBumbleBee/libspu/device/BUILD.bazel:66:15: in cc_library rule //libspu/device:api: Target '//libspu/device:api' violates visibility of target '//libspu/dialect/utils:utils'. Continuing because --nocheck_visibility is active
INFO: Analyzed target //examples/python/microbench:matmul (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/python/microbench:matmul up-to-date:
  bazel-bin/examples/python/microbench/matmul
INFO: Elapsed time: 0.471s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/python/microbench/matmul
An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
[2024-09-17 20:54:24.960] [info] [thread_pool.cc:30] Create a fixed thread pool with size 47
[2024-09-17 20:54:25.414] [info] [cheetah_dot.cc:310] CheetahDot uses 3@2 modulus 8192 degree for 64 bit ring (packing=enabled)
[2024-09-17 20:54:27.434] [info] [cheetah_dot.cc:475] 16@32x128x256 => 32x128x2 Recv 2.822 MiB, Response 3.881 MiB Pack 1747.707 ms (interleave)
Traceback (most recent call last):
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/examples/python/microbench/matmul.py", line 92, in <module>
    batch_matmul()
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/examples/python/microbench/matmul.py", line 41, in batch_matmul
    z = spu_fn(x, y)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 171, in wrapper
    out_flat = sim(executable, *args_flat)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 119, in __call__
    parties = [job.join() for job in jobs]
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 119, in <listcomp>
    parties = [job.join() for job in jobs]
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 46, in join
    raise self.exc
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 39, in run
    self.ret = self._target(*self._args, **self._kwargs)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 108, in wrapper
    rt.run(executable)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/api.py", line 44, in run
    return self._vm.Run(executable.SerializeToString())
ValueError: encryption parameters are not set correctly
grueyg commented 1 month ago

In matmul.py, both matmul_with_packlwe() and matmul_with_interleave() run correctly. Only batch_matmul() raises the error above.

grueyg commented 1 month ago

@fionser Hello.

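I ran the batch_matmul() test in examples/microbench/matmul.py on two different machines. Both runs failed with the same error: ValueError: encryption parameters are not set correctly

I tried turning on the enable_action_trace option, but it doesn't seem to give any additional information. Could you help me figure out what is going wrong?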

[2024-09-18 15:59:33.330] [info] [thread_pool.cc:30] Create a fixed thread pool with size 47
[2024-09-18 15:59:33.389] [TR] [B] hlo.pphlo.dot_general()
[2024-09-18 15:59:33.389] [TR] [B]   hal.batch_matmul(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:33.389] [TR] [B]     hal.i_batch_mmul(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:33.389] [TR] [B]       hal._batch_mmul_ss(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:33.389] [TR] [B]         mpc.batch_mmul_ss(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:33.389] [TR] [B]           mpc.batch_mmul_aa(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:33.778] [info] [cheetah_dot.cc:310] CheetahDot uses 3@2 modulus 8192 degree for 64 bit ring (packing=enabled)
[2024-09-18 15:59:37.795] [info] [cheetah_dot.cc:475] 16@64x128x256 => 64x128x1 Recv 2.822 MiB, Response 7.763 MiB Pack 3535.134 ms (interleave)
[2024-09-18 15:59:37.836] [TR] [E]           mpc.batch_mmul_aa(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:37.837] [TR] [E]         mpc.batch_mmul_ss(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:37.837] [TR] [E]       hal._batch_mmul_ss(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:37.837] [TR] [E]     hal.i_batch_mmul(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:37.837] [TR] [E]   hal.batch_matmul(Value<16x64x128xSI32,s=8192,128,1>, Value<16x128x256xSI32,s=32768,256,1>)
[2024-09-18 15:59:37.837] [TR] [E] hlo.pphlo.dot_general()
Traceback (most recent call last):
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/examples/python/microbench/matmul.py", line 93, in <module>
    batch_matmul()
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/examples/python/microbench/matmul.py", line 42, in batch_matmul
    z = spu_fn(x, y)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 171, in wrapper
    out_flat = sim(executable, *args_flat)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 119, in __call__
    parties = [job.join() for job in jobs]
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 119, in <listcomp>
    parties = [job.join() for job in jobs]
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 46, in join
    raise self.exc
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 39, in run
    self.ret = self._target(*self._args, **self._kwargs)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/utils/simulation.py", line 108, in wrapper
    rt.run(executable)
  File "/home/userA/.cache/bazel/_bazel_userA/12345/execroot/spulib/bazel-out/k8-opt/bin/examples/python/microbench/matmul.runfiles/spulib/spu/api.py", line 44, in run
    return self._vm.Run(executable.SerializeToString())
ValueError: encryption parameters are not set correctly
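Once the run succeeds, a plaintext reference for the batched product can be useful for checking results. The shapes below come from the action trace above: 16 batches of a 64x128 by 128x256 product. The random integer inputs and the use of NumPy are assumptions for illustration, because the body of matmul.py is not shown in this thread.

```python
import numpy as np

# Shapes taken from the action trace: 16 batches of (64x128) @ (128x256).
# The value range and seed are hypothetical; the real inputs in matmul.py are not shown here.
rng = np.random.default_rng(0)
x = rng.integers(-100, 100, size=(16, 64, 128), dtype=np.int64)
y = rng.integers(-100, 100, size=(16, 128, 256), dtype=np.int64)

# np.matmul treats the leading axis as a batch axis and multiplies matching slices.
z = np.matmul(x, y)
print(z.shape)  # (16, 64, 256)

# The same computation written as an explicit batched contraction.
z_einsum = np.einsum("bik,bkj->bij", x, y)
assert np.array_equal(z, z_einsum)

# Each batch slice equals an ordinary 2-D product of the corresponding slices.
assert np.array_equal(z[3], x[3] @ y[3])
```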
fionser commented 1 month ago

Are you running this with the OpenBumbleBee code? This error is most likely caused by something being misaligned while SEAL ciphertexts are transmitted.
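The toy model below illustrates the kind of misalignment described above. It assumes, purely for illustration, two things: each serialized ciphertext carries a short fingerprint of the parameters it was encrypted under, and the receiver rejects a ciphertext whose fingerprint does not match its own setup. ToyParams, send, receive, and the modulus sizes are hypothetical. This is not SEAL's serialization format or OpenBumbleBee's code, and the details of SEAL's own checks are not reproduced here.

```python
import hashlib
import struct
from dataclasses import dataclass

TAG_LEN = 8  # hypothetical length of the parameter fingerprint prefix


@dataclass(frozen=True)
class ToyParams:
    """Hypothetical stand-in for HE encryption parameters (not SEAL's EncryptionParameters)."""
    poly_degree: int
    coeff_modulus_bits: tuple

    def fingerprint(self) -> bytes:
        # Hash every parameter into a short tag so both parties can compare setups cheaply.
        h = hashlib.sha256()
        h.update(struct.pack("<I", self.poly_degree))
        for bits in self.coeff_modulus_bits:
            h.update(struct.pack("<I", bits))
        return h.digest()[:TAG_LEN]


def send(params: ToyParams, payload: bytes) -> bytes:
    # Sender prefixes the ciphertext bytes with the tag of the parameters it used.
    return params.fingerprint() + payload


def receive(params: ToyParams, blob: bytes) -> bytes:
    # Receiver refuses ciphertexts produced under parameters it does not share.
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    if tag != params.fingerprint():
        raise ValueError("ciphertext was produced under different encryption parameters")
    return payload


if __name__ == "__main__":
    sender = ToyParams(8192, (60, 49, 49))  # hypothetical modulus sizes
    receiver = ToyParams(4096, (54, 55))    # a receiver with a mismatched setup
    blob = send(sender, b"ciphertext-bytes")
    print(receive(sender, blob))            # b'ciphertext-bytes'
    try:
        receive(receiver, blob)
    except ValueError as e:
        print("rejected:", e)
```

In this toy model, a mismatch on either side of the channel surfaces only when the receiver deserializes. That fits the observation that the trace ends inside the dot_general call without naming which step failed.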

fionser commented 1 month ago

Please check out https://github.com/AntCPLab/OpenBumbleBee/commit/3228dd08440e81cc80768567873af60ef4b4927d

grueyg commented 1 month ago

checkout 3228dd0

Thanks, the test now runs without any problems.