YefanWu opened this issue 3 weeks ago
It works on my MacBook (the ndss branch):
INFO: Running command line: bazel-bin/examples/python/microbench/matmul
[2024-08-18 20:38:01.630] [info] [thread_pool.cc:30] Create a fixed thread pool with size 11
[2024-08-18 20:38:01.747] [info] [cheetah_dot.cc:313] CheetahDot uses 3@2 modulus 8192 degree for 64 bit ring (packing=enabled)
[2024-08-18 20:38:01.748] [info] [cheetah_dot.cc:313] CheetahDot uses 3@2 modulus 8192 degree for 64 bit ring (packing=enabled)
[2024-08-18 20:38:02.671] [info] [cheetah_dot.cc:481] 32@64x128x56 => 64x128x1 Recv 5.645 MiB, Response 3.396 MiB Pack 681.563 ms (interleave)
[2024-08-18 20:38:02.678] [info] [cheetah_dot.cc:481] 32@64x128x56 => 64x128x1 Recv 5.645 MiB, Response 3.396 MiB Pack 693.887 ms (interleave)
[2024-08-18 20:38:02.683] [info] [api.cc:170] [Profiling] SPU execution <lambda> completed, input processing took 5.42e-07s, execution took 1.045159958s, output processing took 1.083e-06s, total time 1.045161583s.
[2024-08-18 20:38:02.683] [info] [api.cc:215] HAL profiling: total time 1.044960958
[2024-08-18 20:38:02.683] [info] [api.cc:222] - i_batch_mmul, executed 1 times, duration 1.044960958s, send MiB 27.8925 recv MiB 27.8925
[2024-08-18 20:38:02.683] [info] [api.cc:215] MPC profiling: total time 1.044971291
[2024-08-18 20:38:02.683] [info] [api.cc:222] - batch_mmul_aa, executed 1 times, duration 1.044958333s, send MiB 27.8925 recv MiB 27.8925
[2024-08-18 20:38:02.683] [info] [api.cc:222] - reshape, executed 1 times, duration 1.2958e-05s, send MiB 0 recv MiB 0
[2024-08-18 20:38:02.683] [info] [api.cc:230] Link details: total send MiB 27.8925, recv MiB 27.8925, send actions 50
batch matmul max diff = 0
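For anyone who wants to sanity-check the same shapes locally, here is a minimal sketch using the spu.utils.simulation API. The batch and matrix dimensions are inferred from the 32@64x128x56 line in the log above, not taken from the actual microbench script:

```python
import jax.numpy as jnp
import numpy as np

import spu
import spu.utils.simulation as spsim

# Simulate a 2PC Cheetah session over a 64-bit ring, matching the
# "64 bit ring" setup reported in the log above.
sim = spsim.Simulator.simple(2, spu.ProtocolKind.CHEETAH, spu.FieldType.FM64)

def batch_mmul(x, y):
    # Batched matmul: (B, M, K) @ (B, K, N) -> (B, M, N).
    return jnp.matmul(x, y)

x = np.random.randn(32, 64, 128).astype(np.float32)
y = np.random.randn(32, 128, 56).astype(np.float32)

z = spsim.sim_jax(sim, batch_mmul)(x, y)
print("batch matmul max diff =", np.max(np.abs(z - np.matmul(x, y))))
```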
Thanks for your re-check. As you can see, the program still crashes just as it did before. (I just added some SPDLOG_INFO(...) lines to trace the invocations.)
[2024-08-18 13:43:59.858] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7
[2024-08-18 13:43:59.859] [info] [io.cc:91] Debugger: experimental_enable_colocated_optimization = true
[2024-08-18 13:43:59.871] [info] [io.cc:91] Debugger: experimental_enable_colocated_optimization = true
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.928] [info] [cheetah_dot.cc:689] Calling doBatchDotOLE
[2024-08-18 13:43:59.928] [info] [cheetah_dot.cc:689] Calling doBatchDotOLE
(base) root@5481edaf0e4e:/home/admin/OpenBumbleBee# git branch
* ndss
I will check where it crashes on my machine. Maybe my four-core machine does cause some weird problems (sad :q).
Furthermore, I have a question about this config line:
config.experimental_enable_colocated_optimization = True
On my machine, this line does not make the secure MatMul run between a private and a shared matrix, even though it works perfectly on another, more powerful server.
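For completeness, this is roughly how I set it; a minimal sketch assuming the protobuf-style RuntimeConfig exposed by the spu Python package, with illustrative protocol/field values:

```python
import spu

# 2PC Cheetah runtime config over a 64-bit ring (illustrative values).
config = spu.RuntimeConfig(
    protocol=spu.ProtocolKind.CHEETAH,
    field=spu.FieldType.FM64,
)
# The flag under discussion; it is expected to let the MatMul run as
# mmul_av (private x shared) instead of mmul_aa.
config.experimental_enable_colocated_optimization = True
```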
If I have understood correctly, the SPU runtime should invoke MatMulAV::proc, right? I base this understanding on an answer to an issue under spu. I notice that the program invokes mmul_aa rather than mmul_av on your machine as well. Sorry for my confusion; I would appreciate a ground-truth check.
Have a nice day!
The simulation script will not pick up config.experimental_enable_colocated_optimization. To make this work, we might need to use nodectl.
It also depends on how the input matrix is shared. For instance, this line will create a V type (i.e., a local private type) for P1.
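A hedged sketch of the nodectl-style workflow, assuming the examples-style ppd API from spu.utils.distributed; the config path and launch command follow the repo's examples and are illustrative only:

```python
import json

import numpy as np
import spu.utils.distributed as ppd

# Nodes are assumed to be launched beforehand, e.g. via
#   python examples/python/utils/nodectl.py up
with open("examples/python/conf/2pc.json") as f:  # illustrative path
    conf = json.load(f)
ppd.init(conf["nodes"], conf["devices"])

# A value produced on P1 stays local to P1. With the colocated
# optimization enabled, such an input can enter the SPU device as a
# V-type (private) value, so the matmul can run as mmul_av instead
# of mmul_aa.
x = ppd.device("P1")(lambda: np.random.randn(64, 128))()
y = ppd.device("P2")(lambda: np.random.randn(128, 56))()

z = ppd.device("SPU")(lambda a, b: a @ b)(x, y)
print(ppd.get(z))
```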
Issue Type
Others
Modules Involved
SPU runtime
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
spu 0.9.0
OS Platform and Distribution
Linux Ubuntu 22.04
Python Version
3.10
Compiler Version
GCC 11.4
Current Behavior?
The program crashes and drops back to the terminal when executing here. P.S. the functions matmul_with_packlwe() and matmul_with_interleave() work well.
Standalone code to reproduce the issue
Relevant log output