YefanWu opened this issue 3 weeks ago
It works on my MacBook (the ndss branch):
INFO: Running command line: bazel-bin/examples/python/microbench/matmul
[2024-08-18 20:38:01.630] [info] [thread_pool.cc:30] Create a fixed thread pool with size 11
[2024-08-18 20:38:01.747] [info] [cheetah_dot.cc:313] CheetahDot uses 3@2 modulus 8192 degree for 64 bit ring (packing=enabled)
[2024-08-18 20:38:01.748] [info] [cheetah_dot.cc:313] CheetahDot uses 3@2 modulus 8192 degree for 64 bit ring (packing=enabled)
[2024-08-18 20:38:02.671] [info] [cheetah_dot.cc:481] 32@64x128x56 => 64x128x1 Recv 5.645 MiB, Response 3.396 MiB Pack 681.563 ms (interleave)
[2024-08-18 20:38:02.678] [info] [cheetah_dot.cc:481] 32@64x128x56 => 64x128x1 Recv 5.645 MiB, Response 3.396 MiB Pack 693.887 ms (interleave)
[2024-08-18 20:38:02.683] [info] [api.cc:170] [Profiling] SPU execution <lambda> completed, input processing took 5.42e-07s, execution took 1.045159958s, output processing took 1.083e-06s, total time 1.045161583s.
[2024-08-18 20:38:02.683] [info] [api.cc:215] HAL profiling: total time 1.044960958
[2024-08-18 20:38:02.683] [info] [api.cc:222] - i_batch_mmul, executed 1 times, duration 1.044960958s, send MiB 27.8925 recv MiB 27.8925
[2024-08-18 20:38:02.683] [info] [api.cc:215] MPC profiling: total time 1.044971291
[2024-08-18 20:38:02.683] [info] [api.cc:222] - batch_mmul_aa, executed 1 times, duration 1.044958333s, send MiB 27.8925 recv MiB 27.8925
[2024-08-18 20:38:02.683] [info] [api.cc:222] - reshape, executed 1 times, duration 1.2958e-05s, send MiB 0 recv MiB 0
[2024-08-18 20:38:02.683] [info] [api.cc:230] Link details: total send MiB 27.8925, recv MiB 27.8925, send actions 50
batch matmul max diff = 0
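For anyone who wants to sanity-check the same shapes locally, here is a minimal sketch using the spu.utils.simulation API. The batch and matrix dimensions are inferred from the 32@64x128x56 line in the log above, not taken from the actual microbench script:

```python
import jax.numpy as jnp
import numpy as np

import spu
import spu.utils.simulation as spsim

# Simulate a 2PC Cheetah session over a 64-bit ring, matching the
# "64 bit ring" setup reported in the log above.
sim = spsim.Simulator.simple(2, spu.ProtocolKind.CHEETAH, spu.FieldType.FM64)

def batch_mmul(x, y):
    # Batched matmul: (B, M, K) @ (B, K, N) -> (B, M, N).
    return jnp.matmul(x, y)

x = np.random.randn(32, 64, 128).astype(np.float32)
y = np.random.randn(32, 128, 56).astype(np.float32)

z = spsim.sim_jax(sim, batch_mmul)(x, y)
print("batch matmul max diff =", np.max(np.abs(z - np.matmul(x, y))))
```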
Thanks for your re-check. As you can see, the program still crashes just as it did before. (I just added some SPDLOG_INFO(...) lines to trace the invocations.)
[2024-08-18 13:43:59.858] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7
[2024-08-18 13:43:59.859] [info] [io.cc:91] Debugger: experimental_enable_colocated_optimization = true
[2024-08-18 13:43:59.871] [info] [io.cc:91] Debugger: experimental_enable_colocated_optimization = true
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.926] [info] [cheetah_dot.cc:654] Calling BatchDotOLE, dim4 = 16@64@128@256
[2024-08-18 13:43:59.928] [info] [cheetah_dot.cc:689] Calling doBatchDotOLE
[2024-08-18 13:43:59.928] [info] [cheetah_dot.cc:689] Calling doBatchDotOLE
(base) root@5481edaf0e4e:/home/admin/OpenBumbleBee# git branch
* ndss
I will check where it crashes on my machine. Maybe my four-core machine does cause some weird problems (sad :q).
Furthermore, I have a question about this config line:
config.experimental_enable_colocated_optimization = True
On my machine, this line does not make the secure MatMul run between a private and a shared matrix, even though it works perfectly on another, more powerful server.
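For completeness, this is roughly how I set it; a minimal sketch assuming the protobuf-style RuntimeConfig exposed by the spu Python package, with illustrative protocol/field values:

```python
import spu

# 2PC Cheetah runtime config over a 64-bit ring (illustrative values).
config = spu.RuntimeConfig(
    protocol=spu.ProtocolKind.CHEETAH,
    field=spu.FieldType.FM64,
)
# The flag under discussion; it is expected to let the MatMul run as
# mmul_av (private x shared) instead of mmul_aa.
config.experimental_enable_colocated_optimization = True
```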
If I have understood correctly, the SPU runtime should invoke MatMulAV::proc, right? I base this understanding on an answer to an issue under spu. I notice that the program invokes mmul_aa rather than mmul_av on your machine as well. Sorry for my confusion; I would appreciate a ground-truth check.
Have a nice day!
The simulation script will not pick up config.experimental_enable_colocated_optimization. To make this work, we might need to use nodectl.
It also depends on how the input matrix is shared. For instance, this line will create a V type (i.e., a local private type) for P1.
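A hedged sketch of the nodectl-style workflow, assuming the examples-style ppd API from spu.utils.distributed; the config path and launch command follow the repo's examples and are illustrative only:

```python
import json

import numpy as np
import spu.utils.distributed as ppd

# Nodes are assumed to be launched beforehand, e.g. via
#   python examples/python/utils/nodectl.py up
with open("examples/python/conf/2pc.json") as f:  # illustrative path
    conf = json.load(f)
ppd.init(conf["nodes"], conf["devices"])

# A value produced on P1 stays local to P1. With the colocated
# optimization enabled, such an input can enter the SPU device as a
# V-type (private) value, so the matmul can run as mmul_av instead
# of mmul_aa.
x = ppd.device("P1")(lambda: np.random.randn(64, 128))()
y = ppd.device("P2")(lambda: np.random.randn(128, 56))()

z = ppd.device("SPU")(lambda a, b: a @ b)(x, y)
print(ppd.get(z))
```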
Issue Type
Others
Modules Involved
SPU runtime
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
spu 0.9.0
OS Platform and Distribution
Linux Ubuntu 22.04
Python Version
3.10
Compiler Version
GCC 11.4
Current Behavior?
The program crashes and drops back to the terminal when executing here. P.S. the functions matmul_with_packlwe() and matmul_with_interleave() work well.
Standalone code to reproduce the issue
Relevant log output