ttttonyhe closed this issue 3 months ago
I can't reproduce this bug. Your code works on my 30-series laptop. Can you provide more information about your test environment? I think it may be a compatibility bug.
I'm trying to figure out the bug itself. I think it occurs in the `apply_galois_ntt_permutation` kernel, as you said.
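For readers unfamiliar with what a kernel like `apply_galois_ntt_permutation` is expected to compute, here is a minimal Python sketch of the underlying math. It is not the library's implementation. The toy parameters (`N = 8`, `q = 17`, `psi = 3`), the natural-order NTT layout, and all function names are assumptions made for illustration. It shows that the Galois automorphism x -> x^g on a polynomial in Z_q[x]/(x^N + 1) becomes a pure index permutation once the polynomial is in NTT (evaluation) form.

```python
# Toy parameters (assumptions for illustration only): N = 8, q = 17, and
# psi = 3 is a primitive 2N-th root of unity mod q (3 has order 16 mod 17).
N, q, psi = 8, 17, 3

def evaluate(poly, x):
    """Evaluate poly (coefficients, lowest degree first) at x mod q (Horner)."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % q
    return acc

def ntt(poly):
    """Negacyclic NTT in natural order: values at psi^(2i+1), i = 0..N-1."""
    return [evaluate(poly, pow(psi, 2 * i + 1, q)) for i in range(N)]

def galois_coeff(poly, g):
    """Apply x -> x^g in Z_q[x]/(x^N + 1) on coefficient form (g must be odd)."""
    out = [0] * N
    for i, c in enumerate(poly):
        e = (i * g) % (2 * N)          # x^(2N) = 1 in this ring
        if e < N:
            out[e] = (out[e] + c) % q
        else:
            out[e - N] = (out[e - N] - c) % q   # x^N = -1 flips the sign
    return out

def galois_ntt(values, g):
    """Same automorphism on NTT form: slot i reads slot j, where
    2j + 1 = (2i + 1) * g mod 2N. No arithmetic on the values is needed."""
    out = [0] * N
    for i in range(N):
        j = (((2 * i + 1) * g) % (2 * N) - 1) // 2
        out[i] = values[j]
    return out

poly = [1, 2, 3, 4, 5, 6, 7, 8]
for g in (3, 5, 15):
    # Both routes must agree for every odd Galois element g.
    assert ntt(galois_coeff(poly, g)) == galois_ntt(ntt(poly), g)
```

Because the NTT-domain version only moves values around, a wrong result from such a kernel usually points to an indexing or synchronization problem rather than an arithmetic one. A real implementation typically stores NTT values in bit-reversed order, which changes the index formula but not the idea.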
I have figured out a way to fix this. Please try the latest version.
I can't reproduce this bug. Your code works on my 30-series laptop. Can you provide more information about your test environment? I think it may be a compatibility bug.
Unfortunately, it's still not working on my end. I'm using an A100 GPU, with the code compiled using `CMAKE_CUDA_ARCHITECTURES=native` and CUDA version 11.7.
I have reproduced the bug on A100. Stay tuned for the fix.
I have fixed this bug on A100. It seems to be an NVIDIA issue: other cards pass the test, and the A100 passes it in debug mode. Anyway, it has been fixed. I also updated `apply_galois_inplace` and `rotate_inplace`.
Works perfectly, thank you!
Hi @D4rkCrypto,
It seems like `apply_galois` (more specifically `apply_galois_ntt` for CKKS ciphertexts) sometimes may not produce the right output when the coefficient modulus chain is short or the poly_modulus_degree is small. Here's a quick demo:
Instead of producing `1 1 1 1 1`, it outputs something like `7.50071e+29 -5.51248e+27 -3.51746e+28 -3.04196e+28 -1.01832e+27` for me.
Using the following combinations of mod chains and values of $\log{N}$ will produce the correct output:
- `{60, 40, 40, 40, 40, 40, 40, 40, 60}` with $\log{N} = 13$
- `{60, 40, 40, 60}` with at least $\log{N} = 16$
Let me know if you can reproduce this, thank you!
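Since CKKS is approximate, "the right output" here means the decoded slots agree with the expected values up to a small error. The following Python sketch shows one way to check that, using the numbers quoted above. The helper name `slots_match` and the tolerance `1e-3` are assumptions for illustration, not part of the library.

```python
def slots_match(expected, actual, tol=1e-3):
    """Return True if every decoded slot is within tol of its expected value."""
    return len(expected) == len(actual) and all(
        abs(e - a) <= tol for e, a in zip(expected, actual)
    )

expected = [1.0, 1.0, 1.0, 1.0, 1.0]
# Values reported in this issue for the failing parameter set.
observed = [7.50071e+29, -5.51248e+27, -3.51746e+28, -3.04196e+28, -1.01832e+27]

assert not slots_match(expected, observed)        # the reported failure
assert slots_match(expected, [1.0000004] * 5)     # normal CKKS noise passes
```

Errors of this magnitude are far beyond approximation noise, which is consistent with the result being corrupted rather than merely imprecise.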