encryptorion-lab / phantom-fhe

PhantomFHE: A CUDA-Accelerated Homomorphic Encryption Library
https://encryptorion-lab.gitbook.io/phantom-fhe/
GNU General Public License v3.0
79 stars 9 forks

out of memory cudaMallocAsync(&this->ptr_, obj.n_ * sizeof(T), obj.cudaStream_) #12

Closed gogo9th closed 2 months ago

gogo9th commented 2 months ago

Hi,

I am running a big for loop as follows:

   for (int i = 0; i < I; i++) {            // I = num_cts: 53
      for (int j = 0; j < J; j++) {         // J = code_length: 169
         // Only multiply entries whose ciphertext actually holds data.
         if (cipher1_list[i][j].data_ptr().get() != 0) {
            cipher3_list[i][j] = multiply_plain(*context, cipher1_list[i][j], cipher2_list[i][j]);
         }
      }
   }

And when the loop processes the 8667th ciphertext, it crashes with the following error:

CUDA Runtime Error at: /usr/local/include/phantom/cuda_wrapper.cuh:84
out of memory cudaMallocAsync(&this->ptr_, obj.n_ * sizeof(T), obj.cudaStream_)
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDA Runtime Error
Aborted (core dumped)

I suppose this error is due to too many ciphertexts. I wonder if there is any way to avoid this memory overflow.

MrSlavika commented 2 months ago

Hi, from my past experience with the library, this runtime error is raised when a GPU memory request fails, i.e., you may not have enough free memory (or free address space) to satisfy the allocation.

If you no longer need a ciphertext, please consider releasing it by calling the .release() method of the Pointer.

Hope it helps.

MrSlavika commented 2 months ago

Since the release method might not be available anymore, there is also a memory pool threshold setting (source). Setting line 130 to a value smaller than your memory limit should work.

D4rkCrypto commented 2 months ago

If the ciphertext1 and ciphertext2 matrices are no longer needed after this double loop, consider dividing them into smaller tiles and implicitly freeing the memory of each tile once it has been processed. Since we use RAII for memory management, make sure the objects whose memory you wish to free go out of scope.
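The tiling idea can be sketched without a GPU. The snippet below is only an illustration under stated assumptions: `MockCiphertext`, `g_live_bytes`, `g_peak_bytes` and `process_in_tiles` are hypothetical names that are not part of PhantomFHE, and the mock type merely counts bytes instead of allocating device memory. It shows how keeping each tile in a loop-body scope bounds the peak usage to one tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counters standing in for GPU memory usage (hypothetical, for illustration).
std::size_t g_live_bytes = 0;  // bytes currently "allocated"
std::size_t g_peak_bytes = 0;  // high-water mark of g_live_bytes

// Hypothetical stand-in for a GPU-backed ciphertext. It owns no real memory;
// it only updates the counters in its constructor and destructor (RAII).
struct MockCiphertext {
    std::size_t bytes;

    explicit MockCiphertext(std::size_t n) : bytes(n) {
        g_live_bytes += bytes;
        g_peak_bytes = std::max(g_peak_bytes, g_live_bytes);
    }
    MockCiphertext(const MockCiphertext&) = delete;
    MockCiphertext& operator=(const MockCiphertext&) = delete;
    // Moving transfers ownership, so the moved-from object frees nothing.
    MockCiphertext(MockCiphertext&& other) noexcept : bytes(other.bytes) { other.bytes = 0; }
    ~MockCiphertext() { g_live_bytes -= bytes; }
};

// Processes `total` ciphertext pairs in tiles of at most `tile` pairs.
// Returns the number of pairs processed.
std::size_t process_in_tiles(std::size_t total, std::size_t tile, std::size_t ct_bytes) {
    assert(tile > 0);
    std::size_t processed = 0;
    for (std::size_t start = 0; start < total; start += tile) {
        const std::size_t count = std::min(tile, total - start);
        std::vector<MockCiphertext> inputs;   // stands in for a tile of cipher1_list
        std::vector<MockCiphertext> outputs;  // stands in for a tile of cipher3_list
        inputs.reserve(count);
        outputs.reserve(count);
        for (std::size_t k = 0; k < count; ++k) {
            inputs.emplace_back(ct_bytes);
            outputs.emplace_back(ct_bytes);  // stands in for the multiply_plain result
        }
        // Consume or copy out the tile's results here, before the scope ends.
        processed += count;
    }  // end of each iteration: `inputs` and `outputs` are destroyed, freeing the tile
    return processed;
}
```

With 53 × 169 = 8957 pairs and a tile of 64, at most 128 mock ciphertexts are alive at any time instead of all 17914. The same pattern only helps with real ciphertexts if the results of each tile are consumed or moved off the GPU before the tile goes out of scope.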

gogo9th commented 2 months ago

For my application, the active working set is larger than the GPU memory. Therefore, it seems that the only solution is to redesign my application to temporarily swap some PhantomCiphertext objects' data out to regular malloc()ed (non-GPU) memory and restore it to the ciphertext object's memory when homomorphic computations are needed. This could be another great reason to support .save() and .load() to stringstream objects. In fact, we have an important annual conference demo scheduled next week, and I wonder whether it would be possible for you to add this support within the next week or two.

One additional (very beneficial) request: when we save() ciphertexts encrypted with a symmetric key, could you store only the seed value for the masking polynomial (just like SEAL's .save() does for symmetric ciphertexts) so that the ciphertext size is halved? Currently, we use this technique to reduce the size of the ciphertexts sent over the network from 100M to 50M, so this is a very important element. Please consider this if you can. Thanks!
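The seed-only request can be illustrated with a toy model. This is a sketch under assumptions, not the serialization format of SEAL or PhantomFHE: `Poly`, `expand_from_seed`, `FullCiphertext`, `SeededCiphertext`, `payload_bytes` and `expand` are hypothetical names, polynomials are plain vectors of 64-bit coefficients, and `std::mt19937_64` stands in for the seeded cryptographic generator a real library would use. The point is only that the masking polynomial can be regenerated from the seed, so it does not need to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Poly = std::vector<std::uint64_t>;

// Toy expansion of a seed into n coefficients modulo q.
// std::mt19937_64 is NOT cryptographically secure; it is used here only
// because it is deterministic, so the same seed always gives the same output.
Poly expand_from_seed(std::uint64_t seed, std::size_t n, std::uint64_t q) {
    assert(q > 0);
    std::mt19937_64 prng(seed);
    Poly a(n);
    for (auto& coeff : a) coeff = prng() % q;
    return a;
}

// Full form: both components are stored explicitly.
struct FullCiphertext {
    Poly c0;
    Poly c1;  // masking polynomial
};

// Seeded form: the masking polynomial is replaced by the seed that generates it.
struct SeededCiphertext {
    Poly c0;
    std::uint64_t seed;
};

// Payload sizes in bytes (ignoring headers and metadata).
std::size_t payload_bytes(const FullCiphertext& ct) {
    return (ct.c0.size() + ct.c1.size()) * sizeof(std::uint64_t);
}
std::size_t payload_bytes(const SeededCiphertext& ct) {
    return ct.c0.size() * sizeof(std::uint64_t) + sizeof(ct.seed);
}

// Rebuilds the full form by regenerating the masking polynomial from the seed.
FullCiphertext expand(const SeededCiphertext& ct, std::uint64_t q) {
    return FullCiphertext{ct.c0, expand_from_seed(ct.seed, ct.c0.size(), q)};
}
```

For a polynomial of 4096 coefficients, the full form carries 65536 payload bytes and the seeded form 32776, which matches the roughly 2× reduction described above. The saving applies only to fresh symmetric-key ciphertexts, because after a homomorphic operation the second component is no longer the output of the seed.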

D4rkCrypto commented 2 months ago

Sure, I’ll add this feature soon. I should be back in my dev zone by tomorrow.

D4rkCrypto commented 2 months ago

Hello,

I've had the opportunity to implement save/load functions for PhantomCiphertext and PhantomPlaintext today. I'll be pushing this code tonight, so you'll be able to test it out soon.

Please note that due to my current commitments to other projects, the implementation of similar functionality for other classes may be delayed.

Thank you for your patience and understanding. Feel free to reach out if you have any questions or feedback once you've had a chance to review the new code.