facebookresearch / CrypTen

A framework for Privacy Preserving Machine Learning
MIT License

Saving a Tensor to a file and loading it changes the value of the Tensor. #442

Open jtovartr opened 1 year ago

jtovartr commented 1 year ago

The following code generates a tensor, encrypts it, saves it to a file, then loads it back, decrypts it, and prints the result so it can be compared with the original.

import torch
import crypten
import crypten.mpc as mpc
import crypten.communicator as comm
import time
import sys
import torchvision

crypten.init()
torch.set_num_threads(1)

print("Data:")
alice_data = torch.tensor([1, 2, 3.0])
print(alice_data)

@mpc.run_multiprocess(world_size=2)
def save_all_data():
    alice_data_enc = crypten.cryptensor(alice_data)
    crypten.print("\nEncrypted Tensor before save data:")
    crypten.print(alice_data_enc)
    crypten.save(alice_data_enc, "/home/jesus/Escritorio/alice_data.txt")

save_all_data()

@mpc.run_multiprocess(world_size=2)
def load_data():
    alice_data_enc2 = crypten.load("/home/jesus/Escritorio/alice_data.txt")
    crypten.print("\nEncrypted Tensor after load data:")
    crypten.print(alice_data_enc2)
    crypten.print("\nFinal data:")
    crypten.print(alice_data_enc2.get_plain_text())

load_data()

As the output shows, the encrypted tensor printed after loading is identical to the one printed before saving, but the decrypted plain text does not match the tensor that was originally created.

Data:
tensor([1., 2., 3.])

Encrypted Tensor before save data:
MPCTensor(
        _tensor=tensor([1745663545320523213,  871792230649247636, 7816814413220844221])
        plain_text=HIDDEN
        ptype=ptype.arithmetic
)

Encrypted Tensor after load data:
MPCTensor(
        _tensor=tensor([1745663545320523213,  871792230649247636, 7816814413220844221])
        plain_text=HIDDEN
        ptype=ptype.arithmetic
)

Final data:
tensor([ 5.3273e+13,  2.6605e+13, -4.2925e+13])

Why do the initial and final values not match?

mohammad-alrubaie commented 1 year ago

It seems both processes are saving their shares to the same file, so one share overwrites the other (or the write itself fails). I modified the code you gave as follows and it worked (the first part is omitted to focus on the changes):

@mpc.run_multiprocess(world_size=2)
def save_all_data():
    rank = comm.get().get_rank()
    alice_data_enc = crypten.cryptensor(alice_data)
    crypten.print("\nEncrypted Tensor before save data:")
    crypten.print(f"\nRank {rank}:\n {alice_data_enc}\n", in_order=True)
    crypten.save(alice_data_enc, f"alice_data_{rank}.enc")

save_all_data()

And to load the data:

@mpc.run_multiprocess(world_size=2)
def load_data():
    rank = comm.get().get_rank()
    alice_data_enc2 = crypten.load(f"alice_data_{rank}.enc")
    crypten.print("\nEncrypted Tensor after load data:")
    crypten.print(f"\nRank {rank}:\n {alice_data_enc2}\n", in_order=True)
    crypten.print("\nFinal data:")
    alice_data_dec = alice_data_enc2.get_plain_text()
    crypten.print(f"  plaintext: {alice_data_dec}\n")

load_data()

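The main change is calling rank = comm.get().get_rank() to obtain each process's rank. The rank is used in the file name, so each party writes and reads its own share, and it also labels the print() output so the lines from different processes can be told apart.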
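To see why a shared file produces those particular wrong values, here is a minimal plain-Python sketch of additive secret sharing. It does not use CrypTen; it assumes a 16-bit fixed-point scale and a 64-bit ring, which I believe are CrypTen's defaults, and the helper names (to_signed, share, reconstruct) are made up for illustration. Under those assumptions, summing the share printed in the issue with itself gives values that agree with the "Final data" output, which is consistent with both processes having loaded the same share.

```python
import random

# Assumptions (not taken from the issue): 16-bit fixed-point scale, 64-bit ring.
SCALE = 2 ** 16
RING = 2 ** 64


def to_signed(x):
    """Wrap an integer into the signed 64-bit range, like int64 overflow."""
    x %= RING
    return x - RING if x >= RING // 2 else x


def share(value, rng):
    """Split a float into two additive shares of its fixed-point encoding."""
    encoded = round(value * SCALE)
    s0 = to_signed(rng.randrange(RING))
    s1 = to_signed(encoded - s0)
    return s0, s1


def reconstruct(shares):
    """Sum the shares in the ring and decode the fixed-point result."""
    return to_signed(sum(shares)) / SCALE


rng = random.Random(0)

# With one distinct share per party, the original values come back.
correct = [reconstruct(share(v, rng)) for v in (1.0, 2.0, 3.0)]
print(correct)

# Shares printed in the issue. If both parties load this same share,
# reconstruction adds it to itself instead of to its counterpart.
reported = [1745663545320523213, 871792230649247636, 7816814413220844221]
duplicated = [reconstruct([s, s]) for s in reported]
print(duplicated)  # roughly 5.3273e13, 2.6605e13, -4.2925e13
```

Saving one file per rank, as in the modified code above, keeps the two shares separate so that they sum back to the encoded value.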