data61 / MP-SPDZ

Versatile framework for multi-party computation

Problem inputting data from files in other formats #1467

Closed: f-hy closed this issue 2 months ago

f-hy commented 2 months ago

Each of two parties inputs one piece of data. Using CSV as an example: the files are a.csv and b.csv, and the input parties are A and B respectively. A cannot see b.csv and B cannot see a.csv. How can each party input its data separately in this case? Do I need to split the program into multiple .mpc files and run them step by step? The code below only runs successfully when both parties have both a.csv and b.csv. I want party A to have only a.csv and party B to have only b.csv, and for the program to still run. In other words, I want each party to keep its own data private. How can I achieve this? (Note that I don't want to use Player-Data/Input-P0-0 or the -I option to get data from each party.)

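import pandas as pd
from sklearn.model_selection import train_test_split
from Compiler.decision_tree import TreeClassifier

dfh = pd.read_csv('Programs/Public-Input/a.csv', index_col='id')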
dfg = pd.read_csv('Programs/Public-Input/b.csv', index_col='id')
X, y = dfh.iloc[:, 1:], dfh.iloc[:, 0]
gX, gy = dfg.iloc[:, 1:], dfg.iloc[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
gX_train, gX_test, gy_train, gy_test = train_test_split(gX, gy, test_size=0.2, random_state=0)
print(gX_train.shape, gX_test.shape, gy_train.shape, gy_test.shape)
X_train = sfix.input_tensor_via(0, X_train)
y_train = sint.input_tensor_via(0, y_train)
X_test = sfix.input_tensor_via(0, X_test)
y_test = sint.input_tensor_via(0, y_test)
gX_train = sfix.input_tensor_via(1, gX_train)
gy_train = sint.input_tensor_via(1, gy_train)
gX_test = sfix.input_tensor_via(1, gX_test)
gy_test = sint.input_tensor_via(1, gy_test)
X_train = X_train.concat(gX_train)
y_train = y_train.concat(gy_train)
X_test = X_test.concat(gX_test)
y_test = y_test.concat(gy_test)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
tree = TreeClassifier(max_depth=4, n_threads=4)
tree.fit(X_train, y_train)
print_ln('%s', (tree.predict(X_test) - y_test.get_vector()).reveal())
mkskeller commented 2 months ago

You can do something similar to Programs/Source/breast_logistic.mpc by branching on command-line arguments:

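import pandas as pd

# a_shape and b_shape are public: both compilations must use the same values
if 'party0' in program.args:
    a = sfix.input_tensor_via(0, pd.read_csv('a.csv'))  # party 0 supplies its own data
    b = sfix.input_tensor_via(1, shape=b_shape)         # only the shape of party 1's data
elif 'party1' in program.args:
    a = sfix.input_tensor_via(0, shape=a_shape)         # only the shape of party 0's data
    b = sfix.input_tensor_via(1, pd.read_csv('b.csv'))  # party 1 supplies its own data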

Then compile using ./compile.py <program> party0 on one side and ./compile.py <program> party1 on the other side, before running ./<protocol-party.x> -p 0 <program>-party0 and ./<protocol-party.x> -p 1 <program>-party1 respectively.
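The shape placeholders in the branching snippet have to be public values that both compilations agree on, so each party needs to learn the other side's shapes without seeing the data. A minimal sketch, assuming plain CSV files with a single header row and no quoted newlines: each party could run this on its own file and send only the resulting shapes to the other party. The helpers csv_shape, split_sizes and public_shapes are hypothetical, not part of MP-SPDZ. The split rule mirrors what scikit-learn's train_test_split does for a float test_size when train_size is not given, and the label/feature layout follows the question's iloc[:, 0] / iloc[:, 1:].

```python
import csv
import math


def csv_shape(path, index_col=None):
    """Return (rows, columns) of a simple CSV file.

    Assumes one header line and no quoted newlines; blank lines are skipped,
    as pandas.read_csv does by default. If index_col names a column (for
    example 'id'), that column is not counted, like read_csv(..., index_col='id').
    """
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        n_rows = sum(1 for row in reader if row)
    n_cols = len(header) - (1 if index_col in header else 0)
    return (n_rows, n_cols)


def split_sizes(n_samples, test_size=0.2):
    """Train/test row counts for a float test_size with train_size=None:
    n_test = ceil(test_size * n_samples), n_train = n_samples - n_test."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def public_shapes(path, index_col='id', test_size=0.2):
    """Shapes of X_train, X_test, y_train, y_test when the first non-index
    column is the label and the remaining columns are features."""
    n_rows, n_cols = csv_shape(path, index_col)
    n_train, n_test = split_sizes(n_rows, test_size)
    n_features = n_cols - 1
    return {
        'X_train': (n_train, n_features),
        'X_test': (n_test, n_features),
        'y_train': (n_train,),
        'y_test': (n_test,),
    }
```

Note that pd.read_csv('a.csv') without index_col keeps the id column, so for the snippet as written the matching shape is csv_shape('a.csv'), while the question's program (index_col='id') would use public_shapes('a.csv'). Only these row and column counts leave each party, never the values themselves.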

f-hy commented 2 months ago

It helps a lot, thanks!

f-hy commented 2 months ago

I get it again, thanks a lot!