PaddlePaddle / PaddleFL

Federated Deep Learning in PaddlePaddle
Apache License 2.0

Mixed Protocol Framework #128

Open heads7 opened 4 years ago

heads7 commented 4 years ago

Hello again!

How is the data divided in the MPC examples with ABY3? Is it just split into parallel parts?

For example, if I have a 90x10 dataset and use this method, will I get three encrypted parts of 30x10 each? Or does it work differently?

honshj commented 4 years ago

You can divide the original data arbitrarily, but you need to encrypt it the ABY3 way. For example, you can divide your 90 (features) x 10 (rows) dataset into two parts of 45 x 10 each. At this point the division has nothing to do with encryption; it just simulates the case where the two parts of the dataset come from two different owners.

Next, to do MPC training, you need to encrypt each part of the data so that it can be computed on using ABY3. In the ABY3 way, each part of the data becomes three copies (shares) after encryption, and each copy is sent to one computing node (ABY3 requires three computing nodes) for computation. You can refer to the introduction or the ABY3 paper for the details.
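To make the "three shares" idea concrete, here is a minimal sketch of 3-party additive secret sharing over 64-bit integers, which is the kind of sharing ABY3 builds on. It is not PaddleFL code: the function names (`encode`, `make_three_shares`, `reconstruct`) and the fixed-point scale of 2**16 are assumptions made for illustration only.

```python
import numpy as np

SCALE = 2 ** 16  # assumed fixed-point scaling factor, chosen only for this sketch


def encode(values):
    """Map floats to 64-bit ring elements using fixed-point encoding."""
    fixed = np.round(np.asarray(values, dtype=np.float64) * SCALE)
    return fixed.astype(np.int64).astype(np.uint64)


def decode(ring_values):
    """Map 64-bit ring elements back to floats."""
    return ring_values.astype(np.int64) / SCALE


def make_three_shares(values, rng):
    """Split values into three shares that sum to the secret modulo 2**64."""
    secret = encode(values)
    max_u64 = np.iinfo(np.uint64).max
    share0 = rng.integers(0, max_u64, size=secret.shape, dtype=np.uint64, endpoint=True)
    share1 = rng.integers(0, max_u64, size=secret.shape, dtype=np.uint64, endpoint=True)
    share2 = secret - share0 - share1  # uint64 array arithmetic wraps modulo 2**64
    return [share0, share1, share2]


def reconstruct(shares):
    """Add all three shares (modulo 2**64) and decode the result."""
    return decode(shares[0] + shares[1] + shares[2])


rng = np.random.default_rng(0)
part = np.array([[1.5, -2.25], [0.0, 3.75]])  # one owner's part of the data
shares = make_three_shares(part, rng)

# In a replicated scheme such as ABY3, each node holds two of the three shares.
node_views = [(shares[i], shares[(i + 1) % 3]) for i in range(3)]

recovered = reconstruct(shares)
```

Each share on its own looks like random numbers and has the same shape as the part it came from, so a 45 x 10 part yields three 45 x 10 shares rather than three smaller slices of the data.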

heads7 commented 4 years ago

So, when we call save_aby3_shares, we will have three copies of the original file?

The guide on the PaddleFL main page says that NN with MPC assumes vertical data partition. How is that represented?

honshj commented 4 years ago

So, when we call save_aby3_shares, we will have three copies of the original file?

Yes. Usually, we first call 'make_shares' to generate shares from the raw file, and then pass the result to 'save_aby3_shares' to get three copies of the encrypted data of the raw file.
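The two-step flow described here (make shares, then save one file per computing node) can be sketched roughly as follows. The functions `make_shares_sketch` and `save_shares_sketch` are hypothetical stand-ins written for this illustration, not the real 'make_shares' and 'save_aby3_shares', whose exact signatures are not given in this thread. The `.part0` / `.part1` / `.part2` file suffixes and the `.npy` payload are also assumptions.

```python
import os
import tempfile

import numpy as np


def make_shares_sketch(values, rng):
    """Hypothetical stand-in: split an integer array into three additive shares."""
    secret = np.asarray(values, dtype=np.int64).astype(np.uint64)
    max_u64 = np.iinfo(np.uint64).max
    s0 = rng.integers(0, max_u64, size=secret.shape, dtype=np.uint64, endpoint=True)
    s1 = rng.integers(0, max_u64, size=secret.shape, dtype=np.uint64, endpoint=True)
    s2 = secret - s0 - s1  # wraps modulo 2**64
    return [s0, s1, s2]


def save_shares_sketch(shares, prefix):
    """Hypothetical stand-in: write one share file per computing node."""
    paths = []
    for index, share in enumerate(shares):
        path = f"{prefix}.part{index}"  # assumed naming scheme
        with open(path, "wb") as handle:
            np.save(handle, share)
        paths.append(path)
    return paths


rng = np.random.default_rng(1)
raw = np.arange(12).reshape(3, 4)  # stands in for the raw file contents

out_dir = tempfile.mkdtemp()
paths = save_shares_sketch(make_shares_sketch(raw, rng), os.path.join(out_dir, "features"))

# Only someone holding all three files can recover the raw data.
loaded = [np.load(path) for path in paths]
restored = (loaded[0] + loaded[1] + loaded[2]).astype(np.int64)
```

In this sketch the three output files are not identical copies of the raw file: each one holds a different share, and one file is meant to go to each of the three computing nodes.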

The guide on the PaddleFL main page says that NN with MPC assumes vertical data partition. How is that represented?

Not exactly; MPC does not assume any particular data partition. It supports both horizontal and vertical data partition. As for the vertical partition case, it is exactly the case in my earlier reply.
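As a small illustration of the two partition styles, the sketch below splits the same array both ways with NumPy. The array layout (rows as samples, columns as features) and the variable names are assumptions for this example; the splitting happens before any encryption.

```python
import numpy as np

# 10 rows (samples) with 90 features each, matching the example earlier in the thread.
data = np.arange(10 * 90).reshape(10, 90)

# Vertical partition: each owner holds all rows but only some of the features.
owner_a_vertical, owner_b_vertical = np.hsplit(data, 2)  # each part has 45 features

# Horizontal partition: each owner holds all features but only some of the rows.
owner_a_horizontal, owner_b_horizontal = np.vsplit(data, 2)  # each part has 5 rows

# Joining the parts along the matching axis gives back the original dataset.
vertical_joined = np.hstack([owner_a_vertical, owner_b_vertical])
horizontal_joined = np.vstack([owner_a_horizontal, owner_b_horizontal])
```

Either way, each owner's part would then be turned into shares separately before being sent to the computing nodes.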

Tks!

heads7 commented 4 years ago

Okay. So, I have some questions:

1) Do the examples only show vertical data partition? As I understand it, all data in the examples is vertically divided, but this step is omitted for simplicity, so we only see the stage where the data is turned into shares and sent to all computing parties. Where can I see how vertical partition works in PaddleFL? In PSI?

2) In PSI I see that the data is divided vertically like this: we have 14 columns (13 features, 1 label), so it is divided like a typical dataset. What if I want to divide the data into 7 features in one part, 6 features in a second part, and the label as a third part? Can I do that and use it in training?

3) Does the MPC protocol assume that we have exactly 3 machines, and no more?

heads7 commented 3 years ago

I also want to ask: when I divide the data horizontally into 45x10 parts as you said, how can I use it in MPC training? I will have 2 files, okay. But then I create ABY3 shares, so I will have shares of the first file and of the second file, and I must send the shares to the training nodes. Each node will then have something like:

file1_features.part1
file2_features.part1
file1_labels.part1
file2_labels.part1

How can I use both of them? Do I need to divide them at some step?

@honshj waiting for your answers :)

heads7 commented 3 years ago

Any answers?