CARRIER-project / verticox

Should central server be run on node that contains outcome? #55

Open dsmits opened 1 year ago

dsmits commented 1 year ago

I don't know why I didn't see this before (did we discuss this @fvandaalen?)

The first order derivative is defined by:

$$ \frac{\partial L_{\overline{z}}}{\partial \overline{z}_u} = \sum\limits_{t=1}^{t_u} \left[ d_t \frac{K \exp(K \overline{z}_u)}{\sum\limits_{j \in R_t} \exp(K \overline{z}_j)} \right] + K \rho \left[ \overline{z}_u - \overline{\sigma}_u - \frac{\overline{\gamma}_u^{(p-1)}}{\rho} \right] $$

We had already identified one privacy-infringing component (component 2, the sum over the risk set $R_t$ in the denominator) that will be taken care of by the n-party scalar protocol:

$$ \frac{\partial L_{\overline{z}}}{\partial \overline{z}_u} = \sum\limits_{t=1}^{t_u} \left[ d_t \frac{K \exp(K \overline{z}_u)}{\text{component 2}} \right] + K \rho \left[ \overline{z}_u - \overline{\sigma}_u - \frac{\overline{\gamma}_u^{(p-1)}}{\rho} \right] $$

However, there is still the issue that we have to sum from $t=1$ to $t_u$, where $t_u$ is the event time that belongs to the sample at index $u$.
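To make the dependency on the outcome data concrete, here is a minimal NumPy sketch of the derivative above, assuming everything is available in plaintext on a single machine (which is exactly what is not the case in the federated setting). The function name `z_gradient` and its argument names are hypothetical and not taken from the verticox code. It assumes that `event_times` holds, for each sample $u$, the 1-based index $t_u$ into the distinct event times, that `event_counts` holds $d_t$, and that the risk set is $R_t = \{j : t_j \geq t\}$.

```python
import numpy as np


def z_gradient(z, sigma, gamma, event_times, event_counts, K, rho):
    """First-order derivative with respect to each z_u (plaintext sketch).

    z, sigma, gamma : float arrays of shape (n,)
    event_times     : int array of shape (n,), 1-based index t_u per sample
    event_counts    : array of shape (T,), d_t for t = 1..T
    """
    z = np.asarray(z, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    event_times = np.asarray(event_times)
    event_counts = np.asarray(event_counts, dtype=float)

    exp_kz = np.exp(K * z)
    n_times = len(event_counts)

    # "Component 2": one denominator per distinct time t, summed over the
    # risk set R_t = {j : t_j >= t}. Building R_t needs the event times.
    denominators = np.array(
        [exp_kz[event_times >= t].sum() for t in range(1, n_times + 1)]
    )

    # Running sum over t of d_t / denominator_t. Each sample then reads off
    # the entry at its own t_u, which again requires the outcome data.
    cumulative = np.cumsum(event_counts / denominators)

    first_term = K * exp_kz * cumulative[event_times - 1]
    second_term = K * rho * (z - sigma - gamma / rho)
    return first_term + second_term
```

Both the risk sets and the upper limit $t_u$ are read from `event_times`, so whichever party evaluates this expression directly needs access to the outcome.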

This could in theory be solved by another call to the n-party scalar protocol, but that would hurt performance even more than component 2 does: the elements of component 2 can be reused, whereas this sum has a different upper limit $t_u$ for every sample.

This is leading me to wonder why we wouldn't run the central server on the node that contains the outcome data. Then we wouldn't have to use the n-party scalar protocol on the central server and everything can stay efficient.

I will go ahead and experiment with this approach. It's possible I have forgotten some things that we discussed, @fvandaalen. In this approach the n-party scalar protocol will still be used at the data nodes, but not at the central aggregator.

One privacy concern is that the outcome node would be able to link the data coming from the other nodes to a specific patient. If only a few variables are used per location, the outcome node could interpret these values and, for example, see which patients have high blood pressure. This could be mitigated by adding a disclaimer or by enforcing a minimum number of features per node.
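The minimum-feature limit could be enforced with a simple check before training starts. The sketch below is only an illustration: the function name `nodes_below_minimum`, the `feature_counts` mapping, and the threshold are hypothetical and not part of verticox, and a sensible threshold would still need to be agreed on.

```python
def nodes_below_minimum(feature_counts, min_features):
    """Return the names of nodes that contribute fewer than min_features.

    feature_counts : dict mapping a node name to its number of features
    min_features   : smallest number of features a node may contribute
    """
    return sorted(
        node for node, count in feature_counts.items() if count < min_features
    )


def check_feature_limit(feature_counts, min_features):
    """Raise an error if any node falls below the minimum feature count."""
    offending = nodes_below_minimum(feature_counts, min_features)
    if offending:
        raise ValueError(
            f"Nodes with fewer than {min_features} features: {offending}"
        )
```

Calling `check_feature_limit` on the outcome node before the first iteration would refuse to start a run in which one party's contribution is easy to interpret.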

dsmits commented 1 year ago

If the issue can be solved by running the central server on the outcome node, then verticox+ is already done (except for the encryption part) and we can close this issue.