CARRIER-project / verticox

Apache License 2.0
2 stars 0 forks source link

Build and Publish DOI

verticox+

This repository contains the components for running vertical cox proportional hazards analysis in a setting where the data is vertically partitioned.

The solution is based on the Verticox algorithm from Dai et al., 2022. It has been adapted to be used within the Vantage6 framework.

This solution will be extended with the scalar vector product protocol to solve certain privacy concerns in "vanilla" Verticox.

Current status

The current vantage6 algorithm only implements the original Verticox algorithm. The addition of the scalar vector product protocol will be released in the near future.

N-party-scalar-product-protocol

We are going to enhance the verticox algorithm by applying the n-party scalar product protocol to the components of the verticox algorithm that require querying which samples have a matching event time.

These are the components:

$\sum \limits{n \in E} \mathbf{x}{nk}$ (for datanodes) Where $E$ is the collection of samples that are NOT right-censored.

$\sum \limits_{j \in R_t} exp(K \overline{z}_j)$ (at the central server)

Encryption

Communication over the vpn network needs an extra layer of encryption so that the vpn server cannot inspect the traffic.

If we are going to use TLS we will need to generate certificates and place them at the nodes. Questions:

How to use

Prerequisites

You will need to have the vantage6 infrastructure setup to be able to use this algorithm. Check their website for installation instructions.

Local installation

You can install the dependencies with pip: pip install vantage6-client git+https://github.com/CARRIER-project/verticox.git#subdirectory=python

Running the algorithm

You will probably want to check which nodes contain which features before you run the algorithm.

from verticox.client import VerticoxClient
from vantage6.client import Client

# Create a vantage6 client
client = Client(v6_host, v6_port)
client.authenticate(username, password)
client.setup_encryption(private_key)

# Instantiate a verticox client
verticox_client = VerticoxClient(client)

# Get insight into the columns at the datanodes
result = verticox_client.get_column_names()
print(result.get_results())
>> [Result(organization_id=1,
           content=['gender', 'hr', 'los', 'miord', 'mitype', 'sho', 'sysbp', 'outcome_time',
                    'outcome']),
   >> Result(organization_id=2, content=['outcome_time', 'outcome']),
>> Result(organization_id=3,
          content=['afb', 'age', 'av3', 'bmi', 'chf', 'cvd', 'diasbp', 'outcome_time', 'outcome'])]

Now you know which data resides at what node you can run the verticox algorithm:

feature_columns = ['afb', 'age', 'gender']

task = verticox_client.fit(feature_columns, 'outcome_time', 'outcome', [3, 5], 2)

# This will take a while
task.get_results()

Python components

Follow the README in the python/ directory.