bhagya-hettige / MedGraph

MedGraph: Structural and Temporal Representation Learning of Electronic Medical Records

Running MedGraph

MedGraph can be used either as an end-to-end predictive healthcare model that outputs future medical risks, or as an unsupervised EMR embedding model that produces visit and code embeddings.

System requirements

Input data format for MedGraph

MedGraph expects a compressed NumPy file (.npz) in the data directory with the following elements:

Have a look at the utils.py file for more details.
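
Before training, it can help to check what an input archive actually contains. The snippet below is a minimal sketch, not part of MedGraph: summarise_npz is a hypothetical helper, and it makes no assumption about which element names the archive holds. It only lists whatever arrays are stored, with their shapes and dtypes.

```python
import numpy as np


def summarise_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive.

    Hypothetical helper for inspection only; it is not part of MedGraph.
    """
    summary = {}
    # allow_pickle=True is an assumption: it is only needed if the archive
    # stores object arrays (for example, ragged lists of codes per visit).
    with np.load(path, allow_pickle=True) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

For example, summarise_npz("data/dataset.npz") would list the stored elements, where the file name is a placeholder for your own dataset.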

Running MedGraph script

python train.py dataset --embedding_dim=128 --vc_batch_size=128 --vv_batch_size=32 --K=10 --num_epochs=10 --learning_rate=0.001 --is_gauss=True --distance=w2 --is_time_dis=True
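
The flags --is_gauss=True and --distance=w2 suggest that embeddings are modelled as Gaussian distributions and compared with a 2-Wasserstein distance. That reading is an assumption based on the flag names, and the sketch below is not taken from the MedGraph code. It shows the standard closed form of the squared 2-Wasserstein distance between two Gaussians with diagonal covariance; the function name and arguments are illustrative.

```python
import numpy as np


def w2_squared_diag_gaussians(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    mu*:    mean vectors
    sigma*: per-dimension standard deviations (not variances)

    For diagonal covariances the closed form is
        ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
    Illustrative only; MedGraph's own implementation may differ.
    """
    mu1, sigma1 = np.asarray(mu1, dtype=float), np.asarray(sigma1, dtype=float)
    mu2, sigma2 = np.asarray(mu2, dtype=float), np.asarray(sigma2, dtype=float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    spread_term = np.sum((sigma1 - sigma2) ** 2)
    return float(mean_term + spread_term)
```

Two embeddings with the same mean but different standard deviations are therefore still some distance apart, which is what lets the uncertainty of an embedding influence the comparison.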

If you want to analyse the dataset behaviour using uncertainty modelling:

MedGraph embeddings

Visit and code embeddings for the test EMRs are saved in the emb directory as a dictionary in a NumPy file (.npy).
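
A dictionary saved with np.save is stored as a zero-dimensional object array, so reading it back needs allow_pickle=True followed by .item(). The snippet below is a minimal sketch: load_embedding_dict is a hypothetical helper, and the keys of the saved dictionary are not assumed here.

```python
import numpy as np


def load_embedding_dict(path):
    """Load a Python dict that was saved to a .npy file with np.save.

    Hypothetical helper; it is not part of MedGraph.
    """
    # np.save wraps a dict in a 0-d object array, which requires pickling.
    wrapped = np.load(path, allow_pickle=True)
    # .item() unwraps the 0-d array and returns the original dict.
    return wrapped.item()
```

For example, load_embedding_dict("emb/embeddings.npy") would return the saved dictionary, where the file name is a placeholder. Only load .npy files you trust, because allow_pickle=True can execute arbitrary code during unpickling.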

2-D visualisation of the code embeddings learned from MedGraph

MedGraph produces 128-dimensional code embeddings for ICD-10-CM codes. We then use t-SNE to project these embeddings into 2 dimensions for visualisation. The colour of a code indicates its associated CCS class. You can find the 2-D plot here. When you hover over a point in the plot, you can see the ICD code, its definition and the relevant CCS class.
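
The projection step described above can be sketched with scikit-learn's TSNE. This is an illustrative sketch, not the script used to produce the published plot: project_to_2d is a hypothetical helper, and the perplexity, initialisation and seed are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_to_2d(embeddings, perplexity=30.0, seed=0):
    """Project an (n_codes, dim) embedding matrix to (n_codes, 2) with t-SNE.

    Hypothetical helper; the settings used for the original plot are unknown.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    n_samples = embeddings.shape[0]
    # t-SNE requires perplexity < n_samples, so shrink it for small inputs.
    perplexity = min(perplexity, max(1.0, (n_samples - 1) / 3.0))
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)
```

The returned array has one 2-D point per code, which can then be coloured by CCS class in any plotting library.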