EmmaRocheteau / eICU-GNN-LSTM

This repository contains the code used for Predicting Patient Outcomes with Graph Representation Learning (https://arxiv.org/abs/2101.03940).
MIT License

How to get 'padded.dat' in bert.py when constructing graph #1

Closed by LeslieHoloway 3 years ago

LeslieHoloway commented 3 years ago

Running the following script to build the graphs raises a FileNotFoundError:

python3 -m graph_construction.bert

In graph_construction/bert.py, padded.dat and attention_mask.dat are needed.

import numpy as np
import torch

def read_data(graph_dir):
    # Memory-map the tokenised inputs: 89123 rows, each padded to 512 token ids
    padded = np.memmap(graph_dir + 'padded.dat', dtype=int, shape=(89123, 512))
    attention_mask = np.memmap(graph_dir + 'attention_mask.dat', dtype=int, shape=(89123, 512))
    # Copy both arrays into tensors on the GPU
    input_ids = torch.tensor(padded).to('cuda')
    attn_mask = torch.tensor(attention_mask).to('cuda')
    return input_ids, attn_mask

How can I generate padded.dat and attention_mask.dat? I am curious about how BERT is used to analyze the EHR data. Could you share the related code? Thank you very much.

EmmaRocheteau commented 3 years ago

Thank you for bringing this to our attention! We were missing some code in bert.py, but we have now added it. If you would like access to the graphs without needing those files, you can download them on request from here: https://drive.google.com/drive/folders/1yWNLhGOTPhu6mxJRjKCgKRJCJjuToBS4?usp=sharing

This folder contains padded.dat, attention_mask.dat and bert_out.dat, and all the graph files.

To make those files from scratch, follow the instructions in the README of the graph_construction folder. These are also in the main README.
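
For readers who only want to see what files with this layout look like, here is a minimal sketch of padding token id sequences to a fixed length, building the matching attention mask, and writing both as raw files that np.memmap can reopen. It is not the repository's code: the helper names pad_and_mask and write_memmap, the toy token ids, the padding id of 0 and the explicit int64 dtype are all assumptions made for illustration, and the tokenisation step itself is left out. The README instructions remain the reference for the real pipeline.

```python
import os
import tempfile

import numpy as np

MAX_LEN = 512  # row length expected by read_data in this thread
PAD_ID = 0     # assumption: 0 is the padding token id


def pad_and_mask(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate or right-pad each sequence to max_len and build its attention mask."""
    padded = np.full((len(token_ids), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(token_ids), max_len), dtype=np.int64)
    for row, ids in enumerate(token_ids):
        ids = ids[:max_len]                     # drop anything past max_len
        padded[row, :len(ids)] = ids            # real tokens first, padding after
        attention_mask[row, :len(ids)] = 1      # 1 = real token, 0 = padding
    return padded, attention_mask


def write_memmap(path, array):
    """Write array as a raw binary file readable by np.memmap with the same dtype and shape."""
    out = np.memmap(path, dtype=array.dtype, mode='w+', shape=array.shape)
    out[:] = array
    out.flush()
    del out


# Hypothetical toy input: three already-tokenised sequences with made-up ids.
toy_token_ids = [[101, 7592, 102], [101, 2088, 2003, 102], [101, 102]]
padded, attention_mask = pad_and_mask(toy_token_ids)

# A temporary directory stands in for graph_dir here.
graph_dir = tempfile.mkdtemp() + os.sep
write_memmap(graph_dir + 'padded.dat', padded)
write_memmap(graph_dir + 'attention_mask.dat', attention_mask)

# Reopen read-only with the same dtype and shape, as read_data does for the real files.
reloaded = np.memmap(graph_dir + 'padded.dat', dtype=np.int64, mode='r', shape=padded.shape)
```

Because a raw memmap file stores no shape or dtype, the reader has to pass the same values the writer used; a mismatch in either one silently misreads the data rather than raising an error.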