This repository contains the experimental code for Medic-BERT, a transformer model that predicts the length of stay (LOS) of hospitalized patients from sequences of medical events.
Given patients' EHR data represented as medical events, the program trains models for LOS prediction.
The software is written in Python 3.10.
Dependencies can be installed from the `requirements.txt` file.
Two structured EHR data files should be placed in the `Data` folder.
Currently, CSV and Parquet formats are supported.
`patients.csv` specifies the patient cohort and should contain the following fields:

| sequence_id | patient_id | age | sex | hosp_start | los |
|---|---|---|---|---|---|

- `sequence_id` (int) - The id of the sequence
- `patient_id` (int) - The id of the patient
- `age` (int) - The age of the patient in years
- `sex` (0/1) - Binarized patient sex
- `hosp_start` (datetime) - Start date and time of the hospitalization
- `los` (float) - Length of stay in days

`data.csv`
specifies the EHR of a patient modelled as individual tokenized medical events:

| token | token_orig | event_time | event_type | event_value | sequence_id |
|---|---|---|---|---|---|

- `token` (str) - The tokenized version of a medical event (e.g., Temp_Low)
- `token_orig` (str) - The original token before tokenization (e.g., Temp)
- `event_time` (datetime) - Date and time of the event
- `event_type` (str) - The type of the event (e.g., Laboratory)
- `event_value` (float) - Original float/categorical value of the event
- `sequence_id` (int) - The id of the sequence

Example data files can be found in the `Data` folder.
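Before training, it can help to confirm that both files carry the expected columns. The following is a minimal sketch using only the Python standard library, not code from this repository. The helper `missing_columns`, the in-memory sample rows, the comma delimiter, and the timestamp format are all assumptions made for illustration.

```python
import csv
import io

# Column names taken from the field lists above.
PATIENTS_COLUMNS = ["sequence_id", "patient_id", "age", "sex", "hosp_start", "los"]
DATA_COLUMNS = ["token", "token_orig", "event_time", "event_type", "event_value", "sequence_id"]


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Hypothetical in-memory sample standing in for Data/patients.csv.
sample_patients = io.StringIO(
    "sequence_id,patient_id,age,sex,hosp_start,los\n"
    "1,10,67,0,2020-01-01 08:30:00,3.5\n"
)
# Hypothetical sample with the event_value column left out on purpose.
sample_data = io.StringIO(
    "token,token_orig,event_time,event_type,sequence_id\n"
    "Temp_Low,Temp,2020-01-01 09:00:00,Laboratory,1\n"
)

patients_missing = missing_columns(sample_patients, PATIENTS_COLUMNS)
data_missing = missing_columns(sample_data, DATA_COLUMNS)
print(patients_missing)  # no columns missing
print(data_missing)      # reports the omitted event_value column
```

To check real files, replace the `io.StringIO` samples with file objects opened on the CSV files in the `Data` folder.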
The program can process input files in CSV and Parquet format.
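To illustrate how the two files relate, the sketch below groups the events in `data.csv` by `sequence_id`, orders them by `event_time`, and pairs each token sequence with the `los` value from `patients.csv`. It is an assumption about how such data could be assembled, not the preprocessing that `main.py` actually performs. The sample rows, the `Temp_High` token, and the timestamp format in `TIME_FORMAT` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical sample rows following the patients.csv schema.
patients_csv = io.StringIO(
    "sequence_id,patient_id,age,sex,hosp_start,los\n"
    "1,10,67,0,2020-01-01 08:30:00,3.5\n"
    "2,11,54,1,2020-02-10 14:00:00,1.0\n"
)
# Hypothetical sample rows following the data.csv schema (deliberately unsorted).
data_csv = io.StringIO(
    "token,token_orig,event_time,event_type,event_value,sequence_id\n"
    "Temp_Low,Temp,2020-01-01 10:00:00,Laboratory,35.1,1\n"
    "Temp_High,Temp,2020-01-01 09:00:00,Laboratory,39.2,1\n"
    "Temp_Low,Temp,2020-02-10 15:00:00,Laboratory,35.4,2\n"
)

# Map each sequence_id to its length of stay in days.
los_by_sequence = {
    int(row["sequence_id"]): float(row["los"])
    for row in csv.DictReader(patients_csv)
}

# Collect (event_time, token) pairs per sequence_id.
events = defaultdict(list)
for row in csv.DictReader(data_csv):
    when = datetime.strptime(row["event_time"], TIME_FORMAT)
    events[int(row["sequence_id"])].append((when, row["token"]))

# Sort each sequence chronologically and pair it with its LOS target.
examples = {
    seq_id: ([token for _, token in sorted(pairs)], los_by_sequence[seq_id])
    for seq_id, pairs in events.items()
}
print(examples)
```

Each entry in `examples` maps a `sequence_id` to a chronologically ordered list of tokens together with the corresponding length of stay, which is the general shape of input a sequence model for LOS prediction would need.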
`main.py` is the main entry point for training and evaluating models. The behavior of the software depends on the configuration in the `config.ini` file.
The parameters and their settings are described below. The most important settings are described first, followed by those that are rarely changed:
If you use M-BERT in a scientific publication, we would appreciate a citation:
@inproceedings{hansen2023patient,
title={Patient Event Sequences for Predicting Hospitalization Length of Stay},
author={Hansen, Emil Riis and Nielsen, Thomas Dyhre and Mulvad, Thomas and Strausholm, Mads Nibe and Sagi, Tomer and Hose, Katja},
booktitle={International Conference on Artificial Intelligence in Medicine},
pages={51--56},
year={2023},
organization={Springer}
}