# COREBEHRT
To run the tests and pipelines, it is advised to create a virtual environment, activate it, and install the requirements:
```shell
$ python -m venv .venv
$ source .venv/bin/activate
(.venv) $ pip install -r requirements.txt
```
Activate your virtual environment and run the unit tests:

```shell
(.venv) $ python -m unittest
```
The pipeline can be run from the root directory by executing the following commands:

```shell
(.venv) $ python -m corebehrt.main.create_data
(.venv) $ python -m corebehrt.main.pretrain
(.venv) $ python -m corebehrt.main.create_outcomes
(.venv) $ python -m corebehrt.main.finetune_cv
```

- `create_data`: Creates tokenized features from the formatted data.
- `pretrain`: Pretrains the model on the tokenized features.
- `create_outcomes`: Creates the outcomes from the formatted data. Outcomes are stored as absolute positions.
- `finetune_cv`: Finetunes the pretrained model on the outcomes.
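Since outcomes are stored as absolute positions, a minimal sketch of the conversion may help. This is an illustration only: it assumes abspos counts hours from a fixed origin timestamp; the actual unit and origin are defined by the pipeline configuration, and the names below are hypothetical.

```python
from datetime import datetime

# Hypothetical origin; the real origin and unit are set by the pipeline configuration.
ORIGIN = datetime(2020, 1, 26)

def to_abspos(ts: datetime, origin: datetime = ORIGIN) -> float:
    """Convert an outcome timestamp to an absolute position in hours since the origin."""
    return (ts - origin).total_seconds() / 3600

# An outcome recorded two days after the origin sits at abspos 48.0
assert to_abspos(datetime(2020, 1, 28)) == 48.0
```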
## FeatureCreator
From the raw data

| PID | CONCEPT | ADMISSION_ID | TIMESTAMP | ... |
|---|---|---|---|---|

and patient data

| PID | GENDER | BIRTHDATE | DEATHDATE | ... |
|---|---|---|---|---|

we create:

| PID | concept | abspos | segment | age | ... |
|---|---|---|---|---|---|
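As an illustrative sketch of this transformation only (the hour-based abspos, year-based age, and per-patient segment numbering are assumptions, not the package's actual implementation), it could look like this in pandas:

```python
from datetime import datetime
import pandas as pd

# Hypothetical reference point for abspos (hours since origin).
ORIGIN = datetime(2020, 1, 26)

raw = pd.DataFrame({
    "PID": [1, 1],
    "CONCEPT": ["D123", "M456"],
    "ADMISSION_ID": ["A1", "A2"],
    "TIMESTAMP": [datetime(2020, 2, 1), datetime(2020, 3, 1)],
})
patients = pd.DataFrame({
    "PID": [1],
    "GENDER": ["F"],
    "BIRTHDATE": [datetime(1980, 1, 1)],
})

# Join events with patient data, then derive the feature columns.
features = raw.merge(patients, on="PID")
features["abspos"] = (features["TIMESTAMP"] - ORIGIN).dt.total_seconds() / 3600
# segment: index of the admission within each patient's history, in order of appearance
features["segment"] = features.groupby("PID")["ADMISSION_ID"].transform(
    lambda s: pd.factorize(s)[0]
)
features["age"] = (features["TIMESTAMP"] - features["BIRTHDATE"]).dt.days / 365.25
features = features.rename(columns={"CONCEPT": "concept"})[
    ["PID", "concept", "abspos", "segment", "age"]
]
```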
and include the following:
## Excluder
Results are saved in a table.
## EHRTokenizer
Currently still operates on sequences. Adds [SEP] and [CLS] tokens and creates the vocabulary based on the pretrain data.
Use the submodule `corebehrt.azure` for running on Azure with SDK v2. See the how-to.