Oufattole / meds-torch

MIT License

MEDS-torch

Description

This repository provides a flexible suite for advanced machine learning over Electronic Health Records (EHR) using PyTorch, PyTorch Lightning, and Hydra for configuration management. The project ingests tensorized data from the MEDS_transforms repository, a robust system for transforming EHR data into ML-ready sequence data. By employing a variety of tokenization strategies and sequence model architectures, this framework facilitates the development and testing of deep sequence models on EHR data.


The goal of this project is to push the boundaries of what's possible in healthcare machine learning by providing flexible, robust, and scalable sequence modeling tools that accommodate a wide range of research and operational needs. Whether you're conducting academic research or developing clinical applications with MEDS format EHR data, this repository offers the tools and flexibility to develop deep sequence models.

Installation

Pip

PyPI

pip install meds-torch

git

# clone project
git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch

# [OPTIONAL] create conda environment
conda create -n meds-torch python=3.12
conda activate meds-torch

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install the package and its requirements in editable mode
pip install -e .

How to run

Train model with default configuration

# train on CPU
python -m meds_torch.train trainer=cpu

# train on GPU
python -m meds_torch.train trainer=gpu

Train model with chosen experiment configuration from configs/experiment/

python -m meds_torch.train experiment=experiment_name.yaml

You can override any parameter from the command line:

python -m meds_torch.train trainer.max_epochs=20 data.batch_size=64
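
To illustrate what a dotted override such as trainer.max_epochs=20 does to a nested configuration, here is a minimal sketch using only the Python standard library. It is not Hydra's implementation (Hydra has its own override grammar and type handling). The function name `apply_overrides` and the default values are hypothetical and chosen for illustration.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `key.path=value` overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the tree, creating intermediate levels if needed.
            node = node.setdefault(name, {})
        # Illustration only: digit-only values become ints, others stay strings.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return result


# Hypothetical defaults; the real values live in the repository's configs.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
merged = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(merged)
```

Each override names a path into the configuration tree, so only the addressed leaf changes and every other default is kept.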

📌  Introduction

Why you might want to use it:

✅ Support different tokenization methods for EHR data

✅ Supervised learning and transfer learning support for MEDS data

✅ Ease of Use and Reusability
A collection of useful EHR sequence modeling tools, configs, and code snippets. You can use this repo as a reference for developing your own models. Additionally, you can easily add new models, datasets, tasks, and experiments, and train on different accelerators, such as multi-GPU setups.
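
As a rough picture of what "tokenization" of EHR events can mean, the sketch below maps event codes to integer ids and keeps numeric values alongside a presence mask. This is one generic strategy written for illustration, not the tokenizers shipped in this repository. The event layout (a code plus an optional numeric value), the code strings, and the helper `build_vocab` are all assumptions.

```python
# Hypothetical events as (code, numeric_value) pairs; not real data.
events = [
    ("LAB//glucose", 5.4),
    ("DX//diabetes", None),
    ("LAB//glucose", 6.1),
]


def build_vocab(codes):
    """Assign each distinct code an integer id; 0 is reserved for padding."""
    vocab = {}
    for code in codes:
        vocab.setdefault(code, len(vocab) + 1)
    return vocab


vocab = build_vocab(code for code, _ in events)

# Integer ids a sequence model could embed.
code_ids = [vocab[code] for code, _ in events]
# Missing values are filled with 0.0 and flagged in the mask.
values = [0.0 if value is None else value for _, value in events]
value_mask = [value is not None for _, value in events]

print(code_ids, values, value_mask)
```

Other strategies differ mainly in how codes, times, and values are combined into the model's input sequence.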

Loggers

By default, the wandb logger is installed with the repo. To use a different logger, install it with the corresponding command below:

pip install neptune-client
pip install mlflow
pip install comet-ml
pip install "aim>=3.16.2"  # quoted so the shell does not treat > as a redirect; no lower than 3.16.2, see https://github.com/aimhubio/aim/issues/2550

Development Help

To run the tests on 8 parallel workers (the -n option comes from the pytest-xdist plugin), run:

pytest -n 8