Oufattole / meds-torch

MIT License

Zero-shot evaluation #105

Open Oufattole opened 1 month ago

Oufattole commented 1 month ago

We will use the tensorized data format rather than a MEDS-like format for zero-shot evaluation. ESGPT's labeler implementation provides GPU-accelerated label generation. The alternative MEDS-like format would offer better interpretability for time-based tasks through Polars dataframes, but it requires CPU processing and data transfers that would limit our sampling scalability.

To enable this, we should use the ESGPT Labeler abstraction; there is an example here showing how to support user-defined tasks.
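As a rough illustration of what a user-defined labeler could look like, here is a minimal sketch. It is not ESGPT's actual Labeler API: `GeneratedTrajectory`, `Labeler`, `CodeWithinWindowLabeler`, and `zero_shot_probability` are hypothetical names, and the code uses plain Python lists instead of batched GPU tensors so that only the shape of the interface is shown. The idea is that each sampled future is mapped to a binary label plus an "unknown" flag, and the zero-shot prediction is the fraction of positive labels among the samples that could be labeled.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneratedTrajectory:
    """Hypothetical stand-in for one sampled future of a patient."""

    codes: list[str]          # generated event codes, in order
    times_days: list[float]   # time of each event, in days after the prediction time


class Labeler(ABC):
    """Hypothetical base class (not ESGPT's real signature)."""

    @abstractmethod
    def __call__(self, trajectory: GeneratedTrajectory) -> tuple[bool, bool]:
        """Return (label, is_unknown) for one generated trajectory."""


class CodeWithinWindowLabeler(Labeler):
    """Positive if `target_code` occurs within `window_days` of the prediction time."""

    def __init__(self, target_code: str, window_days: float):
        self.target_code = target_code
        self.window_days = window_days

    def __call__(self, trajectory: GeneratedTrajectory) -> tuple[bool, bool]:
        hit = any(
            code == self.target_code and t <= self.window_days
            for code, t in zip(trajectory.codes, trajectory.times_days)
        )
        if hit:
            return True, False
        # A negative label is only trustworthy if generation reached the end
        # of the window; otherwise the outcome is unknown.
        times = trajectory.times_days
        covered = bool(times) and max(times) >= self.window_days
        return False, not covered


def zero_shot_probability(
    labeler: Labeler, trajectories: list[GeneratedTrajectory]
) -> Optional[float]:
    """Fraction of positive labels among samples that could be labeled."""
    labels = [label for label, unknown in map(labeler, trajectories) if not unknown]
    return sum(labels) / len(labels) if labels else None
```

The resulting probability can then be compared against the ground-truth binary label for the same patient and prediction time. A GPU implementation would apply the same logic to a whole batch of generated sequences at once.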

The zero-shot script should require the following input arguments:

  1. A task following the meds-label schema, with ground-truth binary classification labels (already implemented in the pytorch_dataset class)
  2. A user-defined labeler
  3. A pretrained LM checkpoint (already implemented in src/meds_torch/finetune.py)
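
The three inputs above could be collected roughly as follows. This is only a sketch: the flag names (`--task-labels`, `--labeler`, `--checkpoint`), the `module:ClassName` convention for pointing at a labeler, and the use of argparse are all assumptions for illustration, and the actual script may take its configuration in a different way.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for the zero-shot script; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Zero-shot evaluation (sketch)")
    parser.add_argument(
        "--task-labels",
        required=True,
        help="Path to the task labels following the meds-label schema",
    )
    parser.add_argument(
        "--labeler",
        required=True,
        help="User-defined labeler, given as 'module.path:ClassName'",
    )
    parser.add_argument(
        "--checkpoint",
        required=True,
        help="Path to the pretrained LM checkpoint",
    )
    return parser


def split_labeler_spec(spec: str) -> tuple[str, str]:
    """Split 'module.path:ClassName' into its two parts, validating the format."""
    module, sep, cls = spec.partition(":")
    if not sep or not module or not cls:
        raise ValueError(f"expected 'module.path:ClassName', got {spec!r}")
    return module, cls
```

Making all three arguments required means a missing labeler or checkpoint is reported before any data loading or sampling starts.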

The script should: