YerevaNN / mimic3-benchmarks

Python suite to construct benchmark machine learning datasets from the MIMIC-III 💊 clinical database.
https://arxiv.org/abs/1703.07771
MIT License
806 stars, 329 forks

How to trace back array entries to the actual features used? #123

Closed kohkev closed 2 years ago

kohkev commented 2 years ago

Hi,

first of all: great work! Your project is invaluable to us for the benchmarking of multiple algorithms via useful, real-world medical scenarios.

We are currently trying to produce single-feature shape functions that help us determine which features have the strongest effect on model output, e.g. the effect of systolic blood pressure on mortality risk.

However, the resulting NumPy array does not carry the corresponding feature column names. Because the array is built through several reading and feature-extraction steps, it is very hard for us to trace where exactly the named feature columns are dropped in favour of the plain array, which is then further preprocessed (imputation etc.).

We would therefore like to know whether the repo contains a script or any guidance that lets us trace the array entries back to the original feature names, so that we can identify which of the 714 features have the strongest effect on model output (e.g. mean of systolic blood pressure at 25% of time).

Thanks and Best Regards!
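
For illustration, the kind of mapping asked about above could be sketched as follows. This is a minimal sketch, not the repo's actual method: it assumes each column of the flattened array corresponds to one (channel, sub-period, statistic) combination and that columns are laid out channel-major. The helpers `build_feature_names` and `top_features`, the name format, and the short example lists are all hypothetical. The real channel list, sub-periods, statistics, and their ordering would have to be read from the feature extraction code and verified there. As a plausibility check only, a layout of 17 channels × 7 sub-periods × 6 statistics would give 714 columns.

```python
from itertools import product


def build_feature_names(channels, periods, stats):
    """Return one name per flattened column.

    ASSUMPTION: columns are ordered channel-major, then sub-period,
    then statistic. Verify this against the actual extraction code.
    """
    return [
        f"{stat} of {channel} [{period}]"
        for channel, period, stat in product(channels, periods, stats)
    ]


def top_features(importances, names, k=3):
    """Pair per-column importance scores with names, largest |score| first."""
    if len(importances) != len(names):
        raise ValueError("need exactly one importance score per column")
    ranked = sorted(zip(names, importances), key=lambda p: abs(p[1]), reverse=True)
    return ranked[:k]


# Hypothetical, shortened lists used only to demonstrate the indexing.
channels = ["Heart Rate", "Systolic blood pressure"]
periods = ["full stay", "first 25%"]
stats = ["min", "mean"]

names = build_feature_names(channels, periods, stats)

# Made-up importance scores, one per column (e.g. model coefficients).
importances = [0.1, -0.9, 0.0, 0.2, 0.05, 0.3, -0.1, 0.7]

for name, score in top_features(importances, names, k=2):
    print(f"{score:+.2f}  {name}")
```

With the name list built once, any column index from the array (or from a coefficient vector or shape function) can be looked up directly via `names[index]`.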