bcbi / PreprocessMD.jl

Medically-informed data preprocessing for machine learning
MIT License

Outsource dimension reduction through medical code clustering #6

Open: AshlinHarris opened this issue 2 years ago

AshlinHarris commented 2 years ago

Integrate external packages for low-level mapping.

AshlinHarris commented 2 years ago

Approaches to dimension reduction from Wu et al.[^Wu]:

  1. Variable grouping or clustering
    • Choose which hierarchical medical coding system to group by
  2. Principal Component Analysis
    • Explain most of the variance with a small number of principal components
  3. Embedding and deep learning
    • Embedding turns binary and categorical variables into continuous feature vectors
  4. Missing data imputation
    • Existing methods are not well suited to big EHR data analysis
    • Expectation-maximization (EM) method
    • Bayesian approach via MCMC techniques
    • Alternatively, generate a complete synthetic data set

[^Wu]: Wu, Hulin, Jose Miguel Yamal, Ashraf Yaseen, and Vahed Maroufy, eds. Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics. CRC Press, 2020.

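A minimal sketch of approach 1 (variable grouping by a code hierarchy), assuming the input is ICD-10-CM diagnosis codes and that the grouping level is the 3-character ICD-10-CM category (e.g. `E11.9` and `E11.65` both collapse to `E11`). This is an illustration in Python, not PreprocessMD.jl's API; `icd10_category` and `group_codes` are hypothetical names, and a real pipeline might instead use an external mapping package, as the first comment suggests.

```python
from collections import defaultdict


def icd10_category(code: str) -> str:
    """Hypothetical helper: collapse an ICD-10-CM code to its 3-character category.

    Assumes the code is well formed, e.g. 'E11.65' -> 'E11'.
    """
    return code.replace(".", "").strip().upper()[:3]


def group_codes(records):
    """Hypothetical helper: build a patient x category indicator matrix.

    records: iterable of (patient_id, icd10_code) pairs.
    Returns (patients, categories, matrix), where matrix[i][j] == 1 if
    patient i has at least one code in category j. Grouping shrinks the
    number of columns from distinct raw codes to distinct categories.
    """
    by_patient = defaultdict(set)
    for pid, code in records:
        by_patient[pid].add(icd10_category(code))

    patients = sorted(by_patient)
    categories = sorted(set().union(*by_patient.values()))
    col = {c: j for j, c in enumerate(categories)}

    matrix = [[0] * len(categories) for _ in patients]
    for i, pid in enumerate(patients):
        for c in by_patient[pid]:
            matrix[i][col[c]] = 1
    return patients, categories, matrix


if __name__ == "__main__":
    # Made-up example records for illustration only.
    records = [
        ("p1", "E11.9"), ("p1", "E11.65"),
        ("p2", "I10"), ("p2", "E11.22"),
        ("p3", "I25.10"),
    ]
    patients, categories, matrix = group_codes(records)
    print(categories)  # ['E11', 'I10', 'I25']
    for pid, row in zip(patients, matrix):
        print(pid, row)
```

In this toy input, five distinct raw codes reduce to three category columns. A coarser or clinically curated grouping could replace `icd10_category` without changing the rest of the sketch, and the resulting indicator matrix could then feed approach 2 (PCA) or approach 3 (embeddings).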
AshlinHarris commented 1 year ago

This feature is a priority for researchers.