Oufattole / meds-torch

MIT License

Complete synthetic testing data #54

Oufattole opened 1 month ago


We have raw MEDS-format data stored as parquet files in a directory, which look like this. The goal is to run the MEDS-Transforms preprocessing pipeline to convert this data into a JNRT holding the dynamic (time-varying) data, plus parquet files holding the static data.
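To make the static/dynamic split concrete, here is a minimal pure-Python sketch. It assumes MEDS-style rows with `subject_id`, `time`, `code` and `numeric_value` fields, where static measurements have a null `time`; the function name `split_static_dynamic` and the toy rows are hypothetical. It only illustrates the subject → event → measurement nesting that a ragged format like a JNRT stores; it is not the API of any JNRT library.

```python
from collections import defaultdict
from datetime import datetime

# Toy MEDS-style rows (hypothetical data). Static measurements have time=None.
rows = [
    {"subject_id": 1, "time": None, "code": "EYE_COLOR//BLUE", "numeric_value": None},
    {"subject_id": 1, "time": datetime(2020, 1, 1, 9), "code": "HR", "numeric_value": 88.0},
    {"subject_id": 1, "time": datetime(2020, 1, 1, 9), "code": "TEMP", "numeric_value": 37.1},
    {"subject_id": 1, "time": datetime(2020, 1, 2, 9), "code": "HR", "numeric_value": 91.0},
    {"subject_id": 2, "time": None, "code": "EYE_COLOR//BROWN", "numeric_value": None},
    {"subject_id": 2, "time": datetime(2021, 5, 3, 14), "code": "HR", "numeric_value": 72.0},
]


def split_static_dynamic(rows):
    """Hypothetical helper. Returns (static, dynamic) where:
    - static[subject] is the list of static codes for that subject;
    - dynamic[subject][i] is the list of (code, value) pairs observed at the
      subject's i-th distinct timestamp, in chronological order.
    """
    static = defaultdict(list)
    by_time = defaultdict(lambda: defaultdict(list))
    for r in rows:
        if r["time"] is None:
            static[r["subject_id"]].append(r["code"])
        else:
            by_time[r["subject_id"]][r["time"]].append((r["code"], r["numeric_value"]))
    # Group measurements into events (one per timestamp), sorted by time.
    dynamic = {sid: [events[t] for t in sorted(events)] for sid, events in by_time.items()}
    return dict(static), dynamic


static, dynamic = split_static_dynamic(rows)
```

Here `dynamic[1]` is a list of two events of different lengths (two measurements, then one), which is exactly the kind of raggedness a nested ragged tensor avoids padding.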

This bash script creates the synthetic test data. It has two steps:

"stages": [
    "aggregate_code_metadata",
    "filter_subjects",
    "add_time_derived_measurements",
    "filter_measurements",
    "occlude_outliers",
    "fit_vocabulary_indices",
    **"custom_normalization",**
    **"custom_text_tokenization",**
    **"custom_text_tensorization"**
]

The above stages are preprocessing steps applied to the MEDS table.
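One way to picture this is that each stage is a function from one version of the table to the next, applied in the order listed. The sketch below assumes that view. `filter_subjects` and `fit_vocabulary_indices` here are hypothetical toy stand-ins with made-up parameters (`min_events`, a `code_index` column); they are not the MEDS-Transforms implementations, which operate on sharded parquet files and are configured separately.

```python
def filter_subjects(rows, min_events=2):
    """Toy stand-in: keep only subjects with at least `min_events`
    distinct (non-null) timestamps. Their static rows are kept too."""
    times = {}
    for r in rows:
        if r["time"] is not None:
            times.setdefault(r["subject_id"], set()).add(r["time"])
    keep = {sid for sid, ts in times.items() if len(ts) >= min_events}
    return [r for r in rows if r["subject_id"] in keep]


def fit_vocabulary_indices(rows):
    """Toy stand-in: assign each code an integer index in sorted order,
    starting at 1 (0 left free, e.g. for padding)."""
    codes = sorted({r["code"] for r in rows})
    vocab = {c: i for i, c in enumerate(codes, start=1)}
    return [{**r, "code_index": vocab[r["code"]]} for r in rows]


def run_pipeline(rows, stages):
    """Apply each stage to the output of the previous one, in order."""
    for stage in stages:
        rows = stage(rows)
    return rows


rows = [
    {"subject_id": 1, "time": None, "code": "SEX//F"},
    {"subject_id": 1, "time": 1, "code": "HR"},
    {"subject_id": 1, "time": 2, "code": "TEMP"},
    {"subject_id": 2, "time": 1, "code": "HR"},  # only one event: filtered out
]
out = run_pipeline(rows, [filter_subjects, fit_vocabulary_indices])
```

Because the vocabulary is fitted after filtering, subject 2 is dropped first and the indices are computed only over the remaining codes (`HR`, `SEX//F`, `TEMP`), which is one reason stage order matters.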

The bolded stages, prefixed with `custom_`, still need to be implemented.
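As a possible starting point, the three custom stages might look something like the sketch below. Their intended behavior is not specified in this issue, so everything here is an assumption: `custom_normalization` z-scores `numeric_value` per code from (count, sum, sum of squares) statistics, such as an aggregation stage could produce; `custom_text_tokenization` assumes a free-text `text_value` field and a word-to-id `vocab`; and `custom_text_tensorization` pads each token list to a fixed `max_len`. The argument names and reserved ids (0 for padding, 1 for unknown words) are hypothetical.

```python
import math


def custom_normalization(rows, code_stats):
    """Hypothetical: z-score numeric_value per code.
    code_stats maps code -> (count, sum, sum_of_squares)."""
    out = []
    for r in rows:
        v = r.get("numeric_value")
        stats = code_stats.get(r["code"])
        if v is not None and stats is not None:
            n, s, ss = stats
            mean = s / n
            var = ss / n - mean ** 2  # population variance
            std = math.sqrt(var) if var > 0 else 1.0  # guard constant codes
            v = (v - mean) / std
        out.append({**r, "numeric_value": v})
    return out


def custom_text_tokenization(rows, vocab):
    """Hypothetical: lowercase and whitespace-split text_value, mapping
    words to ids; unknown words map to 1. Rows without text get []."""
    out = []
    for r in rows:
        text = r.get("text_value")
        tokens = [vocab.get(w, 1) for w in text.lower().split()] if text else []
        out.append({**r, "text_tokens": tokens})
    return out


def custom_text_tensorization(rows, max_len):
    """Hypothetical: truncate or right-pad (with 0) each token list to
    max_len, giving a dense list of equal-length rows."""
    return [(r["text_tokens"][:max_len] + [0] * max_len)[:max_len] for r in rows]


rows = [
    {"subject_id": 1, "time": 1, "code": "HR", "numeric_value": 90.0, "text_value": None},
    {"subject_id": 1, "time": 1, "code": "NOTE", "numeric_value": None,
     "text_value": "Chest pain radiating"},
]
code_stats = {"HR": (2, 160.0, 13000.0)}  # mean 80, std 10
vocab = {"chest": 2, "pain": 3}

normed = custom_normalization(rows, code_stats)
tokenized = custom_text_tokenization(normed, vocab)
tensor = custom_text_tensorization(tokenized, max_len=4)
```

Dense padding is used only to keep the sketch short; since the target output is a JNRT, a real tensorization stage would more likely keep the token lists ragged.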