Closed by melhemr 2 years ago
@ashishsingh18 and @AbdulkadirA, I have added a synthetic dataset, test_synthetic_data.pkl.gz, to the repo. It is larger than the initially proposed ~20-row dataset because testing the out-of-sample harmonization function requires at least 25 control rows per site. The dataset includes four sites: three have at least 25 controls and one has fewer than 25, so we can also verify that harmonization parameters are not calculated for sites without the required number of controls. In total, the dataset has 120 rows. Please let me know what you think.
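A minimal sketch of the kind of check this dataset is meant to exercise, assuming each row carries a site label and a control flag. The key names `site` and `is_control`, the helper `sites_eligible_for_harmonization`, and the toy per-site counts are all hypothetical; they are not taken from the repo or from test_synthetic_data.pkl.gz.

```python
from collections import Counter

MIN_CONTROLS = 25  # threshold described in the comment above


def sites_eligible_for_harmonization(rows, min_controls=MIN_CONTROLS):
    """Return the sorted sites that have at least `min_controls` control rows.

    `rows` is an iterable of dicts with the hypothetical keys
    "site" and "is_control".
    """
    control_counts = Counter(r["site"] for r in rows if r["is_control"])
    return sorted(s for s, n in control_counts.items() if n >= min_controls)


def make_toy_rows():
    """Build illustrative rows; the per-site counts here are invented."""
    controls_per_site = {"A": 25, "B": 26, "C": 30, "D": 10}
    rows = []
    for site, n_controls in controls_per_site.items():
        for i in range(30):  # 30 rows per site, 4 sites
            rows.append({"site": site, "is_control": i < n_controls})
    return rows


rows = make_toy_rows()
eligible = sites_eligible_for_harmonization(rows)
```

With these toy counts, site "D" falls below the threshold, so a test could assert that it is excluded while the other three sites are kept.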
This PR adds a synthetic harmonization model/dataset to the project for testing.