Doubts about the reported number of train/val/test samples

According to Figure 1 of the publication [1], the model was trained on a train/val/test split of 4396/472/1257.

However, the MedFuse preprocessing pipeline [2], which was reused for this paper with no major modification, reports 4885/540/1373. That larger split results from a small mistake by the MedFuse authors, and the same mistake is still present in MeTra.

Here is the error in MeTra: https://github.com/FirasGit/MeTra/blob/3947e611e86fa7147d0d34d080a3dfc3c6c5bb22/classification/datasets/mimic_lab.py#L163

    if cfg.dataset.task == 'in-hospital mortality':  # should be 'in-hospital-mortality'
        end_time = cxr_merged_icustays.intime + pd.DateOffset(hours=48)

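Because the string is misspelled, this condition never matches the task name, so the 48-hour cutoff is not applied and CXR samples taken more than 48 hours after admission are included. After fixing this error, the MedFuse authors report a train/val/test split of 4485/488/1242, which is closer to the split reported for MeTra but still not identical. I could not find any other preprocessing step that would explain the remaining gap.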
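To illustrate the effect, here is a minimal sketch on toy data. It is not the MeTra code: the helper `select_cxr`, the constant `CONFIG_TASK`, the toy timestamps, and the fallback to `outtime` when the condition is false are all assumptions made for this example. Only the comparison and the 48-hour offset are taken from the snippet quoted above.

```python
import pandas as pd

# Assumed config value, following the comment in the quoted snippet.
CONFIG_TASK = "in-hospital-mortality"


def select_cxr(df, compared_literal):
    """Hypothetical helper: keep CXRs taken between intime and end_time."""
    if CONFIG_TASK == compared_literal:
        # Intended behaviour: only CXRs from the first 48 hours.
        end_time = df["intime"] + pd.DateOffset(hours=48)
    else:
        # Assumed fallback: the whole ICU stay.
        end_time = df["outtime"]
    mask = (df["StudyDateTime"] >= df["intime"]) & (df["StudyDateTime"] <= end_time)
    return df.loc[mask]


# Toy data: one stay of 5 days with CXRs at 12 h, 47 h and 72 h after admission.
intime = pd.Timestamp("2020-01-01 00:00")
df = pd.DataFrame({
    "intime": [intime] * 3,
    "outtime": [intime + pd.Timedelta(days=5)] * 3,
    "StudyDateTime": [intime + pd.Timedelta(hours=h) for h in (12, 47, 72)],
})

n_with_typo = len(select_cxr(df, "in-hospital mortality"))  # space: never matches
n_fixed = len(select_cxr(df, "in-hospital-mortality"))      # hyphen: matches
print(n_with_typo, n_fixed)
```

Under these assumptions the misspelled comparison keeps all three toy CXRs, while the corrected one drops the CXR taken at 72 hours. The same mechanism would inflate the sample counts in the real pipeline.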

Hence my question: which split corresponds to the performance reported [3] in the publication? I ask because the code does not reproduce the reported train/val/test split.

[1] https://www.nature.com/articles/s41598-023-37835-1/figures/1
[2] https://github.com/nyuad-cai/MedFuse/tree/6f827589afd89562813cc5aa915762d054c29efc
[3] https://www.nature.com/articles/s41598-023-37835-1/figures/2