inconsistencies with mimic-iii 50 dataset

gmichalo commented 3 years ago

Hello,

Thank you for providing an open-source code for the project.

While I was trying to run the code, I found that there are inconsistencies in mimic-iii 50 labels dataset that is produced using your code and the dataset that is created by using the CAML code (https://github.com/jamesmullenbach/caml-mimic)

Firstly I want to mention that both datasets have included the correct ids for patients but I found that the CAML code in the top-50 codes has included the code '37.23' but you have included '93.90'. This has the effect of creating different labels in all three parts of the datasets (train/dev/test)

For example for the '93.90' code for an instance in the dataset the caml paper had the labels: {'412', '401.9', '36.15', '414.01', '496', '39.61'} but your code had: {'412', '401.9', '36.15', '414.01', '496', '93.90', '39.61'} Or for the '37.23' code the caml code had the labels: {'427.31', '37.23', '428.0', '272.0', '88.56', '414.01'} but your code had the labels: {'427.31', '428.0', '272.0', '88.56', '414.01'}

Could you let me know why there are inconsistencies with the dataset and which version did you used for yours expirements?

tienthanhdhcn commented 3 years ago

Hi @gmichalo,

Thank you for the question.

We use the MIMIC III 2016 version 1.4 downloaded in June 2019. From the data, we created the top 50 based on the code frequency and it seems that 93.90 is associated with a higher frequency than 37.23 (https://github.com/aehrc/LAAT/blob/cd5c0ec0b0b8098289042be6d68363a760d7bbca/src/util/mimiciii_data_processing.py#L76).

gmichalo commented 3 years ago

Hello,

Yes, indeed in order to create the dataset on both codes also use MIMIC III 2016 version 1.4 . However, run both codes two times, and every time I get the same inconsistencies. I am sorry to insist on this subject but if the CAML code and your code creates different versions of the MIMIC dataset, I am not sure if we can have a fair comparison between the models (and all the model in the literature that mentions that they follow the CAML code)

tienthanhdhcn commented 3 years ago

I don't say that the data versions are different. What I said is that we generated the top 50 based on the frequency and we expected that CAML did the same but it seems there is a difference of 1 out of 50 codes.

You can simply use the list of the codes from CAML for the MIMIC-III-50 data and re-run the experiment with that version of the data using our code.

tienthanhdhcn commented 3 years ago

Hi @gmichalo,

I have run an experiment in which I replace 93.90 with 37.23 in the top 50 to make it match with the data from CAML (though still don't fingure it out why 37.23 got selected in CAML while its frequency is lower than that of 93.90). The results are as follows:

	macro-AUC	micro-AUC	macro-F1	micro-F1	P@5	P@8	P@15
LAAT	92.8	94.5	66.8	71.2	67.3	54.6	35.9
JointLAAT	92.7	94.5	67.0	71.3	66.5	54.5	35.8

The performance is on the same bar as what was reported in our LAAT paper (Table 2).

I also attached the run commands and output log files here run_files_log_files.zip.

aehrc / LAAT

inconsistencies with mimic-iii 50 dataset #2