Closed: AHMAD-DOMA closed this issue 1 month ago
Hello @AHMAD-DOMA, I have also successfully reproduced the results presented in the paper for the MIMIC-3 Full dataset, although I haven't yet done so for the top-50 codes. I used the same parameters as in the paper and found the optimal threshold to be 0.5. The resulting metrics are as follows:
Best Threshold: 0.5
Performance Metrics:
Macro Accuracy: 0.0589
Macro Precision: 0.0984
Macro Recall: 0.0727
Macro F1 Score: 0.0836
Micro Accuracy: 0.4059
Micro Precision: 0.7148 <---
Micro Recall: 0.4844 <---
Micro F1 Score: 0.5775 <---
Precision at 8: 0.7644
Recall at 8: 0.4026
F1 Score at 8: 0.5274
Macro AUC: 0.9237
Micro AUC: 0.9892
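For reference, the threshold-based metrics above can be computed from a binary label matrix and a probability matrix. The sketch below is my own minimal numpy implementation (function names and the dense-array input format are assumptions, not code from the repo):

```python
import numpy as np

def micro_prf1(y_true, y_prob, threshold=0.5):
    """Micro-averaged precision/recall/F1 for multi-label predictions.

    y_true: (n_samples, n_labels) binary ground-truth matrix
    y_prob: (n_samples, n_labels) predicted probabilities
    """
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # pooled over all labels
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def precision_at_k(y_true, y_prob, k=8):
    """Mean fraction of true labels among each sample's k highest-scored labels."""
    topk = np.argsort(y_prob, axis=1)[:, -k:]        # indices of the k largest scores
    hits = np.take_along_axis(y_true, topk, axis=1)  # 1 where a top-k label is true
    return hits.sum(axis=1).mean() / k
```

Sweeping `threshold` over, say, `np.arange(0.1, 0.9, 0.05)` on the dev set and keeping the value with the best micro F1 is one simple way to arrive at a "best threshold" like the 0.5 reported above.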
Thank you, @FareedKhan-dev. If my understanding is correct, you followed the preprocessing steps as described in the README and conducted training for 20 epochs. If that's correct, could you please share the configurations of your experiment?
I apologize for any misunderstanding. To clarify, I didn't perform any training; I only ran the preprocessing. My results were generated with the pre-trained model provided in the README.
Hi @AHMAD-DOMA,
Thank you for your interest in our work! The discrepancy in the MIMIC-full configuration is so large that I suspect something went wrong in the training process itself; several factors could be at play.
I'd suggest using the pretrained checkpoints directly as it's the easiest way to replicate the results.
Best, Chao-Wei
I have encountered a performance discrepancy between the MIMIC-50 and MIMIC-Full datasets while using automatic mixed precision. I used the same configuration settings and training parameters for both, aiming to reproduce the results from the paper. The MIMIC-50 results are relatively close to the expected outcomes, but the MIMIC-Full results show a notable discrepancy.
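For context on why automatic mixed precision alone can shift results more on the full label set, here is a small numpy demonstration of two fp16 failure modes (a generic illustration of half-precision limits, not code from this repository):

```python
import numpy as np

# fp16 overflows above ~65504: large accumulated logits or losses become inf
big = np.float16(1e5)
print(np.isinf(big))   # True

# fp16 spacing at 2048 is 2, so small increments are silently dropped
acc = np.float16(2048) + np.float16(1)
print(float(acc))      # 2048.0, not 2049.0
```

With ~8,900 labels in MIMIC-Full versus 50, sums over the label dimension are far larger, so these rounding effects have more opportunity to bite; running the loss and final projection in fp32 (as AMP frameworks typically do for reductions) is worth verifying.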
Details:
MIMIC-50 Configuration:
--max_length: 3072
--chunk_size: 128
--model_name_or_path: RoBERTa-base-PM-M3-Voc/RoBERTa-base-PM-M3-Voc-hf
--per_device_train_batch_size: 1
--gradient_accumulation_steps: 8
--per_device_eval_batch_size: 1
--num_train_epochs: 20
--num_warmup_steps: 2000
--model_type: roberta
--model_mode: laat
Results for MIMIC-50 with Automatic Mixed Precision:
Paper F1 Result: 71.00
MIMIC-Full Configuration:
--max_length: 3072
--chunk_size: 128
--model_name_or_path: RoBERTa-base-PM-M3-Voc/RoBERTa-base-PM-M3-Voc-hf
--per_device_train_batch_size: 1
--gradient_accumulation_steps: 8
--per_device_eval_batch_size: 1
--num_train_epochs: 20
--num_warmup_steps: 2000
--model_type: roberta
--model_mode: laat
Results for MIMIC-Full with Automatic Mixed Precision:
Paper F1 Result: 59.8
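For completeness, the flags above (identical for both datasets) collapse into a single command line. The script name `run_icd.py` is a placeholder of mine; substitute the actual training entry point from the repository's README:

```shell
python run_icd.py \
    --max_length 3072 \
    --chunk_size 128 \
    --model_name_or_path RoBERTa-base-PM-M3-Voc/RoBERTa-base-PM-M3-Voc-hf \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --per_device_eval_batch_size 1 \
    --num_train_epochs 20 \
    --num_warmup_steps 2000 \
    --model_type roberta \
    --model_mode laat
```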
I kindly request assistance in diagnosing and resolving the performance issue encountered with the MIMIC-Full dataset. The goal is to align the results with the paper's reported metrics as closely as possible.
Thank you for your attention and support in addressing this matter.