bvanaken / clinical-outcome-prediction

Code for the EACL 2021 Paper: Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Apache License 2.0

Reproducing LOS result #17

Open JuneHou opened 2 days ago

JuneHou commented 2 days ago

Hi,

I am trying to reproduce the results for the LOS task with MIMIC-III v1.4, but I can only reach around 45% accuracy on this dataset. I have tried both your FARM training code and the Hugging Face Trainer. The number of instances in my generated data also doesn't match any of the sizes mentioned in issue #11:

https://github.com/bvanaken/clinical-outcome-prediction/issues/11

My train / val / test dataset sizes are 30421 / 4391 / 8797.
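
In case it helps with the comparison, here is a quick sketch of the total and the split proportions these counts imply (the variable names are just illustrative, not taken from the repository):

```python
# Split sizes reported above for the LOS task (train / val / test).
split_sizes = {"train": 30421, "val": 4391, "test": 8797}

# Total number of instances across all three splits.
total = sum(split_sizes.values())

# Fraction of the total that falls into each split.
fractions = {name: size / total for name, size in split_sizes.items()}

print(f"total instances: {total}")
for name, frac in fractions.items():
    print(f"{name}: {split_sizes[name]} ({frac:.1%})")
```

So my splits come to 43609 instances in total, at roughly 70% / 10% / 20%.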

I have already pinned numpy==1.21.0, pandas==1.3.2, and nltk==3.6.2.
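
For completeness, a minimal sketch of how these pins can be checked against the active environment, using the standard-library `importlib.metadata`. The helper name `check_pins` and its `get_version` parameter are hypothetical and only for illustration:

```python
from importlib import metadata

# Versions I pinned, as listed above.
PINS = {"numpy": "1.21.0", "pandas": "1.3.2", "nltk": "3.6.2"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # An empty dict means every installed version equals its pin.
    print(check_pins(PINS) or "all pins match")
```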

What is the correct size of data for the LOS task?

Best, Jun

JuneHou commented 2 days ago

Additionally, was any pre-training performed on the models listed in the paper as baselines?

[image]