bvanaken / clinical-outcome-prediction

Code for the EACL 2021 Paper: Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Apache License 2.0

Reproducing LOS result #17

Open JuneHou opened 2 months ago

JuneHou commented 2 months ago

Hi,

I am trying to reproduce the results for the LOS task with MIMIC-III v1.4, but I can only reach around 45% accuracy. I have tried both your FARM training code and the Hugging Face Trainer. The number of instances in my generated data does not match any of the sizes mentioned in issue #11 (https://github.com/bvanaken/clinical-outcome-prediction/issues/11).

My train / val / test dataset sizes are 30421 / 4391 / 8797
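
In case it helps with the comparison, here is a minimal sketch that computes the total number of instances and the share of each split from the sizes above. The variable names are only illustrative and are not taken from the repository.

```python
# Split sizes reported above (train / val / test).
split_sizes = {"train": 30421, "val": 4391, "test": 8797}

# Total number of instances across all three splits.
total = sum(split_sizes.values())

# Share of each split as a percentage of all instances, rounded to one decimal.
split_percent = {name: round(100 * n / total, 1) for name, n in split_sizes.items()}

print(total)          # 43609
print(split_percent)  # {'train': 69.8, 'val': 10.1, 'test': 20.2}
```

So my data comes out as roughly a 70 / 10 / 20 split over 43609 instances.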

I have already pinned numpy==1.21.0, pandas==1.3.2, and nltk==3.6.2.

What is the correct size of data for the LOS task?

Best, Jun

JuneHou commented 2 months ago

Additionally, was any pre-training performed on the models listed in the paper as baselines?