Replicating results - Mortality Prediction

NtaylorOX commented 2 years ago

Hello - this is not a technical/code issue, rather a request for information.

Amazing codebase by the way!

I have successfully rebuilt the datasets for each of the clinical outcomes - although my particular interest right now is mortality prediction. As a baseline I am trying to replicate the results here, although using my own code for the BERT model followed by a classifier. I have tried both downsampling the dataset to a 50-50 split of classes and using all of the imbalanced data with class_weights calculated for the cross-entropy loss, an implementation I believe your code uses (from the FARM repo: https://farm.deepset.ai/_modules/farm/data_handler/data_silo.html#DataSilo.calculate_class_weights).
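For readers following along, the class-weighting option described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `balanced_class_weights` and `weighted_cross_entropy` are hypothetical, the "balanced" heuristic is the common n_samples / (n_classes * class_count) rule, and whether FARM's `calculate_class_weights` matches it exactly is not confirmed in this thread. The loss is normalised by the sum of the weights, in the style of a weighted mean.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights.

    probs[i][c] is the predicted probability of class c for example i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy imbalanced label set (assumed for illustration): 9 negatives, 1 positive.
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)

# A model that always predicts the majority class with high confidence.
majority_probs = [[0.9, 0.1]] * len(labels)
unweighted = {0: 1.0, 1: 1.0}
loss_plain = weighted_cross_entropy(majority_probs, labels, unweighted)
loss_weighted = weighted_cross_entropy(majority_probs, labels, weights)
```

With the weights applied, the single minority example contributes as much to the loss as all nine majority examples together, so always predicting the majority class is penalised more heavily than in the unweighted case.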

I am still finding the BERT based models are tending to predict the majority class - and my evaluation metrics are far from those reported in your paper (~0.7 vs 0.84).

Did your experiments use the class_weights with the full training dataset?

Any information would be much appreciated.

bvanaken commented 2 years ago

Hello and thanks for your interest in our work!

We have indeed experimented with calculating the class weights to balance the loss (with the balance_classes parameter in our repo, which uses the FARM method that you linked). We used this as one parameter in our hyperparameter search and found that it improves results in some cases, but not always.

For the BERT-based models (without CORe-pretraining) we found that the best set of parameters was actually not using the balancing. The parameters for our reported results are:

'balance_classes': False, 'grad_acc': 10, 'dropout': 0.2, 'lr': 1e-05, 'warmup_steps': 50

Maybe you can try to run your setup with these parameters and see if it produces better results. We have not experimented with different splits (e.g. 50-50) as you did, but it would be interesting if this could further improve results in general.
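A 50-50 split like the one mentioned above can be obtained by randomly downsampling the larger class. The following is a minimal sketch, not code from the repo: the function name `downsample_to_balance`, the fixed seed and the toy data are all assumptions made for illustration.

```python
import random


def downsample_to_balance(examples, labels, seed=0):
    """Hypothetical helper: keep an equal number of examples per class.

    Every class is randomly reduced to the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for example, label in zip(examples, labels):
        by_class.setdefault(label, []).append(example)

    n_min = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        for example in rng.sample(items, n_min):
            balanced.append((example, label))

    # Shuffle so that the classes are not grouped together.
    rng.shuffle(balanced)
    return balanced


# Toy data (assumed): 90 negative and 10 positive examples.
examples = list(range(100))
labels = [0] * 90 + [1] * 10
balanced = downsample_to_balance(examples, labels)
```

Only the training split would normally be resampled like this; the evaluation data keeps its original class distribution so that metrics stay comparable.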

Hope this helps, otherwise don't hesitate to come back for further questions!

Betty

NtaylorOX commented 2 years ago

Hi Betty,

Thanks for the super reply. Very helpful.

I will try to re-run some of my training with the same parameters as you have described. For reasons of customisation I am running experiments outside of the FARM environment, which may explain some of the differences in findings so far. I am hoping it may simply be a matter of tweaking the batch size/gradient accumulation steps, as from what I can tell that's the only major difference in the training I've done so far. My result for the mortality prediction with ClinicalBioBert was around 0.76 F1 macro, so quite far off your results.

If okay, I will keep this ticket open until I've run through with the same parameters as described above and I'll report my results. Then we can close the issue.

Thanks again!

Niall

NtaylorOX commented 2 years ago

Hi again,

Sorry to email again but do you remember what batch_size you were using? Or did you let FARM decide that for you?

Best,

Niall

bvanaken commented 2 years ago

Hi Niall,

Right, I left that out because it was not part of the tuning. But it is surely important: we used a batch size of 20 for all the experiments.

Best Betty

NtaylorOX commented 2 years ago

Perfect - thanks! So with the gradient accumulation, that gives an effective batch_size of 200 I guess, since the optimizer only updates parameters every 10 batches?

Best,

Niall

bvanaken commented 2 years ago

Yes, exactly!
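The arithmetic confirmed above (batch size 20, grad_acc 10, effective batch size 200) can be sketched as follows. This is an illustrative outline of gradient accumulation, not the FARM training loop: the function names are hypothetical, and the forward/backward passes are replaced by comments so that the snippet runs without a deep learning library.

```python
import math


def effective_batch_size(batch_size, grad_acc):
    """Number of examples that contribute to each optimizer update."""
    return batch_size * grad_acc


def count_optimizer_steps(n_examples, batch_size=20, grad_acc=10):
    """Count parameter updates in one epoch with gradient accumulation."""
    n_batches = math.ceil(n_examples / batch_size)
    steps = 0
    for batch_index in range(1, n_batches + 1):
        # Here a real loop would run the forward pass and call backward()
        # on loss / grad_acc, so gradients accumulate across batches.
        if batch_index % grad_acc == 0:
            # Here a real loop would call optimizer.step() and zero the grads.
            steps += 1
    return steps


effective = effective_batch_size(20, 10)
steps_per_epoch = count_optimizer_steps(2000)
```

In this sketch, batches left over at the end of an epoch (when the number of batches is not a multiple of grad_acc) do not trigger an update; how a given framework handles that remainder is an assumption worth checking when comparing runs.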

NtaylorOX commented 2 years ago

Thank you for all the replies - I will let you know if my results improve.

If they do not - it may be FARM is doing more under the hood than I realise 🙂

bvanaken / clinical-outcome-prediction

Replicating results - Mortality Prediction #7