florencejt / fusilli

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
https://fusilli.readthedocs.io/en/latest/
GNU Affero General Public License v3.0

Poor AUC and accuracy on our own neuroimaging/cognitive dataset #29

Closed: alessiasarica closed this issue 2 weeks ago

alessiasarica commented 3 weeks ago

Hi Florence, thanks for your cool library.

We are testing it on our dataset, specifically on two tabular modalities, neuroimaging and cognitive, with a binary classification task. We started from your notebook plot_model_comparison_loop_kfold and simply modified the path to the data and, of course, changed the prediction task from multi-class to binary.

However, we obtained really poor performance (AUC 0.50, accuracy between 0.30 and 0.40) and the following warning, which I guess is the main issue: UserWarning: No positive samples in targets, true positive value should be meaningless. Returning zero tensor in true positive score

We have no idea whether the problem is related to our dataset or to the model parameters (e.g. batch_size). I also wondered whether, during the train-test split, a fold could end up with only positive or only negative samples.

May you help us?

Best, Alessia

florencejt commented 2 weeks ago

Hi Alessia, thanks for your message! I'll try to help as best I can.

The problem you're having could be due to a few reasons:

  1. Difficult task - the prediction task might be inherently very difficult. I would check whether simpler machine learning methods (random forest, SVM) achieve reasonable accuracy (see the quick baseline sketch after this list).

  2. Sample size - because the fusilli methods are neural-network-based, they're a bit more "data hungry" than the classical ML methods I mentioned above. My PhD is also in neuroimaging, so I'm constantly battling low sample sizes. If your sample size is low compared to how many features you have, then you might want to make the neural networks smaller to reduce the risk of overfitting. This is a tutorial for how to do that: link. I would start by reducing the number of layers and their width.

  3. The UserWarning - a classic! I've had this lots of times before too. To the best of my knowledge, it means that all the data within a batch are in the 0 class. One of my recommendations would be to increase the batch size (the argument batch_size in prepare_fusion_data). Another way of investigating would be to look at the class proportions in your dataset. If you've got quite an unbalanced dataset (way more 0s than 1s), then it might be an idea to under/oversample your data before putting it into prediction models. Here's quite a good tutorial for that: link, and there's a small resampling sketch after this list.

  4. K-fold split - It's definitely possible that one of the folds has disproportionate classes compared to the others. What you could do is predefine the folds instead of letting fusilli split them randomly. You can use the argument own_kfold_indices in prepare_fusion_data to pass in a list of num_folds lists (e.g. a list of 5 lists for 5-fold CV) with the indices you want in each fold. This way you could design the folds yourself to make sure there's a good spread of classes. You could get these indices from scikit-learn's StratifiedKFold (see the sketch after this list).

  5. Overfitting - This ties into sample size: if the sample size is small, neural networks are likely to overfit to the training data. A way to check whether this is happening is to look at the loss curves and see whether the validation loss is a lot higher than the training loss, or even increasing as training continues. If so, you might want to stop the training before the overfitting happens by changing the early stopping criteria via own_early_stopping_callback in prepare_fusion_data. You have to initialise your own PyTorch early stopping callback object, and then you can choose parameters that control when the training stops (see the callback sketch after this list).
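On point 1, here's a rough baseline sketch. It assumes your two tabular modalities can be concatenated into one feature matrix X with binary labels y; X and y are placeholders for your own data, not fusilli API:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X: (n_samples, n_features) array combining your two tabular modalities
# y: array of 0/1 labels - both loaded from your own data however you like
rf = RandomForestClassifier(n_estimators=200, random_state=42)
scores = cross_val_score(rf, X, y, cv=5, scoring="roc_auc")
print(f"Random forest 5-fold AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```

If this baseline is also stuck near 0.5 AUC, the task or the features are probably the bottleneck rather than the fusion models.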
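On point 3, a quick way to check the class proportions and, if they're skewed, oversample the minority class. This uses the imbalanced-learn package (pip install imbalanced-learn), which is just one option, not something fusilli does for you:

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

print("Class counts:", Counter(y))  # e.g. Counter({0: 180, 1: 20}) would explain the warning

# Randomly duplicate minority-class rows until the classes are balanced
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X, y)
print("After oversampling:", Counter(y_resampled))
```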
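On point 4, a sketch of building stratified folds with scikit-learn. I'm assuming here that own_kfold_indices takes one list of row indices per fold, as described above - double-check the docs for the exact format:

```python
from sklearn.model_selection import StratifiedKFold

# StratifiedKFold's test sets partition the data into folds,
# each preserving the overall class proportions
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_indices = [test_idx.tolist() for _, test_idx in skf.split(X, y)]

# fold_indices is a list of 5 lists of indices, one per fold, e.g.
# prepare_fusion_data(..., own_kfold_indices=fold_indices)
```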
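On point 5, a sketch of an early stopping callback. Fusilli is built on PyTorch Lightning, so this uses Lightning's EarlyStopping; depending on your Lightning version the import might be from pytorch_lightning.callbacks instead, and "val_loss" is my assumption for the logged metric name, so check your logs for the exact key:

```python
from lightning.pytorch.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor="val_loss",  # metric to watch (assumed name - check what fusilli logs)
    patience=5,          # stop after 5 epochs without improvement
    min_delta=0.0,       # minimum change that counts as an improvement
    mode="min",          # lower validation loss is better
    verbose=True,
)
# prepare_fusion_data(..., own_early_stopping_callback=early_stopping)
```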

This is quite the barrage of information, but I hope I've written something helpful in here somewhere.

If you're still having problems, or need some guidance on how to modify models/early stopping etc, please don't hesitate to message here or email me: florence.townend.21@ucl.ac.uk

Thanks! - Florence 🌸

alessiasarica commented 2 weeks ago

Thanks Florence for your useful suggestions.

We tried modifying the .py files in the fusilli package directly, for example the correlation threshold of the GNN and the learning rate, and we really improved our performance (above 90%).

We'll definitely try the stratified k-fold; maybe it improves things further.

I will be happy to update you, since you also work on neuroimaging data!

Best, Alessia

florencejt commented 2 weeks ago

Oh interesting, well that's good news! Did you fork the repository? I'd be interested to see the changes you made that affected your performance so drastically.

And yes please update me, I'd be excited to know how fusilli works in other neuroimaging settings ☺️

I'll close this issue for now since you seem to have worked it out, but please tell me what changes you made (over email or on the discussions tab here) - it might be helpful for my own fusilli application!