eminSerin / NBS-Predict

A prediction-based extension of network-based statistics.
GNU General Public License v3.0

Inconsistent Learning Performance with Default Tutorial Data and Parameters #37

Closed jurriolayak closed 7 months ago

jurriolayak commented 9 months ago

Issue Description:

We are encountering an issue where the NBS-Predict algorithm is not learning effectively when using the default tutorial data and parameters. The output consistently shows average AUC scores around 0.5, which suggests that the model is not performing better than random chance.

Environment:

Steps to Reproduce our issue:

[screenshot]

Results: The algorithm consistently returned AUC scores around 0.5. With the default tutorial data and parameters, we expected the algorithm to learn effectively and produce AUC scores significantly above 0.5, but the MATLAB command window output was as follows:

ESTIMATOR: LogReg
Searching Algorithm: bayesOpt
METRIC: auc
Number of Folds: 10
Number of Repetitions: 10
-------------
|   Score   |
-------------
|   0.490   |
|   0.496   |
|   0.475   |
|   0.498   |
|   0.507   |
|   0.490   |
|   0.506   |
|   0.507   |
|   0.498   |
|   0.500   |
-------------
10x10 repeated-CV: µScore: 0.497, σScore: 0.010
The elapsed time is 61.308250 seconds.
Permutation testing is running! Permutations: 1000 
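For context, a mean AUC of ~0.5 is exactly what scores carrying no label information produce. A minimal NumPy sketch (illustrative only, not part of NBS-Predict) that computes AUC via the Mann-Whitney rank identity and shows chance-level performance on random data:

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney identity: the fraction of
    (positive, negative) pairs where the positive scores higher."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score against every negative score;
    # ties count as half a concordant pair.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000)      # random binary labels
s = rng.normal(size=2000)         # scores with no label information
print(round(auc_score(y, s), 2))  # hovers near 0.5, i.e. chance level
```

Seeing scores tightly clustered around 0.5 across all repetitions, as in the output above, is therefore the signature of a model that has found no usable signal, rather than a software crash.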

The confusion matrix also showed that the predictions did not reflect correct true positive and true negative values:

[confusion matrix screenshots]

To note: we have not modified the default settings or data in any significant way. The issue persists even after repeating the analysis on different computers, operating systems, and MATLAB versions, and with different parameters.

Given our results, we have the following questions:

- Is there a known issue with the current version of NBS-Predict that might be causing this behaviour?
- Could there be an issue with the data, the choice of model/hyperparameters, or a potential bug in the software?

Any assistance or guidance you could provide would be greatly appreciated.

Best regards, Javier

eminSerin commented 8 months ago

Hi Javier,

Thanks a lot for the very detailed issue. This usually happens when the wrong dataset is used for the classification analysis, or when the contrast vector is incorrect. Can you provide more information about which dataset you used and the contrast vector?

Please have a look at the following preprint for a more detailed tutorial on NBS-Predict: https://osf.io/preprints/osf/cfm7j

Cheers!

jurriolayak commented 8 months ago

Hi Emin,

Thank you very much for your prompt reply. We have tried to troubleshoot the issue with multiple users. We downloaded the dataset from the NITRC link on the GitHub page following this link: https://www.nitrc.org/docman/view.php/1517/179438/

The contrast used is: [-1, 1]. Below is a screenshot of the setup window:

[screenshot]

The contrast vector also appears correct in the design.mat file, as seen in the screenshot below:

[screenshot]

Please let us know if there is something we missed.

Best,

Javier

eminSerin commented 8 months ago

Hi Javier,

Thanks for the information. Now I see the problem. The real contrast is Group 1 > Group 2, meaning the contrast vector must be [1, -1] instead. Since I did not implement any effect of interest in Group 2, prediction at the level of chance is expected with the reversed contrast. Please change the contrast vector and re-run the analysis. You should then see a prediction performance of ~0.74.
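The direction issue above can be illustrated with a toy calculation (the group means below are hypothetical, not taken from the tutorial data): for a one-hot two-group design, the contrast vector dotted with the group means gives the tested effect, so flipping its sign tests the opposite hypothesis.

```python
import numpy as np

# Hypothetical one-hot two-group design, as in design.mat:
# column 1 codes Group 1 membership, column 2 codes Group 2.
group_means = np.array([1.2, 0.8])  # illustrative mean edge weights

c_wrong = np.array([-1, 1])  # tests Group 2 > Group 1
c_right = np.array([1, -1])  # tests Group 1 > Group 2 (where the effect is)

print(c_wrong @ group_means)  # negative: the tested direction has no effect
print(c_right @ group_means)  # positive: the implanted effect is selected
```

With the effect simulated only in Group 1, the [-1, 1] contrast selects edges in a direction where nothing exists, which is consistent with the chance-level scores reported above.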

jurriolayak commented 8 months ago

Hi Emin,

Thank you very much again for your comments. We re-ran the analysis using the contrast [1, -1] as suggested (please see the images below). The results differ from those obtained with the previous contrast, but they still do not match the results in the documentation. The confusion matrix also hints that something in our setup is not correct, but we cannot figure out the problem. We have attached the images below and would greatly appreciate your help. Please don't hesitate to ask for any additional information you need to replicate this issue.

Setup: [screenshot]

Learning stage: [screenshot]

Outcome: [screenshot]

Thank you,

Javier

eminSerin commented 8 months ago

Hi Javier,

MinMaxScaler is the primary reason for the inconsistent outcomes you are experiencing. Scaling does not always improve results, and in some cases it can even worsen them. I therefore strongly advise you to either skip scaling altogether or explore alternative classifiers such as LDA. Bayesian Optimization is also not beneficial in this situation, so I suggest you use the Grid Method instead.
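One way min-max scaling can hurt, sketched below with hypothetical numbers (a plain NumPy re-implementation of the transform, not NBS-Predict's own code): a single extreme value sets the feature's range, compressing the informative spread of the remaining samples toward zero.

```python
import numpy as np

def min_max_scale(x):
    """Rescale each feature (column) to [0, 1], as MinMaxScaler does."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# A hypothetical feature column with one outlier.
x = np.array([[0.1], [0.2], [0.3], [0.4], [10.0]])
scaled = min_max_scale(x)
print(scaled.ravel())
# The four informative values now span only ~0.00-0.03 while the
# outlier maps to 1.0, leaving little usable spread for a classifier.
```

Because the scaler is fit separately on each training fold, fold-to-fold differences in such extremes can also make cross-validation scores unstable, which fits the inconsistency described above.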

Good luck!