amarquand / PCNtoolkit

Toolbox for normative modelling and spatial inference of neuroimaging data. https://pcntoolkit.readthedocs.io/en/latest/
GNU General Public License v3.0
108 stars · 49 forks

Question about the Z-scores calculated from estimate() and predict() when using HBR #199

Open YCHuang0610 opened 6 months ago

YCHuang0610 commented 6 months ago

Hello, I found that when calculating Z-scores for the same subjects using the estimate() and predict() functions (alg='hbr'), different results are obtained. I checked the source code and found that the implementations of the Z-score computation in estimate() and predict() are not the same when using HBR. See lines 556-557 and 884: https://github.com/amarquand/PCNtoolkit/blob/master/pcntoolkit/normative.py

I am wondering which result I should use to indicate the deviation of such a subject.

Thank you.

=============================================

Code for estimate:

os.chdir("Train_test/")
ptk.normative.estimate(covfile="../Data/X_train.txt",
                       respfile="../Data/Y_train.txt",
                       trbefile="../Data/trbefile.txt",
                       alg='hbr',
                       log_path="logs/",
                       cores=12,
                       output_path="Models", 
                       testcov="../Data/X_test.txt",
                       testresp ="../Data/Y_test.txt",
                       tsbefile="../Data/tsbefile.txt",
                       outputsuffix="_train_test",
                       inscaler='standardize',
                       outscaler='standardize',
                       linear_mu='True',
                       random_intercept_mu='True',
                       centered_intercept_mu='True',
                       saveoutput=True,
                       savemodel=True)
os.chdir("..")   

Code for predict:

# the X_Control.txt and X_test.txt contain common subjects.
os.chdir("Control")
ptk.normative.predict(covfile='Data/X_Control.txt',
                      respfile='Data/Y_Control.txt',
                      tsbefile='Data/Control_befile.txt',
                      alg='hbr',
                      inputsuffix='traintest',
                      model_path='../Train_test/Models/',
                      outputsuffix='_Control')
os.chdir("..")
smkia commented 6 months ago

Hi,

First, your observation is correct. The reason, as you mentioned, is that we have not yet implemented the MCMC-based z-scoring for the predict function (it is on the TODO list, as you can see in the code). Here are some points:

  • If you use a Gaussian likelihood (which seems to be the case in your code), then the z-scores from estimate and predict may differ but should be very close (isn't this the case?). Otherwise, I would rely more on the results of the estimate function.
  • For a non-Gaussian likelihood, the z-scores from the predict function are not reliable. This issue will be solved in the new release.

I hope these are helpful.
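To make the distinction concrete, here is a minimal sketch of the two z-scoring strategies discussed above. This is illustrative code only, NOT PCNtoolkit's implementation: the function names, and in particular the way the MCMC samples are aggregated, are assumptions for exposition.

```python
import numpy as np

# Illustrative sketch only -- not PCNtoolkit's code. Names and the
# averaging in z_mcmc are assumptions for exposition.
def z_analytic(y, yhat, s2):
    """Closed-form z-score from point estimates of predictive mean/variance."""
    return (y - yhat) / np.sqrt(s2)

def z_mcmc(y, mu_draws, sigma_draws):
    """One plausible MCMC-based z-score: z under each posterior draw, averaged."""
    z_per_draw = (y[None, :] - mu_draws) / sigma_draws  # shape (n_draws, n_subjects)
    return z_per_draw.mean(axis=0)

y = np.array([1.0, -0.5])
print(z_analytic(y, np.array([0.8, -0.2]), np.array([0.25, 0.25])))  # deviations of 0.4 and -0.6
```

With a Gaussian likelihood the two quantities converge as the posterior concentrates, which is why they are expected to be close but not identical.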

YCHuang0610 commented 6 months ago

> Hi,
>
> First, your observation is correct. The reason, as you mentioned, is that we have not yet implemented the MCMC-based z-scoring for the predict function (it is on the TODO list, as you can see in the code). Here are some points:
>
> • If you use a Gaussian likelihood (which seems to be the case in your code), then the z-scores from estimate and predict may differ but should be very close (isn't this the case?). Otherwise, I would rely more on the results of the estimate function.
> • For a non-Gaussian likelihood, the z-scores from the predict function are not reliable. This issue will be solved in the new release.
>
> I hope these are helpful.

Hi smkia,

Thanks for your reply. I have checked the output z-scores from estimate and predict. Although they are highly positively correlated, their ranges are very different. Is this expected, or should I regenerate all the z-scores using the estimate function? [screenshot: comparison of the two sets of z-scores]

Thank you.
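The "highly correlated but different range" check described above takes only a few lines of NumPy. The arrays below are synthetic stand-ins for the toolkit outputs; in practice you would load the actual files instead (the `np.loadtxt` filenames in the comment are hypothetical).

```python
import numpy as np

# Synthetic stand-ins for the two z-score outputs. In practice, load them, e.g.:
#   z_est  = np.loadtxt("Z_estimate.txt")   # hypothetical filenames
#   z_pred = np.loadtxt("Z_predict.txt")
rng = np.random.default_rng(0)
z_est = rng.standard_normal(500)
z_pred = 0.4 * z_est + 0.01 * rng.standard_normal(500)  # correlated but rescaled

r = np.corrcoef(z_est, z_pred)[0, 1]
print(f"correlation: {r:.3f}")
print(f"range (estimate): {np.ptp(z_est):.2f}  range (predict): {np.ptp(z_pred):.2f}")
```

A near-unit correlation together with a clearly different peak-to-peak range is the pattern reported in the screenshot.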

smkia commented 6 months ago

Yes, indeed the ranges are different. It is very difficult to guess what is happening here, but I suspect it is related to data rescaling. We need to check this more carefully. By the way, do you rescale your data?

Otherwise, I have implemented the MCMC-based z-scoring for the predict function and it is now available on the dev branch: https://github.com/amarquand/PCNtoolkit/commit/1fb27044c3cf6e7340f77352c9977eab6f21c851. You need to manually install the latest version from the dev branch before you can use it (it is not yet released and thus not available on PyPI).
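The thread only says to install the dev branch manually; one common way to do that is to let pip install directly from the GitHub branch. The exact command below is an assumption (the repo URL and branch name come from the link above), shown as a setup fragment:

```shell
# Assumed install command for the unreleased dev branch (not on PyPI yet);
# repo URL and branch name are taken from the comment above.
pip install -U "git+https://github.com/amarquand/PCNtoolkit.git@dev"
```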

YCHuang0610 commented 6 months ago

> Yes, indeed the ranges are different. It is very difficult to guess what is happening here, but I suspect it is related to data rescaling. We need to check this more carefully. By the way, do you rescale your data?
>
> Otherwise, I have implemented the MCMC-based z-scoring for the predict function and it is now available on the dev branch: 1fb2704. You need to manually install the latest version from the dev branch before you can use it (it is not yet released and thus not available on PyPI).

Thank you for your reply. I didn't rescale my data before passing it to ptk.normative.estimate and ptk.normative.predict, but I did use the inscaler and outscaler options in ptk.normative.estimate. I don't know if that is the cause. Thanks for the timely update to the predict function; I'll try the dev branch :)
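As a postscript, smkia's rescaling suspicion can be illustrated in isolation: standardizing the same responses with two different sets of statistics produces deviation scores that are perfectly correlated yet differ in range, matching the pattern reported above. The statistics below are made up for illustration and do not reflect what the toolkit's inscaler/outscaler actually compute.

```python
import numpy as np

# Illustration only (assumed scenario, not the toolkit's code): the same raw
# responses standardized with two different sets of statistics.
rng = np.random.default_rng(1)
y = rng.standard_normal(200) * 2.0 + 5.0       # raw responses

train_mean, train_std = 5.0, 2.0               # stats from a training set (made up)
ctrl_mean, ctrl_std = y.mean(), y.std() * 1.5  # different stats (hypothetical)

z_a = (y - train_mean) / train_std
z_b = (y - ctrl_mean) / ctrl_std

print(np.corrcoef(z_a, z_b)[0, 1])             # ~1: perfectly correlated
print(np.ptp(z_a) / np.ptp(z_b))               # != 1: different ranges
```

If the two code paths standardize with different statistics (e.g. training-set statistics in one run and the new sample's statistics in the other), this alone reproduces the observed behaviour.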