Fix a small issue related to numpy in metrics.py

mrrezaie commented 11 months ago

Hi, regarding https://github.com/0todd0000/spm1d/issues/262, I found that x may be a list of numpy arrays, e.g., [array([0.30227087])] and max(x) returns a list, e.g., [0.30227087].

That was the source of the issue. Using numpy.max instead of max can handle it and return a value.

mrrezaie commented 11 months ago

Sorry, my first attempt was tested with 1000 iterations. When I used all iterations, I got another similar error and again related to x. Either max or numpy.max cannot handle this: [array([0.0042]), 0.1038]. These changes convert it to a flat list and now it is working.

The output of numpy.trapz is an ndarray object when the input is n-dimentional. https://github.com/0todd0000/spm1d/blob/3636afb69f7a675e6e968db2093474dd85952e1f/spm1d/stats/nonparam/metrics.py#L53-L58 If this is always going to be a value, changes can be simply made in line 58 instead of the above.

return float(x)

0todd0000 commented 11 months ago

Hi, thank you for the fix! But I don't quite understand the problem. Using the data you sent in #262 I can't reproduce the error. Here is my script:

import numpy as np
import spm1d

fpath  = os.path.join(  os.path.dirname(__file__), 'data.npz' )
with np.load( fpath ) as z:
    yA = z['con1']
    yB = z['con2']

tn  = spm1d.stats.nonparam.ttest_paired(yA, yB)
tni = tn.inference(0.05, two_tailed=True, force_iterations=True)

This script runs without errors.

The fix you suggest may indeed be a good idea, but I'd like to understand the problem a bit more clearly before merging this pull request if that's OK?

mrrezaie commented 11 months ago

Sorry, could you please tell me the version of numpy you are currently using?

mrrezaie commented 11 months ago

Please see this post. I speculate NumPy version is the reason you didn't get the error.

I also double-checked the code and found that the issue is here indeed: https://github.com/0todd0000/spm1d/blob/3636afb69f7a675e6e968db2093474dd85952e1f/spm1d/stats/nonparam/metrics.py#L55

It returns an array not a value. So, I reverted all the above changes and just modified this line. Please let me know your thought.

0todd0000 commented 11 months ago

Sorry, could you please tell me the version of numpy you are currently using?

1.22.4

Either max or numpy.max cannot handle this: [array([0.0042]), 0.1038]

I do not see how either np.max or [array([0.0042]), 0.1038] relate to get_single_cluster_metric; does this statement pertain to get_max_metric, as in for example: max( [np.array([11]), 12] )?

The output of numpy.trapz is an ndarray object when the input is n-dimentional.

I agree, but consider np.trapz( z[i]-thresh ); here z is a 1D float array, i is a 1D boolean array, and thresh is a scalar. So I don't understand how the input can be more than 1D for for this function call.

mrrezaie commented 11 months ago

Sorry for the confusion, my bad. I'm pretty sure that if you upgrade your numpy, you will get the error.

Traceback (most recent call last):                                                                                                                              
  Cell In[3], line 1                                                                                                                                            
    ti  = spm1d.stats.nonparam.ttest_paired(yA, yB).inference(0.05, two_tailed=True, iterations=10000, force_iterations=True)                                   
  File f:\python38\lib\site-packages\spm1d\stats\nonparam\_snpm.py:253 in inference                                                                             
    clusters   = self._cluster_inference(clusters, two_tailed)                                                                                                  
  File f:\python38\lib\site-packages\spm1d\stats\nonparam\_snpm.py:230 in _cluster_inference                                                                    
    cluster.inference(self.permuter.Z2, two_tailed)                                                                                                             
  File f:\python38\lib\site-packages\spm1d\stats\_clusters.py:191 in inference                                                                                  
    pdf      = np.asarray(pdf, dtype=float)  # fix for ragged nested sequences                                                                                  
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10000,) + inhomogeneous part.

I do not see how either np.max or [array([0.0042]), 0.1038] relate to get_single_cluster_metric; does this statement pertain to get_max_metric, as in for example: max( [np.array([11]), 12] )?

Yes, in the get_max_metric. That was my first guess which was incorrect.

I agree, but consider np.trapz( z[i]-thresh ); here z is a 1D float array, i is a 1D boolean array, and thresh is a scalar. So I don't understand how the input can be more than 1D for for this function call.

I agree. Again my incorrect guess. Did you see my last comment and commit? https://github.com/0todd0000/spm1d/pull/263#issuecomment-1641586800 https://github.com/0todd0000/spm1d/pull/263/commits/2f79a4e32ad633ebf0d242811d626764b40e72e2

Let me explain the issue in more detailes: From the error, it is apparent that the issue is in the following line. That's because the input is a list of different objects, e.g., int, float, numpy.float64, numpy.ndarray, and the new numpy version rises the error. https://github.com/0todd0000/spm1d/blob/3636afb69f7a675e6e968db2093474dd85952e1f/spm1d/stats/_clusters.py#L190-L191

Please check line 55. The output of z[i] is an numpy.ndarray and hence, x will be an numpy.ndarray. On the other hand, in line 35, the output might be either a value or a numpy.ndarray. Please see the following examples:

In [16]: max([3.8, 0.4])
3.8
In [17]: max([3.8, np.array([0.4])])
3.8
In [18]: max([np.array([0.2])])
array([0.2])
In [19]: max([3.8, np.array([4.4])])
array([4.4])

So, the simplest fix is to find the line that yields a numpy.ndarray object. If you convert it to a float, the issue will be resolved. https://github.com/0todd0000/spm1d/commit/2f79a4e32ad633ebf0d242811d626764b40e72e2

0todd0000 commented 11 months ago

Understood, the line 55 fix sounds correct: the get_single_cluster_metric method should return a float for all _Metric subclasses. The other _Metric classes do this, but I agree that z[i] - thresh in the MaxClusterIntegral class returns an array.

0todd0000 commented 11 months ago

I've merged the edit into the master branch and updated the package on PyPi. Thank you very much for finding and fixing this bug!!

mrrezaie commented 11 months ago

I'm glad I could help; thank you.

0todd0000 / spm1d

Fix a small issue related to numpy in metrics.py #263