Open nialov opened 1 week ago
@nmaarnio could you check whether there is anything wrong with the computation codes of this or redirect to the source team?
In principle, the components are the unit eigenvectors; together these comprise the transformation matrix used to compute the coordinates of the data in the 'PC' rotated space. The order matters because each is a coefficient of transformation in the corresponding dimension. And if their values are identical, a sign inversion could imply rotation in opposite direction, unless all components have reversed signs.
The current test data shows perfect linear correlation, the solution for second component becomes inconsequential because the corresponding eigenvalue is 0. Perhaps the test data could be such that it is not 'x = y' kind of data, then the covariance matrix will be a nonsingular matrix.
Both [[ 0.707, 0.707], [0.707, -0.707]]
and [[ 0.707, 0.707], [-0.707, 0.707]]
are valid solutions for this test data. It's a matter of convention that the functions follow.
I suggest for test functionstest_pca_with_nodata_removal()
and test_pca_with_nan_removal()
the expected_component_values be changed to expected_component_values = np.array([[0.707, 0.707], [0.707, -0.707]])
. That will add consistency to the code and resolve this issue. However, it should be checked with some varied test data that has a unique solution.
I suggest for test functions
test_pca_with_nodata_removal()
andtest_pca_with_nan_removal()
the expected_component_values be changed toexpected_component_values = np.array([[0.707, 0.707], [0.707, -0.707]])
. That will add consistency to the code and resolve this issue. However, it should be checked with some varied test data that has a unique solution.
Note that the tests run successfully in the conda environment defined in this repository. So if we change the expected_component_values
, the tests here will fail instead but tests in the feedstock repository will succeed. Should the convention be strict? Or is it okay/expected for the user to get these values in different order?
If the latter, I can disable the tests in the feedstock and make a pull request here that relaxes the tests to reflect this.
If the former, should the function be changed to better reinforce the order?
Hi @chudasama-bijal and @nialov. I am busy with other work for some time and I have many things in my EIS TODO list already. Could somebody else check this and/or make the decision? I don't think my opinion should weigh any more than someone else's in this
@nmaarnio Yes, we can handle it!
Describe the bug
In continous integration tests of
eis_toolkit
feedstock,test_pca_with_*
functions produce inverse results to the expected. See:Important part:
The expected is:
but the result is:
Environment details
I am wondering if there is some sort in code of
pca
that might cause different results or if the order of the results matter?