biolab / orange3

🍊 :bar_chart: :bulb: Orange: Interactive data analysis
https://orangedatamining.com
Other
4.81k stars 1k forks source link

[FIX] PCA: ensure tests pass on sklearn 1.4 and 1.5, which can return different results #6821

Closed pavlin-policar closed 3 months ago

pavlin-policar commented 3 months ago
Issue

6815 broke a few things that went under the radar, primarily due to scikit-learn 1.5.0 changes.

Includes
codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.26%. Comparing base (8831a57) to head (b9625da).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #6821 +/- ## ========================================== - Coverage 88.26% 88.26% -0.01% ========================================== Files 326 326 Lines 71156 71158 +2 ========================================== + Hits 62805 62806 +1 - Misses 8351 8352 +1 ```
pavlin-policar commented 3 months ago

@markotoplak As far as I can tell, none of the remaining failures have anything to do with https://github.com/biolab/orange3/pull/6815. This should be good to merge.

pavlin-policar commented 3 months ago

For posterity's sake: PCA can produce different results between scikit-learn versions 1.4.0 and 1.5.0 in that the directions of the vectors may be flipped. While they do ensure a consistent output within each version of scikit-learn using their svd_flip function here, the u_based_decision was changed to False in 1.5.0, while it was True (the default) in prior versions of scikit-learn. Here is the commit that makes this change.

This change, which causes inconsitencies between scikit-learn 1.4.0 and 1.5.0 was not discussed in their PR, but it was apparently necessary to make the outputs of PCA consistent between the different solvers in 1.5.0 (original PR and comment here).