Problem
Rankers were taking a long time to compute. While normalizing the ranks, the code calculated the max rank of the ranks list inside a list comprehension. This means the max, which is a constant value, was recomputed for every element of the list. That repeated work is expensive and can take a long time when working with big matrices.
Solution
By the point at which we calculate the max rank, we already have a sorted list. The solution was to move the max operation out of the list comprehension: instead, we take the last element of the sorted list, assign it to a variable, and use that variable inside the list comprehension. This reduces the time the ranker takes.
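A minimal sketch of the change described above. The function names are hypothetical, and normalizing by dividing each rank by the max is an assumption about the ranker's logic, not the actual code in this request:

```python
def normalize_ranks_before(sorted_ranks):
    # max() runs once per element, so the comprehension does
    # len(sorted_ranks) full scans of the list.
    return [rank / max(sorted_ranks) for rank in sorted_ranks]


def normalize_ranks_after(sorted_ranks):
    # The list is sorted in ascending order (assumption), so the
    # last element is the max; look it up once and reuse it.
    max_rank = sorted_ranks[-1]
    return [rank / max_rank for rank in sorted_ranks]


ranks = sorted([3, 1, 4, 2])
assert normalize_ranks_before(ranks) == normalize_ranks_after(ranks)
```

Both versions return the same values; only the number of times the max is computed changes.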
Tests
For a matrix of shape (615447, 4768), the code "as is" took 95.79 min; with the change in this request it took 0.0019 sec.
Added 3 new test functions to the pytest suite. Each one checks that the numpy arrays generated by the change are equal to the numpy arrays of scores generated by the previous logic, which now lives in a predict_proba_deprecated function within the test. The tests cover no ties, half ties, and some ties in the ranking.
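The shape of these tests might look roughly like the sketch below. Everything here is illustrative: `scores_current` and `scores_deprecated` are hypothetical stand-ins for the new logic and for predict_proba_deprecated, the division-based normalization is an assumption, and the sample inputs reflect one reading of "no ties", "half ties", and "some ties".

```python
import numpy as np


def scores_deprecated(ranks):
    # Stand-in for the previous logic: max() recomputed per element.
    ordered = sorted(ranks)
    return np.array([r / max(ordered) for r in ordered])


def scores_current(ranks):
    # Stand-in for the new logic: max taken once from the sorted list.
    ordered = sorted(ranks)
    max_rank = ordered[-1]
    return np.array([r / max_rank for r in ordered])


def check_same_scores(ranks):
    # Fails with an AssertionError if the two arrays differ.
    np.testing.assert_array_equal(scores_current(ranks), scores_deprecated(ranks))


def test_no_ties():
    check_same_scores([1, 2, 3, 4])


def test_half_ties():
    # Half of the entries share a rank (assumed meaning of "half ties").
    check_same_scores([1.5, 1.5, 3, 4])


def test_some_ties():
    check_same_scores([1, 2.5, 2.5, 4, 5])
```

Because the functions follow the `test_*` naming convention, pytest would collect them without extra configuration.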