barthelemymp / TULIP-TCR

GNU General Public License v3.0

ranks don't correlate with scores #2

Closed phbradley closed 9 months ago

phbradley commented 10 months ago

Hi there,

Thanks for creating this code and making it available! I've been playing around with the notebook and it seems like the ranks don't correlate with the scores in the final output (I added the score column to see what they looked like). I think the issue may be with the line

ranks = np.argsort(scores[::-1])

I wonder if you might instead want something like

ranks = np.argsort(np.argsort(scores)[::-1])

In [152]: scores = np.random.rand(5)
In [154]: ranks = np.argsort(scores[::-1]) # the old way
In [155]: print(sorted(zip(scores, ranks)))
[(0.56589701597093, 2), (0.6182374290371392, 4), (0.6500388691277542, 1), (0.7304254978225894, 0), (0.8584390651918598, 3)]
In [156]: ranks = np.argsort(np.argsort(scores)[::-1]) # the new way
In [157]: print(sorted(zip(scores, ranks)))
[(0.56589701597093, 4), (0.6182374290371392, 3), (0.6500388691277542, 2), (0.7304254978225894, 1), (0.8584390651918598, 0)]

Sorry if I'm mixed up about how the code works. I just started playing with things! Also I'm not 100% sure which way you want to sort the scores; this assumes higher scores are better.

Take care, Phil

PS. It looks like the same issue may be present in predict.py too.

https://math.stackexchange.com/questions/3607762/why-does-sorting-twice-produce-a-rank-vector
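
For readers following along, here is a minimal sketch of the double-argsort idea discussed above, assuming (as in the comment) that higher scores are better and that ranks are 0-based. The function names `descending_ranks` and `descending_ranks_scatter` and the example scores are hypothetical, chosen for illustration; they are not taken from the TULIP-TCR code. The second function computes the same ranks with one sort by writing positions back through the sort order.

```python
import numpy as np


def descending_ranks(scores):
    """0-based ranks; the highest score gets rank 0 (hypothetical helper)."""
    scores = np.asarray(scores)
    # Indices of the items, ordered from best score to worst.
    order = np.argsort(scores)[::-1]
    # argsort of a permutation is its inverse: for each item,
    # the position it occupies in the best-to-worst ordering.
    return np.argsort(order)


def descending_ranks_scatter(scores):
    """Same result with a single sort, by scattering positions."""
    scores = np.asarray(scores)
    order = np.argsort(scores)[::-1]
    ranks = np.empty_like(order)
    # The item at position k of the ordering receives rank k.
    ranks[order] = np.arange(len(scores))
    return ranks


# Made-up scores, for illustration only.
scores = np.array([0.65, 0.73, 0.57, 0.86, 0.62])

old = np.argsort(scores[::-1])      # the expression from the report
new = descending_ranks(scores)      # double argsort
alt = descending_ranks_scatter(scores)

for s, o, n in sorted(zip(scores, old, new)):
    print(f"score={s:.2f}  old={o}  new={n}")
```

With these scores the `new` column decreases steadily as the score increases, while the `old` column does not follow the scores. Tied scores are ordered however `np.argsort` happens to order them, so this sketch makes no promise about how ties are ranked.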

barthelemymp commented 9 months ago

Hi Phil,

Thank you very much for your comment. Indeed, you are right, this is a mistake. I only added the ranks at the last minute for the sake of the GitHub repository. Thanks a lot, it should be corrected now.

Barthelemy

phbradley commented 9 months ago

Great-- thanks for looking into it!