airalcorn2 / RankNet

My (slightly modified) Keras implementation of RankNet and PyTorch implementation of LambdaRank.

Possible error in ndcg_diffs calculation #4

Closed: alexvpickering closed this issue 5 years ago

alexvpickering commented 5 years ago

I think it should be (with log2 on denominator only):

ndcg_diffs = (1 / (1 + doc_ranks[:n_rel]).log2()) - (1 / (1 + doc_ranks[n_rel:]).log2()).view(n_irr)
airalcorn2 commented 5 years ago

Whoops, you're right. Unfortunately, I won't be able to test this new code (the original code worked, surprisingly), but thank you for pointing out my error.

alexvpickering commented 5 years ago

The calculation of ndcg_diffs may be flawed in other respects too.

I think it is supposed to be the difference between the predicted NDCG and the NDCGs where the ranks of each (i, j) document pair are swapped but all other ranks stay the same.

This repo looks like it may be closer.

airalcorn2 commented 5 years ago

> I think it is supposed to be the difference between the predicted NDCG and the NDCGs where the ranks of each (i, j) document pair are swapped but all other ranks stay the same.

That's what I'm calculating. As I mention in the code, I'm assuming binary relevance. Because the denominator of the normalized discounted cumulative gain (NDCG) won't change following a ranking swap, we can ignore it until the end (I renamed the ndcg_diffs variable to dcg_diffs to make it clearer what's being computed there).

The discounted cumulative gain (DCG) is a summation of terms, one per document, where document i's term is (2 ^ rel_i - 1) / log2(rank_i + 1). As a result, swapping the ranks of two documents only changes the two corresponding terms in the summation, so swapping two relevant documents (or two irrelevant documents) results in the same DCG, and the difference between those two NDCGs is zero (i.e., they don't need to be computed). And because the term for an irrelevant document is zero ((2 ^ 0 - 1) = 0), the DCG difference following a swap between a relevant and an irrelevant document is simply the difference between the relevant document's term at its original rank and its term at its "irrelevant" rank.

1 / (1 + doc_ranks[:n_rel]).log2() are the terms corresponding to the relevant documents in their original rankings. 1 / (1 + doc_ranks[n_rel:]).log2() are the terms corresponding to the relevant documents in their swapped rankings.

You'll notice the dcg_diffs tensor has a shape of (n_rel, n_irr) (i.e., it holds the DCG differences for all non-zero swaps). I did it this way because it's vectorized, which runs much faster than a for loop (which is what the code you linked to uses). In fact, if you look at this commit, I was originally using a for loop (and had the logarithm in the right place; that explains why I remember it working!).
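For concreteness, here's a minimal sketch of that vectorized computation (not the repo's exact code; it assumes binary relevance, 1-indexed ranks, and that the n_rel relevant documents come first in doc_ranks):

import torch

# Hypothetical setup: two relevant and three irrelevant documents, with the
# model-assigned (1-indexed) ranks of the relevant documents listed first.
n_rel, n_irr = 2, 3
doc_ranks = torch.tensor([2.0, 5.0, 1.0, 3.0, 4.0])  # [rel, rel, irr, irr, irr]

# Term each relevant document contributes at its original rank: 1 / log2(rank + 1).
rel_terms = 1 / (1 + doc_ranks[:n_rel]).log2()    # shape: (n_rel,)
# Term a relevant document would contribute from an irrelevant document's rank.
irr_terms = 1 / (1 + doc_ranks[n_rel:]).log2()    # shape: (n_irr,)

# Broadcasting yields all n_rel * n_irr non-zero DCG differences at once.
dcg_diffs = rel_terms.view(n_rel, 1) - irr_terms.view(1, n_irr)  # (n_rel, n_irr)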

alexvpickering commented 5 years ago

Thank you for the clarification. By binary relevance, do you mean that each document is either relevant or irrelevant, but no ranks are assigned? So ranks would be of the form [1, 0, 1, 0, 0] instead of [3, 1, 2, 0, 4]?

airalcorn2 commented 5 years ago

The relevance is the human label for the document, while the rank is where the document ends up after being scored by the model.

To be a little more concrete about the calculations, imagine we provide a scoring model M with a query Q, and it returns a set of ten documents in the following order: (D_6, D_0, D_7, D_4, D_8, D_5, D_9, D_1, D_3, D_2). A human could then go through each document and assign it a relevance of (in the binary case) one or zero for the given Q (an alternative labeling strategy would be to have several relevance levels, e.g., "very relevant", "somewhat relevant", and "irrelevant"). Once we have this information, we can calculate the DCG for this result set, and each D_i will contribute a term of:

dcg_i = (2 ^ rel_i - 1) / log2(rank_i + 1)

to the summation. If we swap D_0 and D_3 and calculate the new DCG, all of the dcg_i' terms except for dcg_0' and dcg_3' will stay the same, so those terms will cancel when taking the difference (this is true whether we use binary relevance or not). As a result, what we have left is:

delta_DCG = (dcg_0 + dcg_3) - (dcg_0' + dcg_3')

If we pretend like both documents are relevant, then:

dcg_0 = (2 ^ 1 - 1) / log2(2 + 1)
dcg_3 = (2 ^ 1 - 1) / log2(9 + 1)

After swapping them, we have:

dcg_0' = (2 ^ 1 - 1) / log2(9 + 1)
dcg_3' = (2 ^ 1 - 1) / log2(2 + 1)

so:

(dcg_0 + dcg_3) - (dcg_0' + dcg_3') = 0

The same is true when you have two irrelevant documents, but it's even less interesting because their dcg_i terms are equal to zero. In the case where D_0 is relevant and D_3 is irrelevant, we have:

dcg_0 = (2 ^ 1 - 1) / log2(2 + 1)
dcg_3 = (2 ^ 0 - 1) / log2(9 + 1) = 0
dcg_0' = (2 ^ 1 - 1) / log2(9 + 1)
dcg_3' = (2 ^ 0 - 1) / log2(2 + 1) = 0

so:

delta_DCG = (dcg_0 + dcg_3) - (dcg_0' + dcg_3') = dcg_0 - dcg_0'

which is what I'm calculating.
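A quick numeric check of this example (a plain-Python sketch, not code from the repo; D_0 is relevant at rank 2 and D_3 is irrelevant at rank 9):

import math

dcg_0 = (2 ** 1 - 1) / math.log2(2 + 1)       # relevant document's original term
dcg_3 = (2 ** 0 - 1) / math.log2(9 + 1)       # 0.0: irrelevant docs contribute nothing
dcg_0_swap = (2 ** 1 - 1) / math.log2(9 + 1)  # relevant doc at the "irrelevant" rank
dcg_3_swap = (2 ** 0 - 1) / math.log2(2 + 1)  # still 0.0

delta_dcg = (dcg_0 + dcg_3) - (dcg_0_swap + dcg_3_swap)
assert math.isclose(delta_dcg, dcg_0 - dcg_0_swap)
print(round(delta_dcg, 4))  # 0.3299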