Closed alexvpickering closed 5 years ago
Whoops, you're right. Unfortunately, I won't be able to test this new code (the original code worked, surprisingly), but thank you for pointing out my error.
The calculation of ndcg_diffs
may be flawed in other respects too.
I think it is suppose to be the difference between the predicted NDCG and the NDCGs where the ranks of each (i, j) document pair are swapped but all other ranks stay the same.
This repo looks like it may be closer.
I think it is suppose to be the difference between the predicted NDCG and the NDCGs where the ranks of each (i, j) document pair are swapped but all other ranks stay the same.
That's what I'm calculating. As I mention in the code, I'm assuming binary relevance. Because the denominator for the normalized discounted cumulative gain (NDCG) won't change following a ranking swap, we can ignore it until the end (I changed the name of the ndcg_diffs
variable to dcg_diffs
to make it a little more clear what's being computed right there). The discounted cumulative gain (DCG) is a summation of terms corresponding to each document, where a document i
's term is equal to (2 ^ rel_i - 1) / log2(rank_i + 1)
. As a result, swapping the ranks of two documents only changes the values of the corresponding two terms in the summation. Therefore, swapping two relevant documents or two irrelevant documents will result in the same DCG, and so the difference between those two NDCGs will be zero (i.e., they don't need to be computed). Because the value of a term associated with an irrelevant document is zero (i.e., (2 ^ 0 - 1) = 0
), the difference in the DCGs following a ranking swap between a relevant and an irrelevant document is simply the difference between the value of the term for the relevant document in its original ranking and the value of the term for the relevant document in its "irrelevant" ranking.
1 / (1 + doc_ranks[:n_rel]).log2()
are the terms corresponding to the relevant documents in their original rankings.
1 / (1 + doc_ranks[n_rel:]).log2()
are the terms corresponding to the relevant documents in their swapped rankings.
You'll notice the dcg_diffs
tensor has a shape of (n_rel, n_irr)
(i.e., it holds the DCG differences for all non-zero swaps). The reason I did it the way I did is because it's vectorized, which will run much faster than a for loop (which is what the code you linked to uses). In fact, if you look at this commit, I was originally using a for loop (and had the logarithm in the right place; that explains why I remember it working!).
Thank you for the clarification, by binary relevance do you mean that each document is either relevant or irrelevant, but no ranks are assigned? So ranks would be of form [1, 0, 1, 0, 0]
instead of [3, 1, 2, 0, 4]
?
The relevance is the human label for the document while the rank is where the document ends up after being scored by the model. To be a little more concrete about the calculations, imagine we provide a scoring model M
with a query Q
and it returns a set of ten documents in the following order: (D_6, D_0, D_7, D_4, D_8, D_5, D_9, D_1, D_3, D_2)
. A human could then go through each document and assign it a relevance of (in the binary case) one or zero for the given Q
(an alternative labeling strategy would be to have several relevance levels, e.g., "very relevant", "somewhat relevant", and "irrelevant"). Once we have this information, we can calculate the DCG for this result set and each D_i
will contribute a term of:
dcg_i = (2 ^ rel_i - 1) / log2(rank_i + 1)
to the summation. If we swap D_0
and D_3
and calculate the new DCG, all of the dcg_i'
terms except for dcg_0'
and dcg_3'
will stay the same, so those terms will cancel when taking the difference (this is true whether we use binary relevance or not). As a result, what we have left is:
delta_DCG = (dcg_0 + dcg_3) - (dcg_0' + dcg_3')
If we pretend like both documents are relevant, then:
dcg_0 = (2 ^ 1 - 1) / log2(2 + 1)
dcg_3 = (2 ^ 1 - 1) / log2(9 + 1)
After swapping them, we have:
dcg_0' = (2 ^ 1 - 1) / log2(9 + 1)
dcg_3' = (2 ^ 1 - 1) / log2(2 + 1)
so:
(dcg_0 + dcg_3) - (dcg_0' + dcg_3') = 0
The same is true when you have two irrelevant documents, but it's even less interesting because their dcg_i
s are equal to zero. In the case where D_0
is relevant and D_3
is irrelevant, we have:
dcg_0 = (2 ^ 1 - 1) / log2(2 + 1)
dcg_3 = (2 ^ 0 - 1) / log2(9 + 1) = 0
dcg_0' = (2 ^ 1 - 1) / log2(9 + 1)
dcg_3' = (2 ^ 0 - 1) / log2(2 + 1) = 0
so:
delta_DCG = (dcg_0 + dcg_3) - (dcg_0' + dcg_3') = dcg_0 - dcg_0'
which is what I'm calculating.
I think it should be (with log2 on denominator only):