drbh / ranking-rs

Calculate average or median ranking from many samples or ranked data
https://crates.io/crates/ranking

What is this rank aggregation algorithm useful for? #1

Open makoConstruct opened 4 years ago

makoConstruct commented 4 years ago

I'm very interested in ranking aggregation. I am a bit confused by this one. I tried it on the ranking set [a b] [a b] [a b] [a b] [b c] [c d], and neither the Mean nor the Median metric gives c a higher score than b, even though the only judgement in the set about b and c says that c is better than b. Does it not do very well with partial rankings? I'm curious as to what this would be used for.

drbh commented 4 years ago

@makoConstruct

Thanks for checking out the project. However, when I run the example above I get back the expected median and mean scores.

Code

use ranking::{calculate_ranking, Metric};

fn main() {
    let ranking_a = vec!["a", "b"];
    let ranking_b = vec!["a", "b"];
    let ranking_c = vec!["a", "b"];
    let ranking_d = vec!["b", "c"];
    let ranking_e = vec!["c", "d"];

    // names are names of oceans
    let everyones_rankings = vec![
        ("Artic", ranking_a),
        ("Southern", ranking_b),
        ("Pacific", ranking_c),
        ("Atlantic", ranking_d),
        ("Indian", ranking_e),
    ];

    // here we use the median to calculate the final ranking
    let m_metric = Metric::Median;
    let rankings_by_median = calculate_ranking(everyones_rankings.clone(), m_metric);
    println!("{:#?}", rankings_by_median);

    // here we use the average to calculate the final ranking
    let a_metric = Metric::Average;
    let rankings_by_average = calculate_ranking(everyones_rankings.clone(), a_metric);
    println!("{:#?}", rankings_by_average);
}

Results

I get two sets of results from this script.

Ranking Metric: Median
[
    ( 0.0,  "a"),
    ( 1.0,  "b"),
    ( 1.0,  "c"),
    ( 1.0,  "d"),
]

Ranking Metric: Average
[
    ( 0.0,  "a"),
    ( 0.5,  "c"),
    ( 0.75, "b"),
    ( 1.0,  "d"),
]

Breakdown

The response above contains a list sorted from the top-ranked item to the lowest-ranked one. The score (the numerical value in each tuple) is calculated by the metric passed to the ranker (Median or Average).

Median

In the first case, the score is the median position; if you work it out by hand, you should get the same results.

a was in 0th position 3 times, and the median of [0, 0, 0] is 0.

c was in 0th position once and 1st position once, and the median of [0, 1] is computed here as 1 (the upper of the two values).

The Median metric may return somewhat unexpected results with such small inputs. It's unintuitive that b, c, and d all have a median of 1, but that is a consequence of using such short ranking lists.

Mean

Above you mention that c does not rank higher than b, but under this metric it does.

The mean position for c is 0.5. c was in 0th position once and 1st position once, and the average of [0, 1] is 0.5.

The next value, b, has an average position of 0.75. b was ranked in 0th position once and 1st position 3 times, and the average of [0, 1, 1, 1] is 0.75.

The lower the score, the "higher" the ranking, the same way 1st place is better than 8th place in a race.

Also note that the score is not a rank placement; two values can have the same score in some cases, which is essentially a tie.
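To make the breakdown above concrete, here is a minimal sketch that recomputes the positional scores without the crate. It assumes zero-based positions and an upper median for even-length lists (which matches the results above); the library's internals may differ.

```rust
use std::collections::BTreeMap;

// Collect the zero-based position of every item across all rankings.
fn positions(rankings: &[Vec<&str>]) -> BTreeMap<String, Vec<f64>> {
    let mut out: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for ranking in rankings {
        for (i, item) in ranking.iter().enumerate() {
            out.entry(item.to_string()).or_default().push(i as f64);
        }
    }
    out
}

fn mean(ps: &[f64]) -> f64 {
    ps.iter().sum::<f64>() / ps.len() as f64
}

// Upper median, matching the thread's result of 1 for [0, 1].
fn median(ps: &[f64]) -> f64 {
    let mut sorted = ps.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    sorted[sorted.len() / 2]
}

fn main() {
    let rankings = vec![
        vec!["a", "b"],
        vec!["a", "b"],
        vec!["a", "b"],
        vec!["b", "c"],
        vec!["c", "d"],
    ];
    for (item, ps) in positions(&rankings) {
        println!("{item}: median={} mean={}", median(&ps), mean(&ps));
    }
    // a: median=0 mean=0
    // b: median=1 mean=0.75
    // c: median=1 mean=0.5
    // d: median=1 mean=1
}
```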

I hope this snippet and description were helpful and answer your question of what this is useful for.

Uses 🤷‍♀️

It's useful for any case where you have many rankings from many individuals and you'd like to consolidate them into a composite ranking.

drbh commented 4 years ago

bump (didn't mean to close the issue). Please continue the conversation with any questions/comments below 😊

makoConstruct commented 4 years ago

I was confused about which end was the top and which was the bottom, but since rank is a symmetric relationship I don't think it was relevant.

Also note that the score is not a rank placement, and two values can have the same score in some cases which is essentially a tie

Does the algorithm have a methodical way of breaking ties that respects some important fact about the input data that the score doesn't? It seems like it's just alphabetical. Does any candidate ever end up lower in the list than an item with a higher score? If neither of these things is the case, score might as well be equivalent to rank.

I am happy to accept that the library implements average and median aggregation correctly; it just seems apparent to me that those are not correct formalisations of the process of taking many individual judgements about separate parts of the world and combining them into a consistent over-arching picture. For instance, in the example code, only one voter professes knowledge of how B compares to C, and none of the other judgements entangle B and C indirectly. That one judgement should then fully determine the final ranking of B and C in the aggregation, but the aggregation either disagrees with the B-C judge (in the Average case) or underappreciates them (making B and C a tie in the Median case).

It may be that my intuition that there is an objectively correct way to aggregate this sort of ranking is provably wrong, that would be good to know, but it wouldn't stop me. There is a need for an approximate analogue to the ideal I'm imagining. I'm not sure what idea these algorithms are supposed to model, but it doesn't seem to very closely resemble my understanding of the combination of rank judgements.

drbh commented 4 years ago

🙏 thanks again for the questions. I've thought about and researched the discussion above.

Clarity

Firstly, I think we have some miscommunication and need to clarify the definition of ranking and, subsequently, of ranking aggregation.

Your quote below mentions a consistent over-arching picture:

those are not correct formalisation of the process of taking many individual judgements about separate parts of the world and combining them into a consistent over-arching picture.

which suggests that the act of ranking, and of aggregating those rankings, is based on the aggregate value of pairwise comparisons. In other words, we want to summarize all of the relationships we are passed.

In your above example, A->B, B->C, and C->D. Using the "relationship based" ranking aggregation, we should infer that A->D, since A->B->C->D, right?

The metrics used in the library at the moment are based on a totally different concept of ranking; let's call it "positional based".

Positional based ranking cares about the position of an item in a list of rankings. In this case, our goal is to summarize the positions of all elements, not to construct a greater image from many rankings.

Positional ranking makes sense if you are collecting votes from a competition or race. It answers questions like: "Whose average ranking was closest to first?"

Relationship ranking makes more sense when absolute position doesn't matter. It answers questions like: "Which team is more likely to win?", where the data provided is a list of past games and their outcomes.
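One way to make the distinction concrete: a positional ranking list can be mechanically expanded into the pairwise relations a relationship-based aggregator consumes. A small sketch with a hypothetical helper (not part of the `ranking` crate's API):

```rust
// Expand a ranked list into the pairwise "x beats y" relations that a
// relationship-based aggregator would consume. Hypothetical helper,
// shown only to illustrate the two views of the same data.
fn to_pairs<'a>(ranking: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut pairs = Vec::new();
    for i in 0..ranking.len() {
        for j in (i + 1)..ranking.len() {
            pairs.push((ranking[i], ranking[j]));
        }
    }
    pairs
}

fn main() {
    // The single ranking [a, b, c] implies a>b, a>c, and b>c.
    println!("{:?}", to_pairs(&["a", "b", "c"]));
    // → [("a", "b"), ("a", "c"), ("b", "c")]
}
```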

Next steps

Your comments were enlightening and strike an important point: ranking needs context to make sense. The concept of "ranking things" also spans the whole world of document retrieval, machine learning, and large-scale optimization, which this project was not thinking about when it was started.

My immediate next step is to define ranking on the README and split ranking algorithms into larger groups.

The "positional" ranking seems like it is a version of "pointwise ranking" and the "relationship" ranking seem like "pairwise ranking" but I cannot confirm this yet.

Once there is a way to indicate your "ranking" type, I'll implement a naive ranking algorithm that expands pairwise relationships into a graph and ranks based on the longest path.

I am really interested in following the naive approach with the Bradley-Terry model and ML models like LambdaRank and RankNet.

Help wanted

This is a little project with big goals. I would really appreciate any help, PRs, and ideas. As you can see, ranking can move in many directions, and I'd like to cover many of those concepts with some solid Rust 🦀

@makoConstruct Please let me know your thoughts and if you'd like to help!

makoConstruct commented 4 years ago

In your above example, A->B, B->C, and C->D. Using the "relationship based" ranking aggregation, we should infer that A->D, since A->B->C->D, right?

Yes

Positional ranking makes sense if you are collecting votes from a competition or race.

Situations where each ranker has seen all of the competitors and has strong opinions about how they compare? I am constantly frustrated by how many large competitions act as if they are that sort of thing when they're actually not. The judges frequently lack familiarity with some of the contestants entirely and shouldn't be placing them in their rankings at all, or are indifferent between them and should be placing them as equal (it would be possible to place some candidates equal while still relating them to others under a relational ranking; it would not be possible with a positional ranking).

I'm glad to hear that you intend to grow the package. I think I would like to help, but it's not really my area; my math isn't very strong. But rank aggregation is something I'd very much like to use to make new kinds of voting systems, and if no one else steps up, I might have to. I find that I keep returning to it.

I suppose I have a general interest in formalisations of preference aggregation in how it relates to metaethics. There is a sense in which aggregating ranking graphs is one way of conducting utilitarianism: combining the value systems/preferences of many different agents together, and finding out what they all would agree is good. Things like rank aggregation seem like they might be useful in that.

drbh commented 4 years ago

@makoConstruct

Hey, just pushed a new function to the repo that ranks your example using a Bradley-Terry model. (It now returns the expected rank based on relationships, not position.)

use ranking::bt;

fn main() {
    let pairs = vec![
        ("a", "b"),
        ("b", "c"),
        ("c", "d"),
    ];

    let est_victories = bt(pairs.clone());
    println!("{:?}", est_victories);
    // [(0.37530792, "a"), (0.20661964, "b"), (0.15442751, "c"), (0.085017495, "d")]
}

Check it out and let me know your thoughts! 🙌
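For readers without the crate at hand, the classic Bradley-Terry fit can be sketched in plain Rust with the standard minorization-maximization update. This is illustrative only: an item that never wins collapses to strength 0 here, and the crate's `bt` evidently smooths that, so the numbers won't match its output exactly.

```rust
// A plain-Rust sketch of the Bradley-Terry MM update:
//   s_i <- w_i / sum over i's games of 1 / (s_i + s_j)
// where w_i is i's total number of wins.
fn bradley_terry(items: &[&str], pairs: &[(usize, usize)], iters: usize) -> Vec<f64> {
    let n = items.len();
    let mut s = vec![1.0_f64; n];
    for _ in 0..iters {
        let mut next = vec![0.0_f64; n];
        for i in 0..n {
            // Total wins for item i.
            let wins = pairs.iter().filter(|&&(w, _)| w == i).count() as f64;
            // Sum 1 / (s_i + s_j) over every game i took part in.
            let mut denom = 0.0;
            for &(w, l) in pairs {
                if w == i || l == i {
                    let j = if w == i { l } else { w };
                    denom += 1.0 / (s[i] + s[j]);
                }
            }
            next[i] = if denom > 0.0 { wins / denom } else { s[i] };
        }
        // Normalize so the strengths sum to 1.
        let total: f64 = next.iter().sum();
        for v in next.iter_mut() {
            *v /= total;
        }
        s = next;
    }
    s
}

fn main() {
    let items = ["a", "b", "c", "d"];
    // (winner, loser) indices into `items`: a beats b, b beats c, c beats d.
    let pairs = [(0usize, 1usize), (1, 2), (2, 3)];
    let strengths = bradley_terry(&items, &pairs, 100);
    println!("{:?}", strengths); // strengths decrease from a down to d
}
```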

Toy example?

I hear you on all the issues with positional ranking you mentioned above. I would be interested in hearing about a possible voting system and a way to aggregate votes "fairly". Do you have a toy example we can start exploring?

Thoughts

I also have a

general interest in formalisations of preference aggregation in how it relates to metaethics

All this makes me think back to the utility models we would build in grad school (for marketing optimizations). Maybe the world of conjoint analysis and part-worth utilities can be applied?

makoConstruct commented 4 years ago

The rank aggregation algorithm I'm looking for would behave in the following ways.

Thick lines represent edges with double strength. The right side shows how each preference graph would determine the ordering of the vertices' scores.

(image: preference graph examples)

I should write some unit tests.

makoConstruct commented 4 years ago

I have written some unit tests https://gist.github.com/makoConstruct/414e3706d478b9856a7d3319f5aed35e

drbh commented 4 years ago

@makoConstruct

Great work!! I've added the tests to the library, and with a little effort (converting from score to rank) and one change to a test, all of the tests pass using the BT model.

(screenshot: passing test run, 2020-01-20)

Note on the change

In the final test, which seems very simple, the current algorithm gets a different result: C is ranked above B, since C has won more games overall. I don't mean to change the heuristics in your unit test, but maybe this is the "fairer" ranking?

If this is not the case, I will be very happy to change the unit test back to your original spec, and I have an idea for a simpler path algorithm that should pass the tests.

However, I do want to point out that the current BT model can score disconnected graph rankings, while a simpler path-traversal model would error out on disconnected paths or return more than one possible ranking.

makoConstruct commented 4 years ago

No, "won more times" is a different concept than what I'm trying to formalise. That test was designed to catch out that approximation. Which feels a bit hacky. I wish I could claim that these tests are exhaustive.

My stance is that any algorithm that claims to be able to work with disconnected components is not capturing rank aggregation, because there is no coherent basis for ranking things which no one has ever even indirectly compared.

If two elements are unconnected, we should not pretend to know how they compare.

The algorithm I imagine would give silly answers for disconnected components; most applications that use it would check whether the graph is sufficiently connected for a meaningful rank aggregation to be possible.

makoConstruct commented 4 years ago

I guess a full system would be sensitive to instances of ambiguity (too little data) and controversy (conflicting data).

Elsewhere in the system, there would be services that monitor voting to detect disagreement within a category of judge, or about a quality being judged, and issue reports to its constituents along the lines of: "this grouping might be chimeric, ill-defined. It might not mean what you think it means. There might not be any sane policy by which to police inclusion. It might need to be broken up into more specific groups, or replaced with a different paradigm." And then most users will ignore those reports and keep conducting their political dialogues in the same way they always have. But some of them won't ignore them, and there is a chance of things getting better.

drbh commented 4 years ago

@makoConstruct Sometimes disconnected graphs are important: in cases where all of the ranked items take part in some very similar (transferable) activity.

Take ranking basketball or football teams, for instance. While two teams may not have played each other yet, you may be able to use information from the disconnected teams' track records.

Strongest path

In your case, we want to find the longest path with the greatest weight. In other words, we want to take all of the ranked pairs, expand them into a graph, and then traverse it to find the strongest (longest and heaviest) path.

How to calculate

Assuming the graph is fully connected, the longest path will account for every node; however, there can be more than one longest path (when two or more nodes are ambiguous as to which should be ranked higher). This happens any time ranks are marked = in your images above.

If there is a tie, we need to use the weight to discern which path is "stronger". The weight is the sum of the "wins" over each unique edge between nodes. The longer and heavier path is returned as the ranking.

If we have more than one "strongest" path, we'll need to combine them: here we average the position of each node across the paths and return a single ranking. Since all of the longest and heaviest paths have the same number of elements, the average works well to create unbiased final scores.
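The core of the idea can be sketched with a brute-force search (hypothetical helper, not the library's implementation). Note that this exhaustive traversal is exponential in general, which is only acceptable for small graphs; the sketch also omits the averaging step for multiple tied strongest paths.

```rust
use std::collections::BTreeMap;

// Exhaustive search for the "strongest" path: among all simple paths,
// prefer the longest, breaking ties by total edge weight.
// Returns (path length, total weight, path). Sketch only.
fn strongest_path<'a>(pairs: &[(&'a str, &'a str)]) -> (usize, u32, Vec<&'a str>) {
    // Edge weights: how many times each (winner, loser) pair was seen.
    let mut edges: BTreeMap<(&str, &str), u32> = BTreeMap::new();
    let mut nodes: Vec<&str> = Vec::new();
    for &(w, l) in pairs {
        *edges.entry((w, l)).or_insert(0) += 1;
        for n in [w, l] {
            if !nodes.contains(&n) {
                nodes.push(n);
            }
        }
    }

    fn dfs<'a>(
        node: &'a str,
        path: &mut Vec<&'a str>,
        weight: u32,
        edges: &BTreeMap<(&'a str, &'a str), u32>,
        best: &mut (usize, u32, Vec<&'a str>),
    ) {
        // Lexicographic comparison: length first, then weight.
        if (path.len(), weight) > (best.0, best.1) {
            *best = (path.len(), weight, path.clone());
        }
        for (&(from, to), &w) in edges {
            if from == node && !path.contains(&to) {
                path.push(to);
                dfs(to, path, weight + w, edges, best);
                path.pop();
            }
        }
    }

    let mut best = (0usize, 0u32, Vec::new());
    for &start in &nodes {
        let mut path = vec![start];
        dfs(start, &mut path, 0, &edges, &mut best);
    }
    best
}

fn main() {
    let pairs = vec![("a", "b"), ("b", "c"), ("c", "d")];
    let (len, weight, path) = strongest_path(&pairs);
    println!("len={} weight={} path={:?}", len, weight, path);
    // → len=4 weight=3 path=["a", "b", "c", "d"]
}
```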

Passing tests

The strongest path algorithm has been added to the library, along with another example. I've removed the change I made to your test criteria, and all of the original tests pass with the strongest path algorithm 🙌

Check out the tests and new example file and let me know what you think about this approach to the problem!

makoConstruct commented 4 years ago

That sounds like a promising approximation, but what's its time complexity? For very densely connected graphs (graphs with a lot of votes), it sounds like it would equate to the travelling salesman problem.

Over here, I'm trying to include metrics of certainty, controversy, and maybe a separate measure of weight in the formal definition. Maybe if I name enough reasonable components of the thing, it will be more obvious how to define it. Ah, I think I'm looking for a Y-Δ transform. I had parallel and serial reductions but realised that wasn't enough. I don't know how this might be proved, but it seems that a Y-Δ transform, along with parallel and serial reductions, is sufficient to reduce any pair of points in a comparison graph to a single path. Now all I need to do is try a bunch of different reducer functions and see if any of them test correctly.

makoConstruct commented 4 years ago

I've written a thing for testing different formalizations of pairwise comparison aggregation. I can't find any fault with the model that records the ups and downs of each comparison, adds over parallel reductions, and averages over serial ones. It might be TRUE. Like, a correct mathematical model of what comparison aggregation would be if it physically existed. I should compare it to the papers that claim to model it as Bayesian error minimization.

https://gist.github.com/makoConstruct/0c835eda538cd772fcf75bd61699a627
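One possible reading of that reducer model, purely as an illustration of the parallel/serial idea (the names and the exact reducer functions here are assumptions, not taken from the gist): each comparison carries an (ups, downs) tally; parallel edges between the same pair add their tallies, and a serial chain x→y→z reduces to a single x→z edge by averaging.

```rust
// Hypothetical reducer model for pairwise comparison aggregation.
// (ups, downs) tallies; add in parallel, average in series.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cmp {
    ups: f64,
    downs: f64,
}

// Two independent comparisons of the same pair reinforce each other.
fn parallel(a: Cmp, b: Cmp) -> Cmp {
    Cmp { ups: a.ups + b.ups, downs: a.downs + b.downs }
}

// A chained comparison is only as informative as its average link.
fn serial(a: Cmp, b: Cmp) -> Cmp {
    Cmp { ups: (a.ups + b.ups) / 2.0, downs: (a.downs + b.downs) / 2.0 }
}

fn main() {
    let ab = Cmp { ups: 2.0, downs: 0.0 };
    let bc = Cmp { ups: 1.0, downs: 1.0 };
    // Reduce the chain a→b→c to a single a→c comparison…
    let ac_via_b = serial(ab, bc);
    // …then combine it with a direct a→c observation.
    let ac_direct = Cmp { ups: 1.0, downs: 0.0 };
    println!("{:?}", parallel(ac_via_b, ac_direct));
    // → Cmp { ups: 2.5, downs: 0.5 }
}
```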