gfxdisp / asap

Fast and accurate Active SAmpling method for Pairwise comparisons
MIT License
42 stars 15 forks source link

Repetition of suggested pairs #9

Closed eric-stumpe closed 3 years ago

eric-stumpe commented 3 years ago

Hello Aliaksei,

I am currently working on a project that deals with the pairwise ranking. Therefore, I really appreciate that you put up your code on this Github repository.

The issue that I am having is that for a simulated run, the suggested pairs ("pairs_to_compare") start to repeat and the computed scores stop to converge to the ground truth ranking. I constructed the following minimal example from "example_running.ipynb"

import numpy as np
import random
import asap_cpu

def update_pwc_mat(x,y,scores,pwc_mat):     #bigger score, sample wins

    x_score = scores[x]
    y_score = scores[y]

    if(x_score>y_score):
        pwc_mat[x,y] = pwc_mat[x,y]+1     
    else:
        pwc_mat[y,x] = pwc_mat[y,x]+1

    return pwc_mat

random.seed(7)
num_samples = 10

pwc_mat = np.zeros((num_samples,num_samples)).astype(np.uint8)

scores = list(range(0, num_samples))
random.shuffle(scores) 
print(scores)  # for this seed, scores = [8, 3, 1, 4, 7, 0, 9, 6, 2, 5] --> sample 0: score of 8, sample 1: 3, etc.

num_batches = 10 #number of times "pairs_to_compare" is called to suggest new points

for i in range(num_batches):
    asap = asap_cpu.ASAP(num_samples, selective_eig = True)

    pairs_to_compare = asap.run_asap(pwc_mat, mst_mode = True)

    # Get current score estimation
    scores_mean, scores_std = asap.get_scores()

    print(i)

    for j in range(pairs_to_compare.shape[0]):
        pwc_mat = update_pwc_mat(pairs_to_compare[j,0],pairs_to_compare[j,1],scores,pwc_mat)  # pwc_mat is updated with all suggestions of recent "pairs_to_compare"

print(pwc_mat)   

For example when num_batches = 10; ten times new points are suggested and the pwc matrix is updated accordingly, the pwc_mat is:

[[0 1 1 1 1 1 0 2 1 1]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 0]
 [0 0 3 0 0 0 0 1 0 1]
 [0 0 0 0 0 0 0 0 0 0]
 [9 9 9 9 6 9 0 0 0 0]
 [0 0 7 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 7 1 0 0 0 0 8 0]]

and the estimated scores:

scores_mean = array([ 0.69219759, -0.38563688, -1.29441194, -0.49231944,  0.2653247 , -0.38563688,  1.92811414,  0.04246648, -1.02940521,  0.40976086])

scores_mean.argsort().argsort() = array([8, 3, 0, 2, 6, 4, 9, 5, 1, 7], dtype=int64)

Do you know from where this issue could stem from? (Maybe I am just doing something wrong) Many thanks in advance!

Sincerely,

Eric

mikhailiuk commented 3 years ago

Hi Eric,

Thanks for using the code, the question and code snippet. I am a bit busy till the end of the week, but will give it a detailed look at the beginning of the next week. Check the thread, I will get back with the results here!

Thanks,

Aliaksei

eric-stumpe commented 3 years ago

Hi Aliaksei,

thank you very much!

Sincerely,

Eric

mikhailiuk commented 3 years ago

Hi @WolfRoyal ,

Thanks for the wait and for raising the issue!

There was indeed a bug - in how I was calculating probability weights for selective EIG computations and it has messed up the selected pairs. All fixed now.

Please let me know if there is anything else :)

Best,

Aliaksei

eric-stumpe commented 3 years ago

Hi Aliaksei,

Perfect! Thank you very much!

Best,

Eric