Closed — quangdaist01 closed this issue 2 years ago
```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def scanpath2clusters(meanshift, scanpath):
    # Map each fixation (x, y) to the index of its mean-shift cluster.
    string = []
    xs = scanpath['X']
    ys = scanpath['Y']
    for i in range(len(xs)):
        symbol = meanshift.predict([[xs[i], ys[i]]])[0]
        string.append(symbol)
    return string

def improved_rate(meanshift, scanpaths):
    # Improved interaction rate: (between-cluster transitions minus
    # within-cluster transitions) divided by the number of clusters.
    Nc = len(meanshift.cluster_centers_)
    Nb, Nw = 0, 0
    for scanpath in scanpaths:
        string = scanpath2clusters(meanshift, scanpath)
        for i in range(len(string) - 1):
            if string[i] == string[i + 1]:
                Nw += 1
            else:
                Nb += 1
    return (Nb - Nw) / Nc
```
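As a quick sanity check, the two helpers above can be exercised on a few synthetic fixations. The coordinates below are made up purely for illustration:

```python
import numpy as np
from sklearn.cluster import MeanShift

def scanpath2clusters(meanshift, scanpath):
    # Same helper as above, written compactly.
    return [meanshift.predict([[x, y]])[0]
            for x, y in zip(scanpath['X'], scanpath['Y'])]

# Two well-separated synthetic fixation groups (made-up coordinates).
pts = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5],
                [100.0, 100.0], [101.0, 99.0], [99.5, 100.5]])
ms = MeanShift(bandwidth=10.0)
ms.fit(pts)

sp = {'X': [0.2, 0.8, 100.3], 'Y': [0.1, 0.9, 99.8]}
string = scanpath2clusters(ms, sp)
# The first two fixations fall in one cluster, the third in the other.
print(string)
```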
```python
# Pool all ground-truth fixations on one image and cluster them.
xs, ys = [], []
for scanpath in scanpaths:
    xs += list(scanpath['X'])
    ys += list(scanpath['Y'])
gt_gaze = np.concatenate((np.vstack(xs), np.vstack(ys)), axis=1)
bandwidth = estimate_bandwidth(gt_gaze)

# Scale the estimated bandwidth and keep the scale with the
# highest improved interaction rate.
rates = []
factors = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
for factor in factors:
    bd = bandwidth * factor
    ms = MeanShift(bandwidth=bd)
    ms.fit(gt_gaze)
    rate = improved_rate(ms, scanpaths)
    rates.append(rate)
rates = np.vstack(rates)

best_bd = factors[np.argmax(rates)] * bandwidth
best_ms = MeanShift(bandwidth=best_bd)
best_ms.fit(gt_gaze)
# save best_ms for evaluation

gt_strings = []
for gt_scanpath in scanpaths:
    gt_string = scanpath2clusters(best_ms, gt_scanpath)
    gt_strings.append(gt_string)
```
Sequence score with interaction rate: https://www.cv-foundation.org/openaccess/content_iccv_2013/papers/Borji_Analysis_of_Scores_2013_ICCV_paper.pdf
Sequence score with improved interaction rate: https://www-users.cs.umn.edu/~qzhao/publications/pdf/jiang_tnnls16.pdf
In practice, I use the bandwidth `b_estimated` estimated by sklearn (see the example), then try `b = b_estimated * scale_i` with `scale_i` in {0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8} and select the one with the highest improved interaction rate. Check out the example: https://scikit-learn.org/stable/auto_examples/cluster/plot_mean_shift.html#sphx-glr-auto-examples-cluster-plot-mean-shift-py
Input: sequences of fixations on one image
Thank you very much! Have a great day!
Hi, I tried to verify the human oracle sequence score you report in the paper (0.490), but got a much higher score of 0.678. I was able to reproduce your MultiMatch score, so the problem should not be in the data I use. I do the following:
```python
def compute_clusters(gt_scanpaths):
    xs, ys = [], []
    for scanpath in gt_scanpaths:
        xs += list(scanpath['X'])
        ys += list(scanpath['Y'])
    gt_gaze = np.concatenate((np.vstack(xs), np.vstack(ys)), axis=1)
    bandwidth = estimate_bandwidth(gt_gaze)

    rates = []
    factors = [0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8]  # [0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
    for factor in factors:
        bd = bandwidth * factor if bandwidth > 0.0 else None
        ms = MeanShift(bandwidth=bd)
        ms.fit(gt_gaze)
        rate = improved_rate(ms, gt_scanpaths)
        rates.append(rate)
    rates = np.vstack(rates)

    best_bd = factors[np.argmax(rates)] * bandwidth if bandwidth > 0.0 else None
    best_ms = MeanShift(bandwidth=best_bd)
    best_ms.fit(gt_gaze)

    gt_strings = []
    subjects = []
    for gt_scanpath in gt_scanpaths:
        gt_string = scanpath2clusters(best_ms, gt_scanpath)
        gt_strings.append(gt_string)
        subjects.append(gt_scanpath['subject'])
    return best_ms, gt_strings, subjects
```
Could it be that you changed something after publishing the paper, so that 0.490 is not what the currently provided code produces? Or am I doing something wrong with the clusters? Thank you!
Is it because of the clusters you computed for sequence score? Can you verify it by using the provided clusters?
Where can I find them? I don't see any clusters.npy as mentioned here, even in older commits.
Please find it at https://drive.google.com/file/d/1_NDKSb2JbqbDkL3RHh24MOhrroBjkIyK/view?usp=sharing. Note that it also contains target-absent fixation clusters.
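For anyone else landing here: a `.npy` file holding a Python object is read back with `np.load(..., allow_pickle=True)`. I don't know the file's exact internal layout, so the key structure below is only a guess; the round-trip just demonstrates the API:

```python
import os
import tempfile
import numpy as np

# Hypothetical layout: a dict mapping image keys to per-subject cluster
# strings (the real clusters.npy may be organized differently).
dummy = {'bottle-present-000000547875': {2: [2, 13, 5, 0, 3, 3, 0, 0, 1]}}

path = os.path.join(tempfile.gettempdir(), 'clusters_demo.npy')
np.save(path, dummy)

# np.save wraps the dict in a 0-d object array; .item() unwraps it.
loaded = np.load(path, allow_pickle=True).item()
print(loaded['bottle-present-000000547875'][2])
```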
Thank you. I found two things. First, I get the score of 0.490 with the human oracle only if I do not skip the evaluation of a scanpath against itself, i.e., against the same subject, which shouldn't happen, I guess. Second, the clusters are really different from mine, and I don't understand why they look the way they do. E.g. for 'test-present-bottle-000000547875' I get the string [0, 3, 1, 1] for the subject-2 scanpath, while your string is [2, 13, 5, 0, 3, 3, 0, 0, 1]; but the scanpath for 000000547875.jpg, subject 2, in the COCO-Search18 test data is "X": [834.2, 817.3, 1181.0, 1329.5], "Y": [531.0, 180.6, 160.8, 264.4]. The duration list T, however, has 9 elements. How do you then get a string of length 9?
The only way I could reproduce your reported result of 0.490 was with the old test.json file, which you say is corrupted, and by allowing a scanpath to be compared against itself. With the new file and your clusters I get 0.527 when allowing self-comparisons, which is wrong, and 0.476 when I don't. Neither matches the reported score. Apart from that, the clusters I compute with the script above differ from the ones you provide. The MultiMatch score using the new test.json is now only roughly the same as the one you reported: [0.92444455, 0.7370559, 0.89802225, 0.921154].
Can you provide an evaluation script with which we can reproduce your scores and be sure we are doing everything right when using the new data? Thank you!
The original json file is not corrupted, and we used it for computing the human consistency. We removed the fixations after the viewer first fixated on the target, implementing a manual stopping criterion (i.e., stop searching once a fixation hits the target). For the data release, we also wanted to include the fixations after hitting the target, which might be interesting for other researchers.
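If it helps others, the described preprocessing amounts to truncating each scanpath at the first fixation that lands inside the target box. A minimal sketch; the field names (`X`, `Y`) and the `(x, y, w, h)` bounding-box convention are assumptions about the data layout:

```python
def truncate_at_target(scanpath, bbox):
    # Keep fixations up to and including the first one inside the
    # target box (x, y, w, h) -- a sketch of the stopping rule described.
    x0, y0, w, h = bbox
    xs, ys = scanpath['X'], scanpath['Y']
    keep = len(xs)  # no target hit: keep everything
    for i, (x, y) in enumerate(zip(xs, ys)):
        if x0 <= x <= x0 + w and y0 <= y <= y0 + h:
            keep = i + 1
            break
    return {'X': xs[:keep], 'Y': ys[:keep]}

sp = {'X': [10, 50, 90], 'Y': [10, 50, 90]}
# The second fixation (50, 50) hits the box, so the third is dropped.
print(truncate_at_target(sp, (40, 40, 20, 20)))  # -> {'X': [10, 50], 'Y': [10, 50]}
```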
I think you are doing the right thing. In the original paper, we included the cases of comparing a scanpath against itself, which is wrong, leading to a human consistency of 0.490 in sequence score. Thank you so much for pointing that out!
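For reference, excluding self-comparisons when averaging pairwise scores can be sketched as below. The `score` argument is a placeholder for the actual sequence-score string matching; the trivial same-length score is only there to make the toy check runnable:

```python
def human_consistency(strings, subjects, score):
    # Average pairwise score over all ordered pairs from different subjects.
    total, n = 0.0, 0
    for i in range(len(strings)):
        for j in range(len(strings)):
            if i == j or subjects[i] == subjects[j]:
                continue  # skip a scanpath vs. itself / the same subject
            total += score(strings[i], strings[j])
            n += 1
    return total / n if n else 0.0

# Toy check with a trivial placeholder score (1.0 iff equal length).
same_len = lambda a, b: float(len(a) == len(b))
val = human_consistency([[0, 1], [0, 1, 2], [1, 1]], ['s1', 's2', 's3'], same_len)
```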
Thank you, but I still cannot reproduce your clustering. E.g. for subject 2 on image 000000547875.jpg, the string I compute is [3, 19, 7, 0, 10, 4, 0, 2, 1] while the given one is [2, 13, 5, 0, 3, 3, 0, 0, 1]: my clustering apparently assigns different clusters to fixations where yours assigns the same one.
Can you verify the script I posted above does what you did to get the clusters, including the list of factors [0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8]?
Yes, but please note that (a) we used the new .json file, with the fixations after fixating on the target, to do the clustering; (b) as you may see in the provided clusters.npy, target-absent fixations are also included when performing the clustering.
Thank you. I also use the new .json, so the string lengths now match the pre-computed clusters. But there are still some small differences in the obtained strings. In addition, you compute strings also for the subjects with "fixOnTarget": false and "correct": 0, which I excluded from evaluation. Also, why should including target-absent fixations in the clustering make any difference? The clusters are computed per image, so they are independent of each other, right?
Hello, I am interested in your work, and I want to replicate the reported results before performing further experiments (for a class project). metrics.py contains functions to compute sequence scores, but as mentioned in #3, some clustering work must be done first. I have read the Sequence Score algorithm, but I have no idea how to implement it. Can you provide some more material on computing the metric? Thank you for reading!
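A common way to compare two cluster strings, and as far as I understand what Sequence Score implementations do, is a Needleman–Wunsch-style global alignment whose score is normalized by the longer string's length. A minimal sketch, not guaranteed to match the repo's metrics.py exactly:

```python
def nw_matching(a, b, match=1, mismatch=-1, gap=-1):
    # Needleman-Wunsch global alignment score between two symbol strings.
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            d[i][j] = max(d[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                          d[i - 1][j] + gap,       # gap in b
                          d[i][j - 1] + gap)       # gap in a
    return d[n][m]

def sequence_score(a, b):
    # Normalize so that identical strings score 1.0.
    return nw_matching(a, b) / max(len(a), len(b))

print(sequence_score([0, 1, 2], [0, 1, 2]))  # identical strings -> 1.0
```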