@xanhxanh94 I see. Sorry for misunderstanding your problem. You want the table of scores behind the curves... I actually don't have the table, but I ran it once again for you.

Fig 4:
M:            50      100     150     200     250     300
citeulike-a   0.1486  0.2019  0.2420  0.2744  0.3011  0.3236
citeulike-t   0.1962  0.2331  0.2586  0.2799  0.2983  0.3128

Fig 5:
M:            50      100     150     200     250     300
citeulike-a   0.4268  0.5259  0.5779  0.6123  0.6391  0.6615
citeulike-t   0.4327  0.5354  0.5850  0.6156  0.6370  0.6529
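For anyone who wants to regenerate these points: a minimal sketch, assuming a trained model object that exposes the predict() method posted further down in this thread (the name `model` and the already-loaded train_users/test_users are assumptions, not the repo's exact API):

for M in [50, 100, 150, 200, 250, 300]:
    # recall@M for one cutoff per call; predict() is the method shown below
    r = model.predict(train_users, test_users, M)
    print("M = %d, recall = %.4f" % (M, r))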
Thank you very much! Yes, I want the table of scores behind the curves so I can compare against them. Oops! How can you draw figures without the scores :) The scores look the same as the ones in the figures, except for citeulike-a in Fig 4, where you gave:

Fig 4:
M:            50      100     150     200     250     300
citeulike-a   0.1070  0.1628  0.2049  0.2389  0.2670  0.2906

These look different from the scores reported in the paper.
See the updated numbers. I saved all the final models and drew the figures using those models directly.
Hi, thank you very much! I tried to reproduce the results, but they are not what I expected. Can you please describe the process for getting the results after running test_cvae.py?
I actually posted the evaluation code, but deleted it after noticing that you were asking for the table of scores. I tried to reproduce the results of the baseline methods myself and found that the following code reproduces the baseline results most closely, so it is what I used to produce the results in my paper. (I actually don't think it's good to include the training ratings, but the relative performance among the different methods is more important.)
function [recall] = evaluate(train_users, test_users, m_U, m_V, M)
  % Recall@k for k = 1..M, averaged over users.
  % train_users / test_users: cell arrays with one vector per user, whose first
  % entry is the item count and the rest are item indices (hence u(2:end)).
  % m_U, m_V: user and item latent factor matrices.
  m_num_users = size(m_U, 1);
  m_num_items = size(m_V, 1);
  batch_size = 100;
  n = ceil(1.0*m_num_users/batch_size);
  num_hit = zeros(m_num_users, M);
  num_total = zeros(m_num_users, 1);
  for i = 1:n
    ind = (i-1)*batch_size+1:min(i*batch_size, m_num_users);
    u_tmp = m_U(ind, :);
    % Score every item for this batch of users and rank items by score.
    score = u_tmp * m_V';
    [~, I] = sort(score, 2, 'descend');
    bs = length(ind);
    % Ground truth includes both training and test items (see the discussion below).
    gt = zeros(bs, m_num_items);
    for j = 1:bs
      idx = (i-1)*batch_size + j;
      u = train_users{idx};
      gt(j, u(2:end)) = 1;
    end
    for j = 1:bs
      idx = (i-1)*batch_size + j;
      u = test_users{idx};
      gt(j, u(2:end)) = 1;
    end
    % Reorder the ground-truth columns according to each user's ranking.
    re = zeros(bs, m_num_items);
    for j = 1:bs
      re(j, :) = gt(j, I(j, :));
    end
    num_hit(ind, :) = re(:, 1:M);     % hits among the top-M ranked items
    num_total(ind, :) = sum(re, 2);   % total number of relevant items per user
  end
  % recall@k = (#hits in top k) / (#relevant items), averaged over users.
  recall = mean(cumsum(num_hit, 2)./repmat(num_total, 1, M), 1);
Hi, I remember you had posted the code in Python. Can you please post the Python version? I get confused when I try to translate the code between Python and Octave.
Thank you very much!
You might need to revise the code somehow.
def predict(self, train_users, test_users, M):
    # Recall@M averaged over users. Assumes self.m_U / self.m_V are the user
    # and item factor matrices, and train_users / test_users are lists of
    # item-index lists, one per user. (Python 2 code: uses xrange.)
    batch_size = 100
    n = int(math.ceil(1.0*self.m_num_users/batch_size))
    num_hit = np.zeros(self.m_num_items)   # per-item hit counts (not used in the return value)
    recall = np.zeros(self.m_num_users)
    for i in xrange(n):
        u_tmp = self.m_U[i*batch_size:min((i+1)*batch_size, self.m_num_users)]
        # Score every item for this batch of users and rank by score (descending).
        score = np.dot(u_tmp, self.m_V.T)
        ind_rec = np.argsort(score, axis=1)[:, ::-1]
        # Construct ground truth; note it includes both training and test items.
        bs = min((i+1)*batch_size, self.m_num_users) - i*batch_size
        gt = np.zeros((bs, self.m_num_items))
        for j in range(bs):
            ind = i*batch_size + j
            gt[j, train_users[ind]] = 1
        for j in range(bs):
            ind = i*batch_size + j
            gt[j, test_users[ind]] = 1
        # Reorder ground-truth columns according to each user's ranking (ind_rec).
        rows = np.array(range(bs))[:, np.newaxis]
        gt = gt[rows, ind_rec]
        # Per-user recall@M = (#relevant items in top M) / (#relevant items).
        recall[i*batch_size:min((i+1)*batch_size, self.m_num_users)] = \
            1.0*np.sum(gt[:, :M], axis=1)/np.sum(gt, axis=1)
        num_hit += np.sum(gt, axis=0)
    recall = np.mean(recall)
    return recall
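As a rough usage sketch (assumptions: the citeulike cf-train/cf-test files have one line per user of the form "<count> <item_id> <item_id> ...", matching the u(2:end) indexing in the Octave version above; `model` stands for whatever object holds m_U, m_V, m_num_users and m_num_items; the file names are placeholders):

def load_user_items(path):
    # One line per user: "<count> <item_id> <item_id> ..."; drop the leading count.
    users = []
    with open(path) as f:
        for line in f:
            ids = [int(x) for x in line.split()]
            users.append(ids[1:])
    return users

train_users = load_user_items("cf-train-users.dat")  # placeholder path
test_users = load_user_items("cf-test-users.dat")    # placeholder path
print(model.predict(train_users, test_users, M=300))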
Hi, can you explain this code for me:

for j in range(bs):
    ind = i*batch_size + j
    gt[j, train_users[ind]] = 1
for j in range(bs):
    ind = i*batch_size + j
    gt[j, test_users[ind]] = 1

In my view, the gt matrix serves as the ground truth for evaluating the results, but here you also put the training set (train_users) into gt. Can you explain why?
If I comment out the training part like this:

# for j in range(bs):
#     ind = i*batch_size + j
#     gt[j, train_users[ind]] = 1
for j in range(bs):
    ind = i*batch_size + j
    gt[j, test_users[ind]] = 1

the results are worse, but we cannot use the training set when we evaluate the results.
Thank you very much for your support!
As I stated in a previous post, I tried to reproduce the results of the baseline methods myself. I should have eliminated the training ratings from the ground truth, but with that change I could not reproduce the baseline results. The code I posted reproduces the most similar results, which makes me guess that they used such an evaluation method; that's why I used it. I'm also not entirely comfortable with it. However, as I said, the relative performance of the different methods is more important, and as far as I can see the relative results remain the same no matter which evaluation function is used. If you are going to use it, you will have to judge which evaluation function is more appropriate for your case. Hope it helps.
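For reference, here is a minimal sketch of the stricter variant discussed above: it masks each user's training items out of the score vector so that only unseen items are ranked, and uses only the test items as ground truth. This is an illustration of the alternative protocol, not the function used for the paper's figures, and the argument names are illustrative.

import numpy as np

def recall_at_m_strict(train_users, test_users, U, V, M):
    # Rank only items the user has not seen in training; count hits against test items only.
    recalls = []
    for u in range(U.shape[0]):
        test_items = test_users[u]
        if len(test_items) == 0:
            continue  # skip users with no test ratings
        scores = U[u].dot(V.T)
        scores[train_users[u]] = -np.inf       # exclude training items from the ranking
        top_m = np.argsort(scores)[::-1][:M]   # indices of the M highest-scoring items
        hits = len(set(top_m).intersection(test_items))
        recalls.append(1.0 * hits / len(test_items))
    return np.mean(recalls)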
Hi, have you confirmed with the authors (of the CTR and CDL papers) how they evaluated their results?
Unfortunately, they didn't provide the evaluation code in their released code, and I didn't ask them for it.
Hi,
Can you supply the table of scores behind Figures 4 and 5 in your paper? I am trying to reproduce the results, but I can't read the specific scores off the figures.
Thank you very much!