eelxpeng / CollaborativeVAE

MIT License

The recall scores in Figure 4 and 5 in the paper #2

Closed xanhho closed 7 years ago

xanhho commented 7 years ago

Hi,

Can you supply the table of scores behind Figures 4 and 5 in your paper? I am trying to reproduce the results, but I can't recover the exact numbers from the figures.

Thank you very much!

eelxpeng commented 7 years ago

@xanhxanh94 I see. Sorry for misunderstanding your problem. You want the table of scores behind the curves... I actually don't have the table, but I ran it once again for you.

Fig. 4:

M:           50      100     150     200     250     300
citeulike-a  0.1486  0.2019  0.2420  0.2744  0.3011  0.3236
citeulike-t  0.1962  0.2331  0.2586  0.2799  0.2983  0.3128

Fig. 5:

M:           50      100     150     200     250     300
citeulike-a  0.4268  0.5259  0.5779  0.6123  0.6391  0.6615
citeulike-t  0.4327  0.5354  0.5850  0.6156  0.6370  0.6529

xanhho commented 7 years ago

Thank you very much! Yes, I want the table of scores behind the curves so I can compare against them. Oops, how can you draw the figures without the scores? :) The numbers you posted look the same as those in the figures, but for citeulike-a in Fig. 4 I previously saw:

M:           50      100     150     200     250     300
citeulike-a  0.1070  0.1628  0.2049  0.2389  0.2670  0.2906

which looks different from the scores reported in the paper.

eelxpeng commented 7 years ago

See the updated numbers. I saved all the final models and drew the figures from those models directly.

xanhho commented 7 years ago

Hi, thank you very much! I tried to reproduce the results, but they are not what I expected. Can you please describe the process for getting the results after running test_cvae.py?

eelxpeng commented 7 years ago

I actually posted the evaluation code, but deleted it after realizing you were asking for the table of scores. I tried to reproduce the results of the baseline methods myself, and found that the following code reproduces the baseline results most closely, so it is what I used to produce the results in my paper. (I actually don't think it's good to include the training ratings, but the relative performance among different methods is more important.)

function [recall] = evaluate(train_users, test_users, m_U, m_V, M)
% Recall@1..M averaged over users. Note that both training and test
% ratings are counted as ground truth (see the discussion below).
m_num_users = size(m_U, 1);
m_num_items = size(m_V, 1);

batch_size = 100;
n = ceil(1.0 * m_num_users / batch_size);
num_hit = zeros(m_num_users, M);
num_total = zeros(m_num_users, 1);
for i = 1:n
   ind = (i-1)*batch_size+1:min(i*batch_size, m_num_users);
   u_tmp = m_U(ind, :);
   score = u_tmp * m_V';                % predicted ratings for this batch
   [~, I] = sort(score, 2, 'descend');  % item indices, best first

   % build the ground-truth matrix; the first entry of each cell is
   % skipped (it is not an item id)
   bs = length(ind);
   gt = zeros(bs, m_num_items);
   for j = 1:bs
       idx = (i-1)*batch_size + j;
       u = train_users{idx};
       gt(j, u(2:end)) = 1;
   end
   for j = 1:bs
       idx = (i-1)*batch_size + j;
       u = test_users{idx};
       gt(j, u(2:end)) = 1;
   end
   % reorder the ground truth by predicted rank
   re = zeros(bs, m_num_items);
   for j = 1:bs
       re(j, :) = gt(j, I(j, :));
   end

   num_hit(ind, :) = re(:, 1:M);
   num_total(ind, :) = sum(re, 2);
end

recall = mean(cumsum(num_hit, 2) ./ repmat(num_total, 1, M), 1);
xanhho commented 7 years ago

Hi, I remember you had posted the code in Python. Can you please post the Python version? It confuses me when I try to translate the code between Python and Octave.

Thank you very much!

eelxpeng commented 7 years ago

You might need to revise the code somewhat.

def predict(self, train_users, test_users, M):
    # assumes: import math; import numpy as np (Python 3: xrange -> range)
    batch_size = 100
    n = int(math.ceil(1.0 * self.m_num_users / batch_size))
    num_hit = np.zeros(self.m_num_items)   # per-item hit counts (not returned)
    recall = np.zeros(self.m_num_users)
    for i in range(n):
        u_tmp = self.m_U[i*batch_size:min((i+1)*batch_size, self.m_num_users)]
        score = np.dot(u_tmp, self.m_V.T)
        ind_rec = np.argsort(score, axis=1)[:, ::-1]   # items, best first

        # construct ground truth from both training and test ratings
        bs = min((i+1)*batch_size, self.m_num_users) - i*batch_size
        gt = np.zeros((bs, self.m_num_items))
        for j in range(bs):
            ind = i*batch_size + j
            gt[j, train_users[ind]] = 1
        for j in range(bs):
            ind = i*batch_size + j
            gt[j, test_users[ind]] = 1
        # sort gt according to ind_rec
        rows = np.arange(bs)[:, np.newaxis]
        gt = gt[rows, ind_rec]

        recall[i*batch_size:min((i+1)*batch_size, self.m_num_users)] = \
            1.0 * np.sum(gt[:, :M], axis=1) / np.sum(gt, axis=1)
        num_hit += np.sum(gt, axis=0)

    recall = np.mean(recall)
    return recall
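For a concrete sense of what this computes, the method can be exercised end to end on toy data. The sketch below rewrites it as a free function (the toy factor matrices and rating lists are made up for illustration) and evaluates recall@2 for three users:

```python
import math
import numpy as np

def recall_at_m(m_U, m_V, train_users, test_users, M, batch_size=100):
    """Recall@M with both train and test items counted as ground truth,
    mirroring the predict() method above."""
    m_num_users, m_num_items = m_U.shape[0], m_V.shape[0]
    n = int(math.ceil(1.0 * m_num_users / batch_size))
    recall = np.zeros(m_num_users)
    for i in range(n):
        lo, hi = i * batch_size, min((i + 1) * batch_size, m_num_users)
        score = np.dot(m_U[lo:hi], m_V.T)
        ind_rec = np.argsort(score, axis=1)[:, ::-1]   # items, best first
        gt = np.zeros((hi - lo, m_num_items))
        for j in range(hi - lo):
            gt[j, train_users[lo + j]] = 1             # train items included
            gt[j, test_users[lo + j]] = 1
        gt = gt[np.arange(hi - lo)[:, np.newaxis], ind_rec]  # reorder by rank
        recall[lo:hi] = np.sum(gt[:, :M], axis=1) / np.sum(gt, axis=1)
    return np.mean(recall)

# Toy data: 3 users, 5 items, 2-d latent factors (made-up numbers).
U = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 1.0]])
V = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.1, 0.0], [0.0, 0.1]])
train = [[0], [1], [1]]
test = [[2], [2], [0]]
print(recall_at_m(U, V, train, test, M=2))  # ~0.8333 (users score 1, 1, 0.5)
```

The third user's held-out item 0 is ranked third, outside the top 2, which is why that user contributes only 0.5.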
xanhho commented 7 years ago

Hi, can you explain this code for me:

for j in range(bs):
    ind = i*batch_size + j
    gt[j, train_users[ind]] = 1
for j in range(bs):
    ind = i*batch_size + j
    gt[j, test_users[ind]] = 1

In my view, the gt matrix is the ground truth used to evaluate the results, but here you also put the training set (train_users) into gt. Can you explain why?

If I comment out the training part like this:

# for j in range(bs):
#     ind = i*batch_size + j
#     gt[j, train_users[ind]] = 1
for j in range(bs):
    ind = i*batch_size + j
    gt[j, test_users[ind]] = 1

the results are worse, but we cannot use the training set when we evaluate the results.

Thank you very much for your support!

eelxpeng commented 7 years ago

As I stated in a previous post, I tried to reproduce the results of the baseline methods myself. I should have removed the training ratings from the ground truth, but with that change I could not reproduce the baseline results. The code I posted gives the most similar results, which makes me guess that they used this evaluation method; that is why I used it. But I'm also not entirely comfortable with it. However, as I said, the relative performance of the different methods is more important, and as far as I can see the relative ranking stays the same no matter which evaluation function is used. If you are going to use it, you will have to judge which evaluation function is more appropriate for your case. Hope it helps.
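For comparison, the stricter variant discussed here, where training ratings are excluded from both the ground truth and the ranking, could be sketched as follows. This is an illustrative sketch, not code from the repository, and the toy data is made up:

```python
import numpy as np

def strict_recall_at_m(m_U, m_V, train_users, test_users, M):
    """Recall@M where only test items count as ground truth and
    training items are pushed out of the ranking entirely."""
    num_users = m_U.shape[0]
    scores = np.dot(m_U, m_V.T)
    recalls = np.zeros(num_users)
    for u in range(num_users):
        s = scores[u].copy()
        s[train_users[u]] = -np.inf          # never recommend seen items
        ranked = np.argsort(s)[::-1]         # items, best first
        top_m = set(ranked[:M].tolist())
        test_items = set(test_users[u])
        recalls[u] = len(top_m & test_items) / float(len(test_items))
    return recalls.mean()

# Toy data: 2 users, 4 items, 2-d latent factors (made-up numbers).
U = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[3.0, 0.0], [0.0, 3.0], [2.0, 0.0], [0.0, 2.0]])
train = [[0], [1]]
test = [[2], [0]]
print(strict_recall_at_m(U, V, train, test, M=1))  # prints 0.5
```

Because masked items can never appear in the top M, this variant typically reports different absolute numbers than the inclusive one, which matches the discrepancy observed above.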

xanhho commented 7 years ago

Hi, have you confirmed with the authors of the CTR and CDL papers how they evaluated their results?

eelxpeng commented 7 years ago

Unfortunately, their released code does not include the evaluation code, and I didn't ask them for it.