casperhansen / NeuHash-CF

Content-aware Neural Hashing for Cold-start Recommendation. SIGIR 2020

What is xxx_samples? #2

Open ghost opened 3 years ago

ghost commented 3 years ago

Thanks for the wonderful work. I downloaded the code and data for your paper through the link you provided, but what should train_samples be set to?

For example, in the following code:

elif args["dataset"].lower() == "amacold": # 50p
    train_samples = 831866
    val_samples = 148062
    test_samples = 73857
    max_rating = 5.0

How should I set these parameters? If I set them incorrectly, I get the following error:

Caused by op 'add_3', defined at:
  File "main.py", line 536, in <module>
    main()
  File "main.py", line 414, in main
    is_training, args, max_rating, anneal_val, anneal_val_vae, batch_placeholder)
  File "/mnt/data0/home/rocket_diggers_2/tuijian/acm_mm/NeuHash-CF/code/model.py", line 270, in make_network
    total_loss, ham_dist_i1, reconloss = make_total_loss(i1_org_m, i1r, i1_sampling, sigma_anneal)
  File "/mnt/data0/home/rocket_diggers_2/tuijian/acm_mm/NeuHash-CF/code/model.py", line 221, in make_total_loss
    i1r = i1r + e0anneal
  File "/mnt/data0/home/rocket_diggers_1/anaconda3/envs/tuijian/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "/mnt/data0/home/rocket_diggers_1/anaconda3/envs/tuijian/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 301, in add
    "Add", x=x, y=y, name=name)
  File "/mnt/data0/home/rocket_diggers_1/anaconda3/envs/tuijian/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/mnt/data0/home/rocket_diggers_1/anaconda3/envs/tuijian/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/mnt/data0/home/rocket_diggers_1/anaconda3/envs/tuijian/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/mnt/data0/home/rocket_diggers_1/anaconda3/envs/tuijian/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [1000] vs. [82000]
  [[node add_3 (defined at /mnt/data0/home/rocket_diggers_2/tuijian/acm_mm/NeuHash-CF/code/model.py:221) = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](IteratorGetNext/_103, mul_9)]]
  [[{{node add_8/_113}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_222_add_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

casperhansen commented 3 years ago

Hi,

Thanks for your question. These are simply the hardcoded lengths of the training, validation, and test sets, i.e., how many samples each of them contains. (Note that the test count is slightly different from train/val, since the test set contains each user and item only once; nothing more is needed to compute the hash codes for evaluation.) If you use new datasets, or change the existing ones, these numbers should be updated. One way to get the counts (train, val, test lengths) is as follows:

import tensorflow as tf

# Count the records in the first shard of each split (TF 1.x API)
print(sum(1 for _ in tf.python_io.tf_record_iterator(trainfiles[0])))
print(sum(1 for _ in tf.python_io.tf_record_iterator(valfiles[0])))
print(sum(1 for _ in tf.python_io.tf_record_iterator(testfiles[0])))
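
If a split is sharded across several .tfrecord files, a small sketch for totaling all shards could look like this (the glob patterns below are hypothetical; adjust them to your file layout):

import glob
import tensorflow as tf

# Sum the record counts over every shard matching a pattern (TF 1.x API)
def count_records(pattern):
    return sum(1 for path in glob.glob(pattern)
                 for _ in tf.python_io.tf_record_iterator(path))

print(count_records("amacold_train*.tfrecord"))  # train_samples
print(count_records("amacold_val*.tfrecord"))    # val_samples
print(count_records("amacold_test*.tfrecord"))   # test_samples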

I hope this answers your question.

ghost commented 3 years ago

Thank you for your reply. It was very helpful, and the code now works. But I have another question, about preloaded_testsamples = pickle.load(open(args["dataset"] + "_testdata.pkl","rb")). I can't find the code that generates this test file. I noticed that testdata.pkl looks like this:

preloaded_testsamples[user]:
[[2782, 4],
  [3836, 5],
  [7798, 5],
  [8689, 5],
  [9133, 5],
  [11613, 5]]

So I guess it is composed of preloaded_testsamples[user] = [[item, rating], ...], and I generated preloaded_testsamples over the whole dataset like this:

import pickle
from scipy.io import loadmat
from tqdm import tqdm

# Load the full user-item rating matrix and densify it
datamatlab = loadmat('../ratings_contentaware_full.mat')
full_matrix = datamatlab["full_matrix"].todense()

# For each user, collect [item, rating] pairs for every nonzero rating
final_test_data = []
for i in tqdm(range(full_matrix.shape[0])):
    preload_test_data = []
    for j in range(full_matrix[i].shape[1]):
        if full_matrix[i, j] != 0:
            preload_test_data.append([j, full_matrix[i, j]])
    final_test_data.append(preload_test_data)

pickle.dump(final_test_data, open('./test_data.pkl', "wb"))
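
As a quick sanity check (a hypothetical snippet, assuming the dump above succeeded), reloading the pickle and printing one user's entries should reproduce the [item, rating] structure shown earlier:

import pickle

# Inspect the first user's first few [item, rating] pairs
test_data = pickle.load(open('./test_data.pkl', 'rb'))
print(test_data[0][:5])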

Is this right? Or is there another way to get this file? Thanks!

casperhansen commented 3 years ago

Thanks for reaching out. Yes, that seems to be one way to do it; alternatively, you can use the provided .tfrecord files (or extract another format based on them).
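
If you go the .tfrecord route, a minimal sketch could look like the following. Note that the feature keys "user", "item", and "rating", as well as the file names, are assumptions here; inspect one record (or the repo's input pipeline) to find the actual names:

import pickle
from collections import defaultdict
import tensorflow as tf

# Feature keys below are hypothetical; verify against the actual .tfrecord schema
test_samples = defaultdict(list)
for record in tf.python_io.tf_record_iterator("amacold_test.tfrecord"):
    example = tf.train.Example()
    example.ParseFromString(record)
    feat = example.features.feature
    user = int(feat["user"].int64_list.value[0])
    item = int(feat["item"].int64_list.value[0])
    rating = float(feat["rating"].float_list.value[0])
    test_samples[user].append([item, rating])

# Save in the same per-user [item, rating] layout as the provided testdata.pkl
pickle.dump(dict(test_samples), open("amacold_testdata.pkl", "wb"))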