hv-abacus opened this issue 1 year ago
Hello @hv-abacus, it seems that you are not filtering the data. The dataset statistics we report were obtained after 10-core filtering, which is specified by the parameters 'user_inter_num_interval' and 'item_inter_num_interval' in the yaml file. You can run the code directly on the Amazon datasets with our yaml file, and you will obtain the same statistics.
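To illustrate why the filtered counts are smaller than the raw ones, here is a minimal Python sketch of iterative k-core filtering on (user, item) pairs. It is an illustration under our own assumptions, not RecBole's implementation; the function name `k_core_filter` is hypothetical. With `k=10` it keeps only users and items that each have at least 10 interactions.

```python
from collections import Counter


def k_core_filter(interactions, k=10):
    """Iteratively drop users and items with fewer than k interactions.

    `interactions` is a list of (user_id, item_id) pairs (hypothetical
    input format). Dropping a sparse item can push a user below k, and
    vice versa, so the pass is repeated until nothing changes.
    """
    current = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in current)
        item_counts = Counter(i for _, i in current)
        kept = [
            (u, i)
            for u, i in current
            if user_counts[u] >= k and item_counts[i] >= k
        ]
        if len(kept) == len(current):  # stable: no pair was removed
            return kept
        current = kept


# Cascade example with k=2: items "b" and "c" go first, which leaves
# both users with a single interaction, so everything is removed.
pairs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c")]
print(k_core_filter(pairs, k=2))  # []
```

Because the loop repeats until it is stable, one filtering step can trigger another, which is why the filtered statistics can be much smaller than a single count over the raw file suggests.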
Hi, I downloaded the Amazon dataset from here: https://recbole.s3-accelerate.amazonaws.com/CrossDomain/Amazon.zip

The dataset statistics that you report do not match what I compute from the original data. I removed all rows with NaNs and computed the number of unique values in the `user_id` column of the original `.inter` files. This gives the following statistics:

Am I doing something wrong?
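For reference, the counting procedure described above (drop rows with missing values, then count distinct IDs) might look like the following Python sketch. It assumes the tab-separated RecBole atomic-file layout, with header fields such as `user_id:token` and `item_id:token`; the function name `raw_inter_stats` is hypothetical. It applies no k-core filtering, so its counts correspond to the unfiltered data.

```python
import csv
import io


def raw_inter_stats(lines):
    """Count users, items and interactions in a .inter file (assumed format).

    The header is assumed to look like 'user_id:token<TAB>item_id:token...'.
    Rows with an empty field or a literal 'nan' are skipped, similar to a
    dropna() step. No interaction-count filtering is applied.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.split(":")[0] for name in next(reader)]
    user_col = header.index("user_id")
    item_col = header.index("item_id")
    users, items, n_inter = set(), set(), 0
    for row in reader:
        if len(row) != len(header):
            continue  # malformed or truncated row
        if any(v.strip() == "" or v.strip().lower() == "nan" for v in row):
            continue  # row with a missing value
        users.add(row[user_col])
        items.add(row[item_col])
        n_inter += 1
    return len(users), len(items), n_inter


sample = io.StringIO(
    "user_id:token\titem_id:token\trating:float\n"
    "u1\ti1\t5.0\n"
    "u1\ti2\t4.0\n"
    "u2\ti1\t\n"
    "u3\ti3\tnan\n"
)
print(raw_inter_stats(sample))  # (1, 2, 2)
```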