BetsyHJ / KSR

31 stars 14 forks source link

About the Freebase Dataset #2

Open familyld opened 5 years ago

familyld commented 5 years ago

Hi, your SIGIR-18 work is very fascinating and has important practical significance. However, when I tried to reproduce the experiments, I found many uncertainties. For example, in your paper "Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks", you stated:

we adopt the one-time Freebase [8] dump consisting of 63 million triples

In "KB4Rec: A Dataset for Linking Knowledge Bases with Recommender Systems", you declared:

we use the version of March 2015, which is its latest public version.

But what I found on the homepage of Freebase is:


Which is the actual dataset you used for training the KSR algorithm? Or did you use the pre-trained embeddings provided by the THUNLP group? Could you provide further instructions on how to produce the entity embeddings, or a way to download the embeddings you used in the experiments? This issue is important for reproducing the experimental results. Any help will be greatly appreciated.

Thank you very much.

Best regards.

BetsyHJ commented 5 years ago

Hi, thank you for your attention. We used the latest dataset, which can be found on the homepage of Freebase. Our KB embedding code/tool is based on projects from THUNLP (https://github.com/thunlp). OpenKE (https://github.com/thunlp/OpenKE) is their main project; you can find nearly all related methods there, including all the TransX models used in the paper. Best regards.

familyld commented 5 years ago

Thank you for your reply. Unfortunately, the link (https://developers.google.com/freebase/data) you gave in the "KB4Rec" paper is dead now. What's more, I can't find the dataset you used on the homepage of Freebase (https://developers.google.com/freebase/). Currently, there are three datasets available on the homepage, but none of them is the dataset you used in the experiments.

BetsyHJ commented 5 years ago

I think you can try this: https://developers.google.com/freebase/#freebase-rdf-dumps. You can construct the KG for recommendation by combining the Freebase triples with the linkage file.
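A minimal sketch of that combining step, under the assumption that the linkage file maps one item ID to one Freebase MID per tab-separated line (the KB4Rec repository documents the exact format) and that each dump line starts with a subject, predicate, and object. The function names and the sample MIDs below are purely illustrative, not from the authors' code:

```python
def load_linked_mids(linkage_lines):
    """Collect the Freebase MIDs from '<item_id>\t<freebase_mid>' lines."""
    mids = set()
    for line in linkage_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            mids.add(parts[1])
    return mids

def extract_subgraph(triple_lines, mids):
    """Yield (subject, predicate, object) triples whose subject is a linked MID."""
    for line in triple_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 3 and parts[0] in mids:
            yield tuple(parts[:3])

# Tiny in-memory example; in practice triple_lines would stream
# from the (gzipped) 250GB RDF dump line by line.
linkage = ["0001\tm.02mjmr", "0002\tm.0d3k14"]
triples = [
    "m.02mjmr\tpeople.person.place_of_birth\tm.02hrh0_",
    "m.0abcde\tfilm.film.genre\tm.0lsxr",
]
sub = list(extract_subgraph(triples, load_linked_mids(linkage)))
```

Because the dump is processed one line at a time, this kind of filter runs in constant memory apart from the MID set, which is what makes the 250GB file tractable.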

familyld commented 5 years ago

This dataset (https://developers.google.com/freebase/#freebase-rdf-dumps) contains 1.9 billion triples in total, which is inconsistent with what you stated in the KSR paper:

we adopt the one-time Freebase [8] dump consisting of 63 million triples

The uncompressed file is 250GB, which is too large to work with.

familyld commented 5 years ago

Is there any other way to solve this problem?

familyld commented 5 years ago

Still no response. It's sad.

BetsyHJ commented 5 years ago

I am sorry for the late reply. We did use the full 250GB dump, and we obtained the subgraph by following https://github.com/RUCDM/KB4Rec. We will try to release the subgraph and fix the code later. If you have any questions, you can contact the first author by email.