RUCAIBox / RecSysDatasets

This is a repository of public data sources for Recommender Systems (RS).
https://recbole.io/

Some questions related to the knowledge graph dataset #90

Closed hzau96yhz closed 2 years ago

hzau96yhz commented 3 years ago

  1. Can the neighbor hops be understood in the following way: hop1: item -> Entity; hop2: item -> Entity1, Entity1 -> Entity2; hop3: item -> A, A -> B, B -> C? If so, why does hop3.kg in the movielens-kg folder contain fewer records than hop2.kg?
  2. Can different versions of the Movielens dataset be used to generate .kg and .link files from the files in the Movielens-kg folder?

Looking forward to your reply!

ShanleiMu commented 3 years ago

  1. Yes. If the knowledge data looks like Item -> Entity1 -> Entity2 -> Entity3, then hop1.kg contains only Item -> Entity1, hop2.kg contains Entity1 -> Entity2, and hop3.kg contains Entity2 -> Entity3. To get the full 3-hop KG data, you need to combine these files.
  2. For ml-100k, ml-1m, ml-10m, and ml-20m, you can use the provided scripts and raw data to generate the corresponding knowledge data.
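
Combining the per-hop files as described in point 1 could look roughly like the sketch below. It is only an illustration: the tab-separated `head, relation, tail` line layout, the helper names `parse_triples` and `combine_hops`, and the sample triples are assumptions, not the actual format or tooling of this repository.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def parse_triples(lines: Iterable[str], sep: str = "\t") -> List[Triple]:
    """Parse 'head<sep>relation<sep>tail' lines, skipping blank or malformed ones.

    The three-column, tab-separated layout is an assumption for this sketch.
    """
    triples: List[Triple] = []
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def combine_hops(*hops: Iterable[Triple]) -> List[Triple]:
    """Union the per-hop triple lists, dropping duplicates in first-seen order."""
    seen: Set[Triple] = set()
    combined: List[Triple] = []
    for hop in hops:
        for triple in hop:
            if triple not in seen:
                seen.add(triple)
                combined.append(triple)
    return combined


# Made-up sample lines standing in for the contents of hop1.kg, hop2.kg, hop3.kg.
hop1 = parse_triples(["item1\tr1\tentity1\n"])
hop2 = parse_triples(["entity1\tr2\tentity2\n"])
hop3 = parse_triples(["entity2\tr3\tentity3\n", "entity1\tr2\tentity2\n"])

kg_3hop = combine_hops(hop1, hop2, hop3)
print(len(kg_3hop))  # 3
```

With real files, each list of lines would come from opening the corresponding file, for example `parse_triples(open(path, encoding="utf-8"))`, after checking whether the file has a header row.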

hzau96yhz commented 3 years ago

@ShanleiMu Sorry to bother you again, I would like to ask two more questions.

  1. What is the extra.kg file provided in the LFM-1B-KG folder? The data in this file looks like a knowledge graph that has already been constructed. Also, what is the difference between relation.kg and relation_full.kg?
  2. Because the lfm1b-traces.inter file is very large and my computational resources are limited, I wonder whether I can build the KG dataset for lastFM instead. Can I use the LFM-1B-KG folder and the .inter file in the lastFM dataset to build the .kg and .link files for lastFM? Although LFM1B comes from LastFM, I do not know whether the files in the LFM-1B-KG folder are applicable to the LastFM dataset. Or should I refer to KB4Rec to handle this? Do you have any suggestions?

ShanleiMu commented 3 years ago

  1. The original Freebase dataset contains some triples that are meaningless on their own. For example, consider the two triples (music1, r1, place1) and (place1, r2, artist1). The entity place1 has no clear meaning, so we combine the two triples into a new triple (music1, r3, artist1), which records the knowledge that the artist of music1 is artist1. We pre-process triples of this kind into extra.kg, so the triples in extra.kg are not directly recorded in the Freebase dataset. You can view the triples in extra.kg as 1-hop knowledge. relation_full.kg contains all the relations we process. The relations in relation.kg are a subset of the relations in relation_full.kg and are the ones used to generate the final knowledge data for the recommendation data. More details can be found in conversion_tools.

  2. LFM1B and LastFM are two different datasets. Although both come from last.fm, they were collected by different people at different times, so you cannot directly apply LFM-1B-KG to LastFM. As far as I know, KB4Rec records the linkage between music items in LFM1B and entities in Freebase. It can also be used to generate knowledge data for LastFM. If you want to use the LastFM dataset, you can refer to the KGNNLS paper, whose authors generate the KG data for LastFM with Satori.
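
The triple merging described in point 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in conversion_tools: the function names `collapse_intermediates` and `keep_relations`, the explicit set of intermediate entities, and the `(r1, r2) -> r3` mapping are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def collapse_intermediates(
    triples: Iterable[Triple],
    intermediates: Set[str],
    merged_relation: Dict[Tuple[str, str], str],
) -> List[Triple]:
    """Turn head -r1-> mid -r2-> tail into head -r3-> tail for meaningless mids.

    `intermediates` lists the entities treated as meaningless, and
    `merged_relation` maps a relation pair (r1, r2) to the new relation r3.
    Both are assumed inputs for this sketch.
    """
    triples = list(triples)

    # Index the outgoing edges of every intermediate entity.
    out_edges = defaultdict(list)
    for head, rel, tail in triples:
        if head in intermediates:
            out_edges[head].append((rel, tail))

    extra: List[Triple] = []
    for head, r1, mid in triples:
        # Only start from a meaningful head that points at an intermediate.
        if head in intermediates or mid not in intermediates:
            continue
        for r2, tail in out_edges.get(mid, []):
            r3 = merged_relation.get((r1, r2))
            if r3 is not None:
                extra.append((head, r3, tail))
    return extra


def keep_relations(triples: Iterable[Triple], allowed: Set[str]) -> List[Triple]:
    """Keep only triples whose relation is in the allowed subset."""
    return [t for t in triples if t[1] in allowed]


raw = [
    ("music1", "r1", "place1"),
    ("place1", "r2", "artist1"),
]
extra = collapse_intermediates(raw, {"place1"}, {("r1", "r2"): "r3"})
print(extra)  # [('music1', 'r3', 'artist1')]

# Filtering by a relation subset, analogous to relation.kg vs relation_full.kg.
kept = keep_relations(extra + [("music1", "r9", "x")], {"r3"})
```

Here `keep_relations` mirrors the idea that only the relations listed in relation.kg, a subset of relation_full.kg, contribute to the final knowledge data.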