RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License
3.44k stars 614 forks

Trying to build a Yelp knowledge graph #1597

Open emilyjeng opened 1 year ago

emilyjeng commented 1 year ago

Hello! I tried to build a simple knowledge graph for Yelp. In the .kg file, I set head_id:token to item_id:token, relation_id:token to location.shop.location, and tail_id:token to categories:token_seq, as shown below:

image

I also added a second relation_id:token:

image

In the .link file, item_id:token is left unchanged and entity_id:token is set to categories:token_seq, as shown below:

image
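As a rough sketch of the file layout described above (not the poster's actual script), the following builds tab-separated .kg and .link contents with typed column headers. The sample items, the relation name `business.category`, and the helpers `build_kg_rows`, `build_link_rows`, and `to_tsv` are all hypothetical. It also makes two assumptions that differ from the post: each triple carries a single tail token, so an item with several categories becomes one row per category, and each item is linked to the entity that appears as its head in the .kg file.

```python
import csv
import io

# Hypothetical sample data: item id -> list of category tokens.
items = {
    "shop_a": ["Restaurants", "Pizza"],
    "shop_b": ["Cafes"],
}


def build_kg_rows(items, relation="business.category"):
    """Expand each item's categories into one (head, relation, tail) triple per category."""
    return [(item, relation, cat) for item, cats in items.items() for cat in cats]


def build_link_rows(items):
    """Link each item to its entity (assumption: the entity reuses the item's token)."""
    return [(item, item) for item in items]


def to_tsv(header, rows):
    """Serialize a header and rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


kg_text = to_tsv(
    ["head_id:token", "relation_id:token", "tail_id:token"],
    build_kg_rows(items),
)
link_text = to_tsv(["item_id:token", "entity_id:token"], build_link_rows(items))

print(kg_text)
print(link_text)
```

Writing `kg_text` and `link_text` to files named after the dataset is left out here, since the exact file names depend on the dataset directory being used.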

But when I run it, I get the following error:

Traceback (most recent call last):
  File "run_recbole.py", line 48, in <module>
    run_recbole(
  File "/Emily/RecBole-master/recbole/quick_start/quick_start.py", line 69, in run_recbole
    dataset = create_dataset(config)
  File "/Emily/RecBole-master/recbole/data/utils.py", line 70, in create_dataset
    dataset = dataset_class(config)
  File "/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 68, in __init__
    super().__init__(config)
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 108, in __init__
    self._from_scratch()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 120, in _from_scratch
    self._data_processing()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 168, in _data_processing
    self._normalize()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 710, in _normalize
    feat[field] = norm(feat[field].values)
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 698, in norm
    mx, mn = max(arr), min(arr)
ValueError: max() arg is an empty sequence

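I don't understand how to deal with this. Or is my approach to building the knowledge graph wrong? I can provide the data if you need to reproduce it.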
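The last frame of the traceback shows `max()` being called on an empty array, which suggests that some column had no rows left by the time normalization ran. The sketch below reproduces that failure with a plain min-max scaler and adds a guard. It is not RecBole's code: `min_max_norm` and `safe_min_max_norm` are hypothetical helpers, and returning zeros for a constant column is an arbitrary choice made for this example.

```python
def min_max_norm(values):
    """Min-max scale a list of numbers to [0, 1]."""
    mx, mn = max(values), min(values)  # raises ValueError if values is empty
    if mx == mn:
        # Arbitrary choice for this sketch: a constant column maps to zeros.
        return [0.0 for _ in values]
    return [(v - mn) / (mx - mn) for v in values]


def safe_min_max_norm(values):
    """Same as min_max_norm, but leaves an empty column untouched."""
    if len(values) == 0:
        return []
    return min_max_norm(values)


try:
    min_max_norm([])
except ValueError as exc:
    print("empty column:", exc)

print(safe_min_max_norm([]))
print(safe_min_max_norm([4.0, 5.0, 4.5]))
```

A guard like this only hides the symptom. If a column is empty, the more useful question is which loading or filtering step removed every row.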

Ethan-TZ commented 1 year ago

@emilyjeng Hello, could you share the configuration file you are running with?

emilyjeng commented 1 year ago

The files are here: https://drive.google.com/drive/folders/10-2Q8zW3FC_hylvKXorfohL8w0sw1Dm-?usp=sharing

The yaml file is configured as follows:

# dataset config
field_separator: "\t"
seq_separator: " "
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
TIME_FIELD: timestamp
NEG_PREFIX: neg_
LABEL_FIELD: label
normalize_all: True  # normalization
threshold:
  rating: 4
load_col:
  inter: [user_id, item_id, rating]
  kg: [head_id, relation_id, tail_id]
  link: [item_id, entity_id]

# data filtering for interactions
val_interval:
  rating: "[4,inf)"
unused_col:
  inter: [rating]

user_inter_num_interval: "[10,inf)"
item_inter_num_interval: "[10,inf)"

embedding_size: 64
kg_embedding_size: 64  # (int) The embedding size of relations in knowledge graph.
reg_weights: [1e-2,1e-2]  # (list of float) The L2 regularization weights.

# data preprocessing for knowledge graph triples
kg_reverse_r: True
entity_kg_num_interval: "[5,inf)"
relation_kg_num_interval: "[5,inf)"

# training and evaluation
epochs: 500
train_batch_size: 4096
eval_batch_size: 40960000
metrics: ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
valid_metric: Hit@10
train_neg_sample_args:
  distribution: uniform
  sample_num: 1
  dynamic: False

Command: python run_recbole.py --model=CKE --dataset=yelp22_us10shop --config_files=test.yaml

Ethan-TZ commented 1 year ago

@emilyjeng Hello, please try setting normalize_all to False.

emilyjeng commented 1 year ago

@chenyuwuxin Thanks for the answer! After setting normalize_all to False, I get the following error:

02 Jan 08:41 INFO yelp22_us10shop
The number of users: 1
Average actions of users: nan
The number of items: 1
Average actions of items: nan
The number of inters: 0
The sparsity of the dataset: 100.0%
Remain Fields: ['entity_id', 'user_id', 'item_id', 'head_id', 'relation_id', 'tail_id', 'label']
The number of entities: 1
The number of relations: 2
The number of triples: 0
The number of items that have been linked to KG: 0
02 Jan 08:41 WARNING Field [rating] is not in [inter_feat], which can not be set in unused_col.
Traceback (most recent call last):
  File "run_recbole.py", line 48, in <module>
    run_recbole(
  File "/Emily/RecBole-master/recbole/quick_start/quick_start.py", line 73, in run_recbole
    train_data, valid_data, test_data = data_preparation(config, dataset)
  File "/Emily/RecBole-master/recbole/data/utils.py", line 166, in data_preparation
    train_sampler, valid_sampler, test_sampler = create_samplers(
  File "/Emily/RecBole-master/recbole/data/utils.py", line 297, in create_samplers
    sampler = Sampler(
  File "/Emily/RecBole-master/recbole/sampler/sampler.py", line 227, in __init__
    super().__init__(distribution=distribution, alpha=alpha)
  File "/Emily/RecBole-master/recbole/sampler/sampler.py", line 40, in __init__
    self.used_ids = self.get_used_ids()
  File "/Emily/RecBole-master/recbole/sampler/sampler.py", line 257, in get_used_ids
    raise ValueError(
ValueError: Some users have interacted with all items, which we can not sample negative items for them. Please set user_inter_num_interval to filter those users.

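But as the settings above show, I have already set user_inter_num_interval. I also noticed that the numbers of users and items are far too small. Is my approach to linking the kg and link files wrong, or is there some other problem? See the image below: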

image

Ethan-TZ commented 1 year ago

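@emilyjeng Hello, this happens because some user or item in the dataset has too few interactions, so everything it interacted with gets filtered out. You can try lowering user_inter_num_interval and item_inter_num_interval to fix it.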
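To illustrate how minimum-count thresholds can empty a dataset, here is a minimal sketch of iterative count-based filtering. It is not RecBole's implementation: `filter_by_min_count` and the toy interactions are hypothetical, and the thresholds play the role of the lower bounds in user_inter_num_interval and item_inter_num_interval.

```python
from collections import Counter


def filter_by_min_count(inters, min_user=10, min_item=10):
    """Repeatedly drop users and items with too few interactions until stable."""
    current = list(inters)
    while True:
        user_cnt = Counter(u for u, _ in current)
        item_cnt = Counter(i for _, i in current)
        kept = [
            (u, i)
            for u, i in current
            if user_cnt[u] >= min_user and item_cnt[i] >= min_item
        ]
        if len(kept) == len(current):
            return kept
        current = kept


# Hypothetical toy data: 3 users x 4 items, each pair interacting once.
inters = [(f"u{u}", f"i{i}") for u in range(3) for i in range(4)]

# Each user has 4 interactions and each item has 3, so a bound of 10 removes everything.
print(len(filter_by_min_count(inters, min_user=10, min_item=10)))

# With a bound of 3 on both sides, every interaction survives.
print(len(filter_by_min_count(inters, min_user=3, min_item=3)))
```

Running a count like this on the raw interaction file before training gives a quick sense of whether a given lower bound is realistic for the data.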

emilyjeng commented 1 year ago

@chenyuwuxin Hello! I later found that the problem was the thresholds in entity_kg_num_interval and relation_kg_num_interval. After I lowered them, a different error appeared:

Traceback (most recent call last):
  File "run_recbole.py", line 48, in <module>
    run_recbole(
  File "/Emily/RecBole-master/recbole/quick_start/quick_start.py", line 69, in run_recbole
    dataset = create_dataset(config)
  File "/Emily/RecBole-master/recbole/data/utils.py", line 70, in create_dataset
    dataset = dataset_class(config)
  File "/workspace-nfs/JMD220-dev/NLP/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 68, in __init__
    super().__init__(config)
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 108, in __init__
    self._from_scratch()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 120, in _from_scratch
    self._data_processing()
  File "/Emily/RecBole-master/recbole/data/dataset/dataset.py", line 164, in _data_processing
    self._remap_ID_all()
  File "/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 407, in _remap_ID_all
    self._merge_item_and_entity()
  File "/Emily/RecBole-master/recbole/data/dataset/kg_dataset.py", line 349, in _merge_item_and_entity
    entity_id_map[i] = new_item_token2id[self.entity2item[entity_token[i]]]
KeyError: '_7bSxlQbj51wn5_0DouyKg'

This error looks like an ID that cannot be found.

Is it that entity_id cannot be a string? Below is my link file, with the columns item_id:token and entity_id:token:

image
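The KeyError is raised during a token lookup while items and entities are being merged, so one thing worth checking is whether every id in the .link file also appears in the files it is meant to join. The sketch below is a diagnostic only and does not reproduce RecBole's merging logic. `find_unlinked` and the sample rows are hypothetical, and it assumes the three files have already been parsed into Python tuples.

```python
def find_unlinked(link_rows, inter_items, kg_triples):
    """Return link-file item ids absent from the interactions and
    link-file entity ids absent from the knowledge graph triples."""
    kg_entities = {h for h, _, _ in kg_triples} | {t for _, _, t in kg_triples}
    missing_items = sorted({item for item, _ in link_rows if item not in inter_items})
    missing_entities = sorted({ent for _, ent in link_rows if ent not in kg_entities})
    return missing_items, missing_entities


# Hypothetical parsed contents of the .link, .inter and .kg files.
link_rows = [("shop_a", "shop_a"), ("shop_b", "shop_b"), ("shop_c", "Cafes Bakeries")]
inter_items = {"shop_a", "shop_b"}
kg_triples = [
    ("shop_a", "business.category", "Pizza"),
    ("shop_b", "business.category", "Cafes"),
]

missing_items, missing_entities = find_unlinked(link_rows, inter_items, kg_triples)
print("items in .link but not in .inter:", missing_items)
print("entities in .link but not in .kg:", missing_entities)
```

If either list is non-empty for the real data, those rows are reasonable candidates for the token reported in the KeyError, though the check alone does not prove that they are the cause.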