DeepGraphLearning / graphvite

GraphVite: A General and High-performance Graph Embedding System
https://graphvite.io
Apache License 2.0

Knowledge Graph Application. Link prediction error #49

Closed: chengjiali closed this issue 4 years ago.

chengjiali commented 4 years ago

Hi,

I was trying to train a KG embedding and perform link prediction on a custom dataset. When I run

predictions = app.entity_prediction(file_name='valid.txt', target="tail", k=5)

I get an error at this line:

assert len(new_R) == len(R), "Can't recognize some entities or relations"
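For context, here is a minimal sketch of the kind of logic that can trip a length check like this. It assumes, hypothetically, that the application converts each (head, relation, tail) name triplet to integer ids using vocabularies built from the training data, and silently drops any triplet it cannot map. The names `to_ids`, `entity2id` and `relation2id` and the toy data are illustrative only, not GraphVite's actual code.

```python
# Hypothetical sketch, NOT GraphVite's implementation: map name triplets to
# integer ids using vocabularies built from the training triplets only.
def to_ids(triplets, entity2id, relation2id):
    new_R = []
    for h, r, t in triplets:
        # Keep a triplet only if every name in it was seen during training.
        if h in entity2id and r in relation2id and t in entity2id:
            new_R.append((entity2id[h], relation2id[r], entity2id[t]))
    return new_R


# Made-up toy training data using the two relations mentioned in this thread.
train = [("apple", "GoodFor", "heart"), ("sugar", "BadFor", "teeth")]
entity2id, relation2id = {}, {}
for h, r, t in train:
    for e in (h, t):
        entity2id.setdefault(e, len(entity2id))
    relation2id.setdefault(r, len(relation2id))

# "kale" never appears in the training triplets, so its triplet is dropped.
R = [("apple", "GoodFor", "heart"), ("kale", "GoodFor", "heart")]
new_R = to_ids(R, entity2id, relation2id)
print(len(new_R), len(R))  # the lengths differ, so len(new_R) == len(R) fails
```

Under this assumption, a single unknown entity or relation is enough to make the two lengths disagree, which matches the assertion message "Can't recognize some entities or relations".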

I have been reading the code for a while but can't figure it out. Any idea what might cause this?

BTW, I sometimes get loss=nan during training after the 5th iteration, especially when using small dimensions. I don't know why.

Thank you for the help!

KiddoZhu commented 4 years ago

This indicates that some relations in the test set have never been seen in the training set, so the tail entities of these triplets can't be predicted. You may filter these triplets out of your test set.
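Filtering can be done as a preprocessing step before calling `entity_prediction`. The sketch below assumes one tab-separated `head<TAB>relation<TAB>tail` triplet per line (check your own files); the helper names are hypothetical. Because the assertion message mentions both entities and relations, it drops test triplets containing any unseen entity or relation.

```python
def read_triplets(path):
    """Read triplets, assuming one tab-separated head/relation/tail per line."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]


def write_triplets(path, triplets):
    """Write triplets back in the same tab-separated format."""
    with open(path, "w", encoding="utf-8") as f:
        for triplet in triplets:
            f.write("\t".join(triplet) + "\n")


def filter_unseen(test, train):
    """Keep only test triplets whose entities and relation all occur in train."""
    entities = {e for h, _, t in train for e in (h, t)}
    relations = {r for _, r, _ in train}
    return [
        (h, r, t)
        for h, r, t in test
        if h in entities and t in entities and r in relations
    ]
```

For example, you could read `train.txt` and `valid.txt` with `read_triplets`, pass them to `filter_unseen`, write the result to a new file with `write_triplets`, and pass that file name to `entity_prediction` instead.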

loss=nan is usually the result of a large learning rate. For knowledge graph embedding, it is necessary to tune both lr and relation_lr_multiplier. See the config files for Wikidata5m to get an idea of these hyperparameters on large-scale datasets.
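To see why a large learning rate ends in nan rather than just a large loss, here is a toy illustration using plain gradient descent on f(x) = x * x. It is a generic example, not GraphVite's training loop, and it uses a single step size. In GraphVite the knobs are `lr` and `relation_lr_multiplier`; going by the name, the latter presumably scales the step size for relation embeddings, but check the docs. Once the step size is too large, every update overshoots, the parameter grows until it overflows to inf, and the next update computes inf minus inf, which is nan.

```python
import math


def gradient_descent_on_square(lr, steps=2000, x0=1.0):
    """Minimise f(x) = x * x with plain gradient descent; return the final loss."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x     # derivative of x * x
        x = x - lr * grad  # mathematically x * (1 - 2 * lr): shrinks if lr < 1
    # Use x * x rather than x ** 2: float ** raises OverflowError on huge values,
    # whereas float multiplication quietly overflows to inf.
    return x * x


small = gradient_descent_on_square(lr=0.1)  # |1 - 2 * lr| = 0.8, converges
large = gradient_descent_on_square(lr=1.5)  # |1 - 2 * lr| = 2, diverges
print(small, large, math.isnan(large))
```

With lr=1.5 the magnitude of x doubles every step, overflows after roughly a thousand steps, and then turns into nan. A real model with many parameters can reach the same state much sooner.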

chengjiali commented 4 years ago

Hi,

Thanks for your reply.

I checked the data I used. All the relations in the test set appear in the training set. I have only 2 relations, GoodFor and BadFor.

KiddoZhu commented 4 years ago

Maybe some entities in the test set don't occur in the training set?
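A quick way to check this is to compare the sets of names in both splits. The helper below is a hypothetical sketch that works on (head, relation, tail) name triplets loaded however you like. It reports unseen entities and relations separately, plus how many test triplets they affect, so you can tell which of the two causes triggered the assertion.

```python
def unseen_report(train, test):
    """Return (unseen entities, unseen relations, number of affected test triplets)."""
    train_entities = {e for h, _, t in train for e in (h, t)}
    train_relations = {r for _, r, _ in train}
    test_entities = {e for h, _, t in test for e in (h, t)}
    test_relations = {r for _, r, _ in test}

    unseen_entities = test_entities - train_entities
    unseen_relations = test_relations - train_relations
    # Count test triplets that contain at least one unseen name.
    affected = sum(
        1
        for h, r, t in test
        if h in unseen_entities or t in unseen_entities or r in unseen_relations
    )
    return sorted(unseen_entities), sorted(unseen_relations), affected


# Made-up toy data: every relation is known, but two entities are not.
train = [("apple", "GoodFor", "heart"), ("sugar", "BadFor", "teeth")]
test = [("apple", "BadFor", "teeth"), ("kale", "GoodFor", "heart"), ("sugar", "GoodFor", "gut")]
print(unseen_report(train, test))
```

In this toy case the relation list comes back empty, so the relations alone would look fine, while two unseen entities still affect two test triplets.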

chengjiali commented 4 years ago

I see. That may be the problem. I was expecting an unknown entity to trigger a separate error from a length check on the entities. Let me try what you suggested and see if it works.

Thank you very much!

chengjiali commented 4 years ago

The problem was that some entities in the test set do not appear in the training set. It's solved now. Thank you!

ralgond commented 1 year ago

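My English isn't good, so I'll write in Chinese. I ran into this problem too. You said the error occurs because some entities in the test set never appear in the training set. However, I looked at the FB15k-237 dataset, and its test set also contains a few entities (10348 - 10319 = 29) that the training set doesn't know. I'm curious whether this dataset can pass the test at all.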

Also, I'd like to ask: what problem is the function application.name_map actually meant to solve?