google-research / smore


Code for generating candidate entities during evaluation for WikiKG #5

Closed. chocolate9624 closed this issue 2 years ago.

chocolate9624 commented 2 years ago

Hi, I noticed that the evaluation data for WikiKG is loaded at this line: https://github.com/google-research/smore/blob/5b1a8a00b0cbfa024f411fc080b3d46dc681edd8/smore/training/main_train.py#L212

However, I cannot find the code that generates the candidate tail entities used during evaluation. Could you share more details?

Thanks a lot!

hyren commented 2 years ago

Hi, I have updated the preprocessing code to download the candidate set. Please check here.

LeoYML commented 1 year ago

Could you give more details about how the candidates for the validation data are generated?

I see `valid_url = "https://snap.stanford.edu/smore/valid.pt"` and the comment in the code: "# Specifically designed for OGB-LSC WikiKG v2. Since no candidates are provided by the original dataset, we generate candidates based on heuristics such as degrees / entity types." However, I can't reproduce this: the candidates per relation differ slightly (by roughly 1% or less).
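For reference, this is roughly how I inspect the released `valid.pt` to compare candidate counts per relation. The internal structure assumed below (something dict-like keyed by relation with arrays of candidate entity IDs) is a guess on my part, not documented behavior:

```python
import torch

# Assumption: valid.pt is a torch-serialized object whose per-relation
# candidate lists can be inspected after loading. The exact structure
# (e.g. a dict mapping relation -> candidate tensor) is a guess.
valid = torch.load("valid.pt", map_location="cpu")

print(type(valid))

# If it is dict-like, print a few entries to see how many candidates
# each relation has, so the counts can be compared against locally
# generated candidate sets.
if isinstance(valid, dict):
    for key in list(valid)[:5]:
        value = valid[key]
        size = value.shape if hasattr(value, "shape") else len(value)
        print(key, size)
```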

In my opinion, the logic is `train_data.groupby('relation')['tail'].apply(lambda grp: list(grp.value_counts().nlargest(20000).index))`, right?
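As a self-contained sketch of that frequency-based heuristic on toy triples (the column names `head`/`relation`/`tail` and the top-20000 cutoff are my assumptions, not taken from the SMORE preprocessing code):

```python
import pandas as pd

# Toy training triples; in practice this would be the WikiKG training split
# loaded into a DataFrame with head / relation / tail columns (assumed names).
train_data = pd.DataFrame({
    "head":     [0, 1, 2, 3, 4, 5],
    "relation": [0, 0, 0, 1, 1, 1],
    "tail":     [10, 10, 11, 12, 12, 13],
})

TOP_K = 20000  # number of candidate tails kept per relation (assumed cutoff)

# For each relation, rank tail entities by how often they appear in the
# training triples and keep the TOP_K most frequent ones as candidates.
candidates_per_relation = (
    train_data.groupby("relation")["tail"]
    .apply(lambda grp: list(grp.value_counts().nlargest(TOP_K).index))
)

print(candidates_per_relation)
# relation
# 0    [10, 11]
# 1    [12, 13]
```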