RichardHGL / WSDM2021_NSM

Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.

The way to get triples #5

Closed: novice7 closed this issue 3 years ago.

novice7 commented 3 years ago

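Hello! Before running PageRank, how did you obtain the Freebase triples?

Is there a way to obtain (entity, relation, value) triples, or are there currently only (entity, relation, entity) triples?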

RichardHGL commented 3 years ago

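We first collected the triples within two hops of the topic entities (the entities mentioned in the question) and then ran PageRank. The current data does contain some (entity, attribute, value) triples; we did not treat them any differently.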

Question 1: How did you get the Freebase triples before running PageRank? Answer: First, we extract the triples in the neighborhood of the topic entities, and then we run PageRank to retain a subset of entities.

Question 2: Are there (entity, attribute, value) triples here? Answer: Yes.
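The two steps in the answer (collect the triples within two hops of the topic entities, then run PageRank to keep a subset of entities) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the repo's preprocessing code: triples are assumed to be in-memory (head, relation, tail) string tuples, edges are treated as undirected, and the PageRank variant is assumed to be personalized PageRank that restarts at the topic entities (the thread only says "PageRank"). The function names two_hop_subgraph, personalized_pagerank and top_k_entities, and the parameters alpha and iters, are hypothetical. Literal values such as dates are kept as ordinary nodes, matching the answer that (entity, attribute, value) triples are not treated specially.

```python
from collections import defaultdict, deque


def two_hop_subgraph(triples, topic_entities, max_hops=2):
    """Return the triples reachable within max_hops of the topic entities.

    Assumption: edges are undirected for expansion, and literal values
    (dates, numbers, strings) are treated as ordinary nodes.
    """
    adj = defaultdict(list)  # node -> list of incident triples
    for h, r, t in triples:
        adj[h].append((h, r, t))
        adj[t].append((h, r, t))

    seeds = list(topic_entities)
    dist = {e: 0 for e in seeds}
    queue = deque(seeds)
    kept = set()
    while queue:
        node = queue.popleft()
        if dist[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for h, r, t in adj[node]:
            kept.add((h, r, t))
            other = t if node == h else h
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return kept


def personalized_pagerank(triples, seeds, alpha=0.85, iters=50):
    """Power iteration for PageRank with restarts to the seed entities (assumed variant)."""
    nbrs = defaultdict(set)  # undirected neighbor sets
    for h, _, t in triples:
        nbrs[h].add(t)
        nbrs[t].add(h)
    seeds = [s for s in seeds if s in nbrs]
    if not seeds:
        return {}
    # Restart distribution: uniform over the seeds present in the subgraph.
    restart = {v: 0.0 for v in nbrs}
    for s in seeds:
        restart[s] += 1.0 / len(seeds)
    score = dict(restart)
    for _ in range(iters):
        new = {v: (1 - alpha) * restart[v] for v in nbrs}
        for u, vs in nbrs.items():
            share = alpha * score[u] / len(vs)  # spread u's score over its neighbors
            for v in vs:
                new[v] += share
        score = new
    return score


def top_k_entities(scores, k):
    """Keep the k highest-scoring nodes."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because every node in the subgraph has at least one neighbor, each iteration redistributes all of its mass, so the scores keep summing to 1. Keeping the top-k nodes by score is one plausible reading of "run PageRank to retain entities"; the cutoff k is a free choice here.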

novice7 commented 3 years ago

Then where did you get the Freebase triple data? Freebase has been shut down.

Sorry, entities.txt does indeed contain values.

RichardHGL commented 3 years ago

The full Freebase dump can be downloaded from https://developers.google.com/freebase/.

In this paper, we use the Freebase dump released by Microsoft. You can download it with this command: wget https://download.microsoft.com/download/A/E/4/AE428B7A-9EF9-446C-85CF-D8ED0C9B1F26/FastRDFStore-data.zip --no-check-certificate. After downloading it, use fb_en.txt, which contains the English triples. For subgraph extraction, you can also refer to https://github.com/OceanskySun/GraftNet. I may later add a more detailed description of the dataset preprocessing to this repo, perhaps in one or two weeks.
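Since a Freebase dump is large, one way to extract the two-hop neighborhood without loading the whole file into memory is to scan it once per hop. The sketch below assumes fb_en.txt holds one tab-separated subject/predicate/object triple per line; the thread does not describe the file's format, so check it before relying on this. The helpers file_lines, iter_triples and k_hop_stream are hypothetical.

```python
def file_lines(path):
    """Return a zero-argument callable that re-opens the file on every call."""
    def read():
        with open(path, encoding="utf-8") as f:
            yield from f
    return read


def iter_triples(lines):
    """Yield (subject, predicate, object) from tab-separated lines.

    Lines with fewer than three columns are skipped; extra columns
    (e.g. a trailing "." in N-Triples style dumps) are ignored.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 3:
            yield parts[0], parts[1], parts[2]


def k_hop_stream(read_lines, seeds, hops=2):
    """Collect triples within `hops` hops of the seeds, one full scan per hop."""
    frontier = set(seeds)
    seen = set(seeds)
    kept = set()
    for _ in range(hops):
        next_frontier = set()
        for h, r, t in iter_triples(read_lines()):
            if h in frontier or t in frontier:
                kept.add((h, r, t))
                for n in (h, t):
                    if n not in seen:
                        seen.add(n)
                        next_frontier.add(n)
        if not next_frontier:
            break  # nothing new to expand
        frontier = next_frontier
    return kept
```

A call would look like k_hop_stream(file_lines("fb_en.txt"), topic_mids), where topic_mids is your list of topic-entity MIDs. The design trades time for memory: each hop re-reads the file, but only the current frontier and the kept triples stay in RAM.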

novice7 commented 3 years ago

Do you know a faster way to get an entity's real name from its MID? Or is the only way to first get the Freebase/Wikidata mappings from https://developers.google.com/freebase/ and then search for the wiki address?

RichardHGL commented 3 years ago

You can find an entity's name through the type.object.name attribute. For more details on using Freebase, I suggest searching online.
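Following the type.object.name suggestion, a single pass over a triple file can build an MID-to-name dictionary that then answers repeated lookups directly. This sketch assumes tab-separated lines in one of two shapes: full-dump RDF style (subject and predicate wrapped as <http://rdf.freebase.com/ns/...>, object as a quoted literal with a language tag such as "Barack Obama"@en) or bare (m.xxx, type.object.name, Name). Both formats are assumptions about the files, and build_name_index is a hypothetical helper.

```python
import re

NS = "http://rdf.freebase.com/ns/"
# A quoted literal with an optional language tag, e.g. "Barack Obama"@en
LITERAL = re.compile(r'^"(.*)"(?:@([A-Za-z-]+))?$')


def strip_ns(term):
    """Turn <http://rdf.freebase.com/ns/m.xxx> into m.xxx; leave bare terms unchanged."""
    if term.startswith("<") and term.endswith(">"):
        term = term[1:-1]
    if term.startswith(NS):
        term = term[len(NS):]
    return term


def build_name_index(lines, lang="en"):
    """Map each MID to its first type.object.name value in the given language."""
    names = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue
        subj, pred, obj = strip_ns(parts[0]), strip_ns(parts[1]), parts[2]
        if pred != "type.object.name":
            continue
        m = LITERAL.match(obj)
        if m:
            value, tag = m.group(1), m.group(2)
            if tag is not None and tag != lang:
                continue  # skip names in other languages
        else:
            value = obj  # untagged plain value (assumed bare format)
        names.setdefault(subj, value)
    return names
```

Escaped characters inside literals are left as they appear in the file. Whether this is faster than the Freebase/Wikidata mapping route depends on the dump size, but after the one-time scan each lookup is a dictionary access.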