IDEA-FinAI / ToG

This is the official GitHub repo of Think-on-Graph. If you are interested in our work or would like to join our research team in Shenzhen, please feel free to contact us by email (xuchengjin@idea.edu.cn).

Question about finding and linking topic entities to KG #7

Closed YaooXu closed 6 months ago

YaooXu commented 6 months ago

Thanks for your great work! I still have some questions about finding and linking topic entities after reading the related issues.

  1. In issue 2 and the original paper, you mentioned that you prompt LLMs to automatically extract the topic entities in the question. However, for the question "Where did the \"Country Nation World Tour\" concert artist go to college?" (the second sample in cwq.json), how is the entity "Bachelor's degree" extracted? It doesn't even appear in the question. The sample is as follows:

    {
      "ID": "WebQTrn-1259_1997cb4922db71983be26e6a509950f4",
      "compositionality_type": "composition",
      "created": "2018-02-12T21:58:58",
      "machine_question": "where did the artist had a concert tour named Country Nation World Tour graduate from college",
      "question": "Where did the \"Country Nation World Tour\" concert artist go to college?",
      "sparql": "PREFIX ns: <http://rdf.freebase.com/ns/>\nSELECT DISTINCT ?x\nWHERE {\nFILTER (?x != ?c)\nFILTER (!isLiteral(?x) OR lang(?x) = '' OR langMatches(lang(?x), 'en'))\n?c ns:music.artist.concert_tours ns:m.010qhfmm . \n?c ns:people.person.education ?y .\n?y ns:education.education.institution ?x .\n?x ns:common.topic.notable_types ns:m.01y2hnl .\n?y ns:education.education.degree ns:m.019v9k .\n}\n",
      "webqsp_ID": "WebQTrn-1259",
      "webqsp_question": "where did brad paisley graduate from college",
      "topic_entity": {
          "m.010qhfmm": "Country Nation World Tour",
          "m.019v9k": "Bachelor's degree"
      },
      "answer": "Belmont University",
      "qid_topic_entity": {
          "Q17004176": "Country Nation World Tour",
          "Q163727": "Bachelor's degree"
      }
    }
  2. Is the Freebase ID of an entity (e.g., m.010qhfmm) obtained from the "Freebase ID (P646)" property in Wikidata?

GasolSun36 commented 6 months ago

(1) Some datasets already provide topic entities, so we take the intersection of those with the entities extracted by the prompt. (2) Yes. For details, you can view the API under wikidata. We have defined a qid2mid function.
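
For readers following along, here is a minimal sketch of the Q-ID to Freebase-ID step discussed above. It is not the repo's qid2mid; the function name `freebase_mid_from_entity` and the hand-built `sample_entity` are assumptions for illustration. It assumes an entity record in the JSON layout Wikidata uses for entity data, where P646 values are stored in the form `/m/010qhfmm`, and converts that to the dotted form used in the SPARQL above.

```python
from typing import Optional


def freebase_mid_from_entity(entity: dict) -> Optional[str]:
    """Return the Freebase MID in dotted form (e.g. 'm.010qhfmm'), or None.

    `entity` is assumed to be one Wikidata entity record that has already
    been fetched and parsed (hypothetical helper, not the repo's qid2mid).
    """
    for claim in entity.get("claims", {}).get("P646", []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue:
            # Wikidata stores '/m/010qhfmm'; Freebase RDF writes 'm.010qhfmm'.
            return datavalue["value"].lstrip("/").replace("/", ".")
    return None


# Hand-built record mirroring the Wikidata layout (illustrative only).
sample_entity = {
    "id": "Q17004176",
    "claims": {
        "P646": [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"value": "/m/010qhfmm", "type": "string"}}}
        ]
    },
}

print(freebase_mid_from_entity(sample_entity))
```

Keeping the parsing separate from the network call makes the conversion easy to check offline; fetching the record itself is left out of this sketch.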

YaooXu commented 6 months ago

Thanks for your reply!

But the original ComplexWebQuestions test dataset only provides the SPARQL of the query. The previous sample appears in the original test dataset as follows:

    {
        "ID": "WebQTrn-1259_1997cb4922db71983be26e6a509950f4", 
        "compositionality_type": "composition", 
        "created": "2018-02-12T21:58:58", 
        "machine_question": "where did the artist had a concert tour named Country Nation World Tour graduate from college", 
        "question": "Where did the \"Country Nation World Tour\" concert artist go to college?", 
        "sparql": "PREFIX ns: <http://rdf.freebase.com/ns/>\nSELECT DISTINCT ?x\nWHERE {\nFILTER (?x != ?c)\nFILTER (!isLiteral(?x) OR lang(?x) = '' OR langMatches(lang(?x), 'en'))\n?c ns:music.artist.concert_tours ns:m.010qhfmm . \n?c ns:people.person.education ?y .\n?y ns:education.education.institution ?x .\n?x ns:common.topic.notable_types ns:m.01y2hnl .\n?y ns:education.education.degree ns:m.019v9k .\n}\n", 
        "webqsp_ID": "WebQTrn-1259", 
        "webqsp_question": "where did brad paisley graduate from college"
    }, 

So what you mean is that you use the intersection of the entities that appear in the SPARQL and the entities extracted by the LLM as your topic entities?

GasolSun36 commented 6 months ago

Yes, we extract the entities from the SPARQL, then intersect them with the entities we extracted by prompting the LLM.
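
The intersection step described above can be sketched roughly as follows. This is only an illustration under assumptions, not the authors' code: the regex for Freebase MIDs, the helper names `mids_in_sparql` and `select_topic_entities`, and the `llm_linked` dictionary (LLM-extracted mentions already linked to MIDs, including a made-up MID) are all hypothetical.

```python
import re
from typing import Dict, List

# Assumed pattern: entity IDs appear in the SPARQL as ns:m.<id> or ns:g.<id>.
# Relations such as ns:music.artist.concert_tours do not match it.
MID_PATTERN = re.compile(r"ns:([mg]\.[0-9a-z_]+)")


def mids_in_sparql(sparql: str) -> List[str]:
    """Return the distinct Freebase MIDs in a SPARQL string, in order of appearance."""
    seen: List[str] = []
    for mid in MID_PATTERN.findall(sparql):
        if mid not in seen:
            seen.append(mid)
    return seen


def select_topic_entities(sparql: str, llm_linked: Dict[str, str]) -> Dict[str, str]:
    """Keep only the LLM-extracted entities whose MID also occurs in the SPARQL."""
    return {mid: llm_linked[mid] for mid in mids_in_sparql(sparql) if mid in llm_linked}


sparql = (
    "PREFIX ns: <http://rdf.freebase.com/ns/>\n"
    "SELECT DISTINCT ?x\nWHERE {\n"
    "?c ns:music.artist.concert_tours ns:m.010qhfmm . \n"
    "?c ns:people.person.education ?y .\n"
    "?y ns:education.education.institution ?x .\n"
    "?x ns:common.topic.notable_types ns:m.01y2hnl .\n"
    "?y ns:education.education.degree ns:m.019v9k .\n}\n"
)

# Hypothetical LLM output after entity linking; 'm.0hypothetical' is made up.
llm_linked = {
    "m.010qhfmm": "Country Nation World Tour",
    "m.0hypothetical": "college",
}

print(mids_in_sparql(sparql))
print(select_topic_entities(sparql, llm_linked))
```

Only entities found by both sources survive, so a mention the LLM links to an ID absent from the SPARQL is dropped, and IDs in the SPARQL that the LLM never proposed are dropped as well.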

YaooXu commented 6 months ago

Thanks, I get it.