ram-g-athreya commented 5 years ago

Description

Question answering over Linked Data can be broadly segmented into three tasks: identifying named entities, identifying predicates (relation extraction), and finally generating a precise SPARQL query that answers the question using the identified entities and predicates.

Relation extraction is one of the hardest steps in this process, if not the hardest. The dominant method is to use custom-built lexicons that match words in a query against a dictionary of phrases mapped to DBpedia predicates. Instead, we suggest using word embeddings to solve the relation extraction task.

Goals

For this project we will be using the LC-QuAD dataset which contains 5000 questions derived from DBpedia along with their corresponding SPARQL query and generic query template. The research problem is as follows:

Given a question and its corresponding SPARQL template:

Identify the DBpedia entities (resources) for each triple in the question.
Using the identified resource, compute word embeddings for each predicate label of that resource and find the closest match among the words in the input question.
Experiment with different similarity metrics for matching predicate labels to the input question.
Evaluate the overall performance of the system compared to existing methods using GERBIL.
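
To make the matching step concrete, here is a minimal sketch of how predicate labels could be scored against the words of a question. It is only an illustration under stated assumptions, not the project's implementation: the `TOY_EMBEDDINGS` table, its values, the example labels and the helper names are all hypothetical, and a real system would load pre-trained vectors (for example fastText) instead. The similarity function is passed in as a parameter so that different metrics can be compared.

```python
import math
import re

# Toy 3-dimensional vectors standing in for pre-trained embeddings.
# The words and values are hypothetical and chosen only for illustration.
TOY_EMBEDDINGS = {
    "born": [0.9, 0.1, 0.0],
    "birth": [0.8, 0.2, 0.1],
    "place": [0.1, 0.9, 0.1],
    "where": [0.2, 0.8, 0.0],
    "spouse": [0.0, 0.1, 0.9],
    "was": [0.3, 0.3, 0.3],
    "obama": [0.5, 0.0, 0.5],
}


def known_vectors(text):
    """Return the vectors of all words in text that have an embedding."""
    words = re.findall(r"[a-z]+", text.lower())
    return [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]


def embed(text):
    """Average the vectors of the known words in text; None if none are known."""
    vectors = known_vectors(text)
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def negative_euclidean(a, b):
    """Negated Euclidean distance, so that higher still means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_predicate(question, predicate_labels, similarity=cosine):
    """Return (label, score) for the label closest to any question word, or None."""
    question_vectors = known_vectors(question)
    best = None
    for label in predicate_labels:
        label_vector = embed(label)
        if label_vector is None or not question_vectors:
            continue
        # Score a label by its best match among the question words.
        score = max(similarity(label_vector, v) for v in question_vectors)
        if best is None or score > best[1]:
            best = (label, score)
    return best


labels = ["birth place", "spouse"]  # hypothetical predicate labels of one resource
question = "Where was Obama born?"
print(best_predicate(question, labels))
print(best_predicate(question, labels, similarity=negative_euclidean))
```

Averaging word vectors for multi-word labels is just one simple choice; other ways of composing label vectors may work better and are part of what the project would need to explore.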

Warm-up Tasks

Read the papers
PyTorch
1. Get familiar with PyTorch through some basic tutorials, e.g. How to use Pre-trained Word Embeddings in PyTorch
2. FastText
Familiarize yourself with SPARQL
1. Short intro video on SPARQL and querying DBpedia
2. SPARQL by example
Example of a successful project proposal.

Impact

The project will allow users to access DBpedia knowledge using natural language.

Mentors

TBD (Ram G Athreya, Rricha Jalota and Ricardo Usbeck)

sinAshish commented 5 years ago

The project idea seems very interesting. I'll start by reading the LC-QuAD paper. 😃

DwaraknathT commented 5 years ago

Hey everyone, this is a wonderful idea that can be very useful. I have gone through the suggested papers and dataset and, since reading papers is not an easy task, I would like to summarize them. May I know whether I can submit my summaries and, if so, where?

ram-g-athreya commented 5 years ago

Hi @DwaraknathT

Great that you are interested in the project! Reading the papers would be a great help in understanding the problem domain and could help in generating new ideas.

One suggestion would be to include your summary as part of your proposal and how you might leverage existing research in solving this problem.

Hope this helps.

Thanks Ram G Athreya

g-laz77 commented 5 years ago

Hi, I have gone through the LC-QuAD and Enriching word vectors papers. I have also previously worked on a couple of deep learning projects in PyTorch and have used fastText embeddings as well. It would be a great opportunity to work on this project. What should I do next to start contributing to this project?

sinAshish commented 5 years ago

Thanks for the suggestion on proposal writing @ram-g-athreya

ram-g-athreya commented 5 years ago

Hi @g-laz77

Great that you are getting familiar with the problem. Since our project relies on the Semantic Web, it might be a good idea to familiarize yourself with SPARQL and with querying DBpedia in general. You can also start working on your proposal and thinking about how you would solve the problem, especially how you would match the words in the question with the predicates in the knowledge graph, that is, which algorithms you might use.

You can find the DBpedia query interface here: http://dbpedia.org/sparql

A basic video on using SPARQL with DBpedia can be found here. It's a little old, but the information is still relevant: https://www.youtube.com/watch?v=BmHKb0kLGtA

SPARQL by example: https://www.w3.org/2009/Talks/0615-qbe/
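
As a starting point for experimenting with the endpoint, here is a small sketch that builds a SPARQL query listing the predicates of one resource together with their English labels, plus the URL that would send it to the endpoint above. Treat it as an assumption-laden example rather than a recommended client: the helper names and the example resource are hypothetical, and the `format` parameter is assumed to be accepted by the DBpedia endpoint. The code only builds strings and does not make a network request.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia query interface mentioned above


def predicate_label_query(resource_uri, limit=50):
    """Build a query for the predicates of a resource and their English labels."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?predicate ?label WHERE {\n"
        f"  <{resource_uri}> ?predicate ?object .\n"
        "  ?predicate rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(query):
    """Build a GET URL for the endpoint; the 'format' parameter is an assumption."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


# Hypothetical example resource, used only for illustration.
query = predicate_label_query("http://dbpedia.org/resource/Barack_Obama")
print(query)
print(request_url(query))
```

The printed query can also be pasted directly into the query interface to inspect the results by hand.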

Hope this helps.

Thanks Ram G Athreya

rishabhjoshi commented 5 years ago

Hi,

Your proposal seems really interesting! I have worked extensively on Distantly Supervised Relation Extraction (given the entities) and have proposed a neural model based on Side Information and GCN. The work was done under Prof. Partha Talukdar at the Indian Institute of Science, Bangalore, and was accepted at EMNLP 2018. Some people in his lab are currently working on KB question answering as well.

The link to the paper is: here
The link to the code is: here

Do check it out. I believe Relation Extraction in your proposal can benefit from this.

Thanks, Rishabh

g-laz77 commented 5 years ago

@ram-g-athreya I have gone through the YouTube video on SPARQL and the exercises on the SPARQL by example page. I am now familiar with SPARQL. I also read the RESIDE paper suggested by @rishabhjoshi. The approach is good for Relation Extraction (predicate detection), given the entities. It acquires syntactic side information and uses it to search a common embedding space for the right relation among the extended set of relation aliases obtained from the KB. This reduces the task to identifying the entities in the question and finding the right predicate by using RESIDE.

GingerEater commented 5 years ago

Hi, I'm in the process of writing the proposal. I have a question: is there an existing module in DBpedia that provides a SPARQL query generation service I can use in this project? Or does this part also need to be done by me? Thanks and looking forward to your reply!

ram-g-athreya commented 5 years ago

Hi @GingerEater

You would have to generate the SPARQL queries yourself. But it would be based on the templates in LC-QuAD.
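
To give a rough idea of what template-based generation could look like, here is a minimal sketch that fills a query template with an identified entity and predicate. The placeholder notation (`$entity`, `$predicate`), the template text and the function name are all hypothetical; the actual LC-QuAD templates may be written differently and should be checked in the dataset.

```python
from string import Template

# Hypothetical single-triple template; not taken from LC-QuAD itself.
SIMPLE_TEMPLATE = Template("SELECT DISTINCT ?uri WHERE { <$entity> <$predicate> ?uri }")


def fill_template(template, entity_uri, predicate_uri):
    """Substitute the identified entity and predicate URIs into the template."""
    return template.substitute(entity=entity_uri, predicate=predicate_uri)


sparql = fill_template(
    SIMPLE_TEMPLATE,
    "http://dbpedia.org/resource/Barack_Obama",  # entity from entity linking
    "http://dbpedia.org/ontology/birthPlace",    # predicate from predicate detection
)
print(sparql)
```

Templates with several triples would need one placeholder per entity and predicate slot, but the substitution step stays the same.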

Hope it helps.

dbpedia / GSoC

Predicate Detection using Word Embeddings for Question Answering over Linked Data #27
