dbpedia / GSoC

Google Summer of Code organization

A Neural QA Model for DBpedia (GSoC 2019) #19

Closed mommi84 closed 4 years ago

mommi84 commented 5 years ago

Previous projects

This project idea is a follow-up of GSoC 2018 project A Neural QA Model for DBpedia.

Description

In recent years, the Linked Data Cloud has grown to over 100 billion facts pertaining to a multitude of domains. The DBpedia knowledge base alone consists of 4.58 million things. However, accessing this information is challenging for lay users, as they cannot use SPARQL as a query language without extensive training.

Recently, deep learning architectures based on neural networks, called seq2seq, have been shown to achieve state-of-the-art results in translating sequences into sequences. In this direction, we suggest a GSoC topic around neural networks that translate any natural language expression into sentences encoding SPARQL queries. Our preliminary work on Question Answering with Neural SPARQL Machines (NSpM) shows promising results, but its coverage is restricted to manually curated templates.
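To make the idea of "sentences encoding SPARQL queries" concrete, here is a minimal Python sketch that turns a SPARQL query into a plain token sequence a seq2seq model could be trained on, and back again. The replacement table (`brack_open`, `var_`, `sep_dot`, ...) and the function names are illustrative assumptions, not necessarily the exact encoding used by NSpM.

```python
# Hypothetical mapping from SPARQL symbols to word-like tokens.
# Illustrative only; not necessarily the vocabulary NSpM uses.
REPLACEMENTS = [
    ("{", " brack_open "),
    ("}", " brack_close "),
    ("?", " var_"),
    (".", " sep_dot "),
]


def encode_sparql(query):
    """Turn a SPARQL query into a whitespace-separated token sequence."""
    encoded = query
    for symbol, token in REPLACEMENTS:
        encoded = encoded.replace(symbol, token)
    # Collapse the extra whitespace introduced by the replacements.
    return " ".join(encoded.split())


def decode_sparql(sequence):
    """Map the word-like tokens back to SPARQL symbols."""
    decoded = sequence
    for symbol, token in REPLACEMENTS:
        decoded = decoded.replace(token.strip(), symbol)
    return " ".join(decoded.split())


if __name__ == "__main__":
    q = "SELECT ?x WHERE { dbr:Italy dbo:capital ?x . }"
    print(encode_sparql(q))
    print(decode_sparql(encode_sparql(q)))
```

The naive `.` replacement would also split prefixed names that contain dots, so a real encoder needs extra care there.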

The most up-to-date source code can be found here. During the GSoC, we will use this repository as the workspace.

Goals

In this GSoC project, the candidate can choose between the following research directions:

  1. employ a language model (e.g., Question Generation, Universal Sentence Encoders) to automatically discover query templates;
  2. perform experiments on compositionality for complex QA;
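For direction 1, one way to picture template discovery is to embed questions and group those whose embeddings are close, so that each group suggests one candidate template. The sketch below swaps in a plain bag-of-words vector for a real language model such as a Universal Sentence Encoder; the greedy grouping, the function names and the 0.6 threshold are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def embed(sentence):
    # Stand-in for a real sentence encoder: bag-of-words counts
    # over lowercased tokens, with question marks dropped.
    return Counter(sentence.lower().replace("?", "").split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_questions(questions, threshold=0.6):
    """Greedily put each question into the first group whose seed is similar enough."""
    groups = []  # list of (seed_vector, member_questions)
    for q in questions:
        vec = embed(q)
        for seed, members in groups:
            if cosine(vec, seed) >= threshold:
                members.append(q)
                break
        else:
            # No similar group found: this question seeds a new one.
            groups.append((vec, [q]))
    return [members for _, members in groups]


if __name__ == "__main__":
    qs = ["What is the capital of Italy?", "What is the capital of France?",
          "Who wrote Hamlet?", "Who wrote Dracula?"]
    for group in group_questions(qs):
        print(group)
```

Each resulting group ("What is the capital of X?", "Who wrote Y?") hints at a template whose varying part becomes a placeholder.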

with the following ultimate goals:

Impact

The project will allow users to access DBpedia knowledge using natural language.

Warm-up tasks

Mentors

Rricha Jalota and Nausheen Fatma (backup: Aashay Singhal, Aman Mehta, Tommaso Soru).

Keywords

structured question answering, deep learning, neural networks, sparql, tensorflow, python

wannabeOG commented 5 years ago

Great idea! I have opened some issues that I found while running the initial tests. I would like to work on this project.

mommi84 commented 5 years ago

Hi @wannabeOG! Thanks for your interest. Please open a pull request if you think you can fix those issues. Have you thought about which research direction you would like to explore?

yudhik11 commented 5 years ago

Hi, I have gone through both papers mentioned above, which gave me good insight. Also, after going through the blog, I was able to reproduce the experiment: with --num_train_steps=12000, I achieved a dev BLEU of 88.1 and a test BLEU of 87.3.

I am currently going through the code more extensively, but it would be helpful to get some suggestions on how to proceed further with this project.
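For readers unfamiliar with the BLEU scores quoted above, here is a simplified corpus-level BLEU in Python: clipped n-gram precision up to 4-grams, their geometric mean, and a brevity penalty, with one reference per sentence and no smoothing. It is a sketch for intuition only; the evaluation scripts used in the experiments may tokenize or smooth differently, so its numbers are not directly comparable.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_order=4):
    """Unsmoothed corpus BLEU (0-100) with one reference per hypothesis."""
    matches = [0] * max_order
    possible = [0] * max_order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        ref_len += len(ref_tokens)
        hyp_len += len(hyp_tokens)
        for n in range(1, max_order + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            # Counter intersection keeps the minimum count: clipped matches.
            overlap = hyp_counts & ngrams(ref_tokens, n)
            matches[n - 1] += sum(overlap.values())
            possible[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / p) for m, p in zip(matches, possible)) / max_order
    # Penalize hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))
```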

amanmehta-maniac commented 5 years ago

Hi @yudhik11, that's a good start. I suggest you go ahead with warm-up task 2, that is, downloading and editing a sample template and training a Neural SPARQL Machine model. Please go through the mentioned wiki while you do so.
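Warm-up task 2 revolves around templates. Below is a minimal sketch of how a single template can be expanded into (question, query) training pairs, assuming a hypothetical `<A>` placeholder and a hand-written list of (label, URI) entities. In the actual pipeline the entities would come from querying DBpedia, and the real template file format may differ.

```python
def instantiate(template_question, template_query, entities, placeholder="<A>"):
    """Return (question, query) pairs by substituting each entity into the template.

    `entities` is a list of (label, uri) tuples; the label fills the natural
    language side and the URI (in angle brackets) fills the SPARQL side.
    """
    pairs = []
    for label, uri in entities:
        question = template_question.replace(placeholder, label)
        query = template_query.replace(placeholder, "<" + uri + ">")
        pairs.append((question, query))
    return pairs


if __name__ == "__main__":
    # Hypothetical template and entities, for illustration only.
    pairs = instantiate(
        "what is the capital of <A>?",
        "select ?x where { <A> dbo:capital ?x }",
        [("Italy", "http://dbpedia.org/resource/Italy"),
         ("France", "http://dbpedia.org/resource/France")],
    )
    for question, query in pairs:
        print(question, "=>", query)
```

The more entities a template is instantiated with, the more training pairs the model sees for that question pattern.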

yudhik11 commented 5 years ago

I am stuck at one of the steps while following the PIPELINE. Let's say I want to go with dbo:Continent: I tried extracting its properties from the ontology and page links mentioned in the pipeline, but was not successful.

Can anyone guide me through this?
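One alternative to extracting properties from web pages is to ask the DBpedia SPARQL endpoint directly. The sketch below builds a query for ontology properties whose `rdfs:domain` is the given class or one of its superclasses (via the `rdfs:subClassOf*` property path), so properties declared on more general classes such as `dbo:Place` are included too. The endpoint URL and the `format` parameter are assumptions about the public Virtuoso endpoint, the function names are hypothetical, and `fetch_properties` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public DBpedia endpoint


def properties_query(ontology_class):
    """SPARQL query for properties whose domain is the class or a superclass of it."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?property WHERE { "
        "dbo:%s rdfs:subClassOf* ?cls . ?property rdfs:domain ?cls }" % ontology_class
    )


def request_url(ontology_class):
    params = {
        "query": properties_query(ontology_class),
        "format": "application/sparql-results+json",  # assumed to be honoured by the endpoint
    }
    return ENDPOINT + "?" + urlencode(params)


def fetch_properties(ontology_class):
    """Run the query over HTTP and return the property URIs."""
    with urlopen(request_url(ontology_class)) as response:
        data = json.load(response)
    return [b["property"]["value"] for b in data["results"]["bindings"]]


if __name__ == "__main__":
    print(properties_query("Continent"))
```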

amanmehta-maniac commented 5 years ago

The URL (for the Place class) is http://mappings.dbpedia.org/server/ontology/classes/Place. Are you done with warm-up task 2, though? I'd suggest you go through the warm-up tasks in the given order. You do not need the PIPELINE to complete warm-up task 2.

Keep me updated.

yudhik11 commented 5 years ago

I have worked on Task 2 by training on multiple classes and trying different numbers of annotations. A brief cumulative summary is shown here. I plan to run the experiment with a greater variety of classes and more complex SPARQL queries.

I am now starting Task 3, 'reproducing the experiments'.

Dewalade1 commented 5 years ago

Hi, I would like to apply for GSoC and work on this project.

mommi84 commented 5 years ago

Thanks for sharing your results @yudhik11 and thanks for your interest @Dewalade1. Have you guys already started writing your proposals? When you think your ideas are mature enough, please share a Google doc with my handle at gmail dot com. Remember to specify which research direction you may want to investigate.

mugdhajoshi commented 5 years ago

I am stuck at running build_vocab.py. The code is written in Python 2.7, so I installed Python 2.7, but I cannot install TensorFlow; pip reports: "Could not find a version that satisfies the requirement tensorflow (from versions: ) No matching distribution found for tensorflow". I think TensorFlow only supports Python 3.5 and above. Can someone help me?
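For context on this step, a seq2seq vocabulary file typically lists one token per line. The following is a hypothetical Python 3 sketch, not the project's build_vocab.py; it assumes the toolkit expects the special tokens `<unk>`, `<s>` and `</s>` at the top of the file (a convention borrowed from the TensorFlow NMT tutorial code), and the frequency ordering and `min_count` cutoff are illustrative choices.

```python
from collections import Counter

# Assumption: special tokens the seq2seq toolkit expects first in the vocab file.
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]


def build_vocab(lines, min_count=1):
    """Collect tokens from whitespace-tokenized lines, most frequent first."""
    counts = Counter(token for line in lines for token in line.split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [t for t, c in ordered if c >= min_count and t not in SPECIAL_TOKENS]
    return SPECIAL_TOKENS + vocab


def write_vocab(vocab, path):
    """Write one token per line as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for token in vocab:
            f.write(token + "\n")


if __name__ == "__main__":
    print(build_vocab(["select var_x where brack_open", "select var_y where"]))
```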

yudhik11 commented 5 years ago

Always mention which OS you are using. FYI, TensorFlow is supported on Python 2.7.

If you are using Ubuntu:

mugdhajoshi commented 5 years ago

I am using Windows 10. At https://www.tensorflow.org/install/pip?lang=python2, under '2. Create a virtual environment (recommended)', the Windows instructions say "TensorFlow is not supported on Windows with Python 2.7".
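When reporting installation problems like this one, it helps to include the OS, the Python version and whether the interpreter is 64-bit, since the official TensorFlow pip packages target 64-bit Python. A small Python sketch that collects these details; the function name and report format are arbitrary choices.

```python
import platform
import sys


def environment_report():
    """Summarize OS, Python version, bitness and TensorFlow availability."""
    lines = [
        "OS: %s %s" % (platform.system(), platform.release()),
        "Python: %s" % platform.python_version(),
        "64-bit: %s" % (sys.maxsize > 2 ** 32),
    ]
    try:
        import tensorflow as tf
        lines.append("TensorFlow: %s" % tf.__version__)
    except ImportError:
        lines.append("TensorFlow: not installed")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```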

wannabeOG commented 5 years ago

I haven't used Windows for a while, but I do remember that TF was only compatible with Python 3 on Windows. This issue (https://github.com/tensorflow/tensorflow/issues/23603) makes me believe that there is no Python 2.7 package readily available for Windows. That being said, there are a couple of alternatives that could be used to bypass this problem. Please keep in mind that these are the alternatives I can think of at the moment; there might be better ones that do not require installing any additional software.

1) Docker installation: download and install Docker Toolbox for Windows (https://www.docker.com/docker-toolbox). Then follow the instructions at https://www.tensorflow.org/install/docker to set up a Python 2.7 environment and use it to get the project running.
2) Set up a virtual machine on your Windows platform itself by following the instructions at https://itsfoss.com/install-linux-in-virtualbox/, and then follow the normal instructions for Linux. This could help in the long run, as carrying out development work on Windows is really cumbersome.

nausheenfatma commented 5 years ago

Hi @mugdhajoshi, if you are facing a lot of compatibility issues, you may also consider installing Ubuntu. This link shows how to install Ubuntu on Windows: https://tutorials.ubuntu.com/tutorial/tutorial-ubuntu-on-windows#0.

mugdhajoshi commented 5 years ago

Thank you @wannabeOG and @nausheenfatma for your comments. I have successfully installed Ubuntu in VirtualBox.

nausheenfatma commented 5 years ago

@mugdhajoshi, great that you could make it work. Since the proposal deadline is approaching, you should start discussing your ideas soon after completing the warm-up tasks.

theodore3131 commented 5 years ago

Hi, I am working on the warm-up tasks listed on the wiki page, and my draft proposal is nearly complete. I have been following this project for a long time and would like to contribute to it. I also think that README.md should state clearly that the environment is Python 2.7; that would save people a lot of trouble. Finally, I would like to know whether it is possible to upgrade the project to Python 3.x.
Thank you for your time!

nausheenfatma commented 5 years ago

@theodore3131: That's great. Please share your proposal as a Google Doc by email soon, outlining your ideas, so we can discuss them further.

rrichajalota commented 5 years ago

@theodore3131 Of course, it's possible to upgrade the code to python 3.x. If you are willing to do so during the GSoC timeline, then please mention it in your proposal. We would also be happy to have updated documentation for easier installation/execution of the project.
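If the code is ported, a common intermediate step is keeping scripts runnable under both Python 2.7 and 3.x. A minimal sketch of the usual idioms, assuming UTF-8 data files: `__future__` imports for printing, division and string literals, plus `io.open` with an explicit encoding. The helper names are hypothetical and not taken from the project.

```python
from __future__ import division, print_function, unicode_literals

import io


def read_lines(path):
    """Read a UTF-8 text file the same way under Python 2.7 and 3.x."""
    with io.open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def average_length(lines):
    """Average number of whitespace tokens per line."""
    # True division in both versions thanks to the __future__ import.
    return sum(len(line.split()) for line in lines) / len(lines) if lines else 0.0


if __name__ == "__main__":
    print(average_length(["what is the capital of Italy", "who wrote Hamlet"]))
```

Once Python 2.7 support is dropped, the `__future__` imports can simply be removed.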