5hirish / adam_qas

ADAM - A Question Answering System. Inspired by IBM Watson
http://www.shirishkadam.com/
GNU General Public License v3.0

Invalid Answers #36

Status: Open. ghost opened this issue 5 years ago

ghost commented 5 years ago

When asking: "Who was the first president of the United States?"

The answer is:

Normally vice presidents hold some power and special responsibilities below that of the president. The amendment also specifies that if any eligible person serves as president or acting president for more than two years of a term for which some other eligible person was elected president, the former can only be elected president once. Mitt Romney for president. Perhaps the best known sub-national presidents are the borough presidents of the Five Boroughs of New York City. The president fulfills various ceremonial duties.

5hirish commented 5 years ago

@infosisio I am currently unable to maintain this project. I have identified some issues and shortcomings with it; if you are interested in contributing, I will share them with you.

idoroiengel commented 5 years ago

@5hirish Hey, I am interested in helping out, please let me know what needs to be done, and I'll try to do something about it :)

5hirish commented 5 years ago

@idoroiengel That's great to hear. When I started the project, the basic outline I chalked out was a Question Answering system where you ask a question and it performs basic NLP operations on it, such as tokenisation, stemming, POS tagging, and dependency extraction. It then tries to extract all the relevant keywords from the question, which can be used to construct a query against any knowledge source. After searching the knowledge source, it gets the raw data, tries to filter out irrelevant information or summarize it, and generates candidate answers and ranks them.
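The keyword-extraction and ranking steps of that outline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how adam_qas implements them: a regex tokenizer and a small hand-picked stopword list stand in for spaCy's POS tagging and dependency parsing, stemming is skipped, and candidate sentences are ranked by TF-IDF cosine similarity to the question keywords (a basic vector space model). The names `STOPWORDS`, `extract_keywords`, and `rank_candidates` are hypothetical, and the Washington sentence is illustrative input added next to two sentences from the bad answer above.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list standing in for POS/dependency-based filtering.
STOPWORDS = {
    "who", "what", "when", "where", "which", "how", "was", "is", "are",
    "were", "the", "a", "an", "of", "in", "on", "for", "to", "and", "by",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text):
    """Keep the tokens that are not stopwords, in their original order."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(question, sentences):
    """Return (score, sentence) pairs sorted by TF-IDF cosine similarity."""
    docs = [extract_keywords(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many candidate sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    def vectorize(tokens):
        # Term frequency times smoothed IDF, so rarer terms weigh more.
        tf = Counter(tokens)
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}

    query = vectorize(extract_keywords(question))
    scored = [(cosine(query, vectorize(doc)), s) for doc, s in zip(docs, sentences)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    question = "Who was the first president of the United States?"
    candidates = [
        "The president fulfills various ceremonial duties.",
        "George Washington was the first president of the United States.",
        "Mitt Romney for president.",
    ]
    print(extract_keywords(question))
    for score, sentence in rank_candidates(question, candidates):
        print(f"{score:.3f}  {sentence}")
```

Here the sentence that names the first president ranks first, while the two sentences taken from the bad answer share only the word "president", which appears in every candidate and therefore gets the lowest IDF weight.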

5hirish commented 5 years ago

Since then, my understanding of this problem and the different ways to solve it has changed a lot. Several constructs currently in the system can work against it and produce irrelevant answers like the one above. To understand the current state of the system, I would point you to the /docs folder of the repo, which contains an architecture diagram and a white paper on the system. I will also note down a few issues I am aware of here in this issue. The build on Travis is also failing; I will look into that and try to fix it. In the meantime, you can reach out to me at my email address if you need any help with the project, understanding its codebase, or setting it up.

5hirish commented 5 years ago

I compiled this list a long time ago, so I have forgotten some of the specifics, but it should nonetheless be a good start.

  1. Issues with the keywords being searched on Wikipedia [Selective Search]: irrelevant keywords are searched on the knowledge source, which adds noise to the extracted knowledge.
  2. Improve the keyword extraction: work on a keyword extraction algorithm so that the current rule-based keyword extraction can be deprecated in favor of an unsupervised method. We can look into the dependency relations of each token and take its other grammatical features into account to identify the keywords.
  3. Search on the structured info: a lot of tabular and structured information is extracted from Wikipedia. Work on an algorithm that searches nested JSON data to identify the relevant keys and get their values.
  4. Question classification: revisit the question classification model (Support Vector Machine), tweak it if necessary, and try to include the classified label in the keyword extraction or query construction phase to improve those steps.
  5. Information retrieval: revisit the information extraction phase (Vector Space Model); could we improve it with an LSTM?
  6. Can we leverage Elasticsearch more in the project?
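For item 3, one simple starting point is a recursive walk over the parsed JSON that returns every key whose words overlap with the question keywords, together with the path to that key. This is a hypothetical sketch: `search_nested`, `key_tokens`, and the sample record with placeholder values are illustrative and do not come from the adam_qas codebase or from real Wikipedia data.

```python
import json
import re


def key_tokens(key):
    """Split a key such as 'term_start' or 'Term Start' into lowercase words."""
    return set(re.findall(r"[a-z0-9]+", str(key).lower()))


def search_nested(data, keywords):
    """Return (path, value) pairs for dict keys sharing a word with keywords.

    Dicts and lists are walked recursively; each path is a tuple of the
    keys and list indices taken from the root to the matching key.
    """
    wanted = {k.lower() for k in keywords}
    matches = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                child = path + (key,)
                if key_tokens(key) & wanted:
                    matches.append((child, value))
                walk(value, child)  # keep searching inside matched values too
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, path + (index,))

    walk(data, ())
    return matches


# Hypothetical infobox-like record with placeholder values.
SAMPLE = json.loads("""
{
  "name": "Example Office",
  "details": {
    "first_holder": "Person A",
    "term_length": "4 years",
    "holders": [
      {"name": "Person A", "term_start": "Year 1"},
      {"name": "Person B", "term_start": "Year 5"}
    ]
  }
}
""")

if __name__ == "__main__":
    for path, value in search_nested(SAMPLE, ["first", "term"]):
        print(path, "->", value)
```

Matching is by exact word overlap, so without stemming a keyword such as `holder` matches `first_holder` but not `holders`; plugging in the keyword extraction from item 2 would decide which words reach this search.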
5hirish commented 5 years ago

@idoroiengel Maybe the easiest thing to start with would be upgrading dependencies such as spaCy. I would be glad if we could revive this project, and I will try to work on it more regularly!

5hirish commented 5 years ago

Fixed build issues with Travis CI

TharunAts commented 5 years ago

What is know_corp in Corpus, and how does it affect the model?

idoroiengel commented 5 years ago

@5hirish Sounds good. I have already glanced at some of the docs and I think I've got the basics. I work mostly on Android, but since I'm an MA Linguistics graduate, I want to do some NLP coding. I can take a look at the dependencies this week. I built the project successfully with the current dependencies on my local machine and ran it a few times with several queries to test the system.

idoroiengel commented 5 years ago

@5hirish do you have any specific notes for the branches of the project that I should be aware of? Also, should we continue this discussion in a different conversation?

5hirish commented 5 years ago

@idoroiengel Currently all the branches are stale and no feature is under development, so master is the stable branch. Yes, let's continue this conversation by mail (mail@5hirish.com), on Gitter, or maybe on Slack.

Also, in December I was thinking of trying to implement some of the SQuAD 2.0 approaches. I think this would be a good way to kickstart the project again: go through some of the approaches from this competition and implement one that suits our project and the problem we are trying to solve.

5hirish commented 5 years ago

@TharunAts This is an intermediate storage file for the knowledge extracted from Wikipedia, which is later processed and ranked. Not proud of how I approached this problem at the time :sweat_smile:

ghost commented 5 years ago

It would be better if you keep the discussion here and not via mail so that others can view it and participate too.

5hirish commented 5 years ago

@infosisio @idoroiengel I have created a Gitter chat for the project, which is much more convenient for discussions about the project, since broad conversations are hard to carry out on a single issue. Feel free to join the Gitter chat.

Also, when I was thinking about the SQuAD competition, I created a Kanban project board here on GitHub and documented my initial findings on it: Kanban Board