brmson / yodaqa

A Question Answering system built on top of the Apache UIMA framework.
http://ailao.eu/yodaqa

How to add more fulltext sources #62

Closed talentoscope closed 7 years ago

talentoscope commented 7 years ago

This software is simply astounding.

I'd love to know, as would many others, exactly how to add more sources of information, ideally by indexing additional documents in Solr, such as other wikis, Project Gutenberg texts, etc. I assume these would all be picked up by the Solr fulltext search whether or not there is a DBpedia clue?

Please help.

pasky commented 7 years ago

Documenting this is the subject of #17, but it is not possible out of the box, only in principle. (The architecture allows it, but there is no explicit code support for querying multiple Solr indices. As a starting point, you could make sure the IDs are non-duplicate and index everything in a single Solr collection...)
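The single-collection starting point could be sketched roughly as follows. This is only an illustration in Python, not YodaQA code: the source names, the `source`, `title` and `text` field names, and the `source:local_id` ID scheme are all assumptions, and the real Solr schema (including whether non-numeric IDs are accepted downstream) would need to be checked first.

```python
import json


def namespaced_id(source, local_id):
    """Prefix a per-source ID with its source name so IDs cannot collide."""
    return f"{source}:{local_id}"


def to_solr_docs(source, records):
    """Turn (local_id, title, text) tuples into dicts for one shared collection.

    The field names here are hypothetical; they must match the actual schema.
    """
    return [
        {
            "id": namespaced_id(source, local_id),
            "source": source,
            "title": title,
            "text": text,
        }
        for local_id, title, text in records
    ]


# Two sources that both use the local ID 12 (made-up sample records).
wiki_docs = to_solr_docs("enwiki", [(12, "Prague", "Prague is a city.")])
book_docs = to_solr_docs("gutenberg", [(12, "Dracula", "A novel.")])
docs = wiki_docs + book_docs

# Every ID must be unique before the batch goes into a single collection.
assert len({d["id"] for d in docs}) == len(docs)

# A JSON array of documents, the general shape Solr's JSON update
# handler takes; the sending step is left out on purpose.
payload = json.dumps(docs)
```

Keeping the source name as a separate field as well would make it possible to filter or weight results per source later, if that turns out to be useful.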

talentoscope commented 7 years ago

Thanks for the reply. I will look at doing that, and maybe at changing the code to support multiple Solr instances (or multiple collections).

pasky commented 7 years ago

Any contributions to the code or just to the documentation will be welcome!

talentoscope commented 7 years ago

I will definitely contribute code back if I come up with anything; I'm just getting up to speed with the code. I'm not used to Java.

On Wed, 21 Sep 2016, 20:34 Petr Baudis, notifications@github.com wrote:

Any contributions to the code or just to the documentation will be welcome!

talentoscope commented 7 years ago

Having looked at the code, I really don't think I'm going to be much use there. Instead, I will create a good dataset of questions sourced from many places and curate it with the question, answer, LAT type and anything else you feel is necessary to help train the system.

pasky commented 7 years ago

That would also be really cool! :)

talentoscope commented 7 years ago

Made a start on this last night; I'm up to about 250 questions. There are a few inference-based ones in there too, but it shouldn't be too hard to find their answers with the current system. I'm also benchmarking against base YodaQA, recording the position of the correct answer and its confidence. Hopefully that extra info will help with diagnosis, and it is easily removable. :)
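A dataset row along these lines might look like the sketch below. It is an assumption-laden illustration rather than an agreed format: the `QuestionRecord` class, the column names and the TSV layout are hypothetical, and the sample values are placeholders, not real benchmark output.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QuestionRecord:
    question: str
    answer: str
    lat: str                            # lexical answer type, e.g. "person"
    rank: Optional[int] = None          # position of the correct answer in the baseline run
    confidence: Optional[float] = None  # baseline confidence for that answer


CORE_FIELDS = ["question", "answer", "lat"]
BENCH_FIELDS = ["rank", "confidence"]


def to_tsv(records, include_benchmark=True):
    """Serialize records to TSV; the benchmark columns can be left out."""
    fields = CORE_FIELDS + (BENCH_FIELDS if include_benchmark else [])
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        delimiter="\t",
        extrasaction="ignore",  # silently skip columns that are not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Placeholder row; the rank and confidence are invented for illustration.
sample = [QuestionRecord("Who wrote Dracula?", "Bram Stoker", "person", 1, 0.5)]
print(to_tsv(sample, include_benchmark=False))
```

Keeping the benchmark columns optional is what makes the extra info "easily removable": the same records can be exported with or without them.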