brmson / yodaqa

A Question Answering system built on top of the Apache UIMA framework.
http://ailao.eu/yodaqa

Multiple IDs conflict when adding to solr #61

talentoscope closed 7 years ago

talentoscope commented 7 years ago

When I add new XML files to a Solr instance using bzcat and the wiki extractor, the documents added to collection1 appear to be given the same IDs, which causes problems when YodaQA cross-references links. As a result, YodaQA ends up parsing the wrong documents.
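
To illustrate the kind of conflict described above: each wiki numbers its pages independently, and Solr replaces an existing document when a new one arrives with the same unique key. The sketch below uses made-up sample documents and two hypothetical helpers, `find_id_conflicts` and `namespace_ids`, to show how such clashes could be detected and how prefixing each ID with its source name would avoid them. It makes no Solr calls, and whether YodaQA would accept prefixed IDs is an assumption, not something confirmed in this thread.

```python
from collections import defaultdict


def find_id_conflicts(docs_by_source):
    """Return {id: [sources]} for every ID that occurs more than once."""
    seen = defaultdict(list)
    for source, docs in docs_by_source.items():
        for doc in docs:
            seen[doc["id"]].append(source)
    return {doc_id: srcs for doc_id, srcs in seen.items() if len(srcs) > 1}


def namespace_ids(source, docs):
    """Return copies of the documents with the source name prefixed to each ID."""
    return [{**doc, "id": f"{source}-{doc['id']}"} for doc in docs]


# Made-up sample data; real documents would come from the wiki extractor output.
docs_by_source = {
    "enwiki": [{"id": "12", "title": "Anarchism"}],
    "simplewiki": [{"id": "12", "title": "Art"}],
}

# Both wikis use page ID 12, so the second upload would overwrite the first.
conflicts = find_id_conflicts(docs_by_source)

# Hypothetical workaround: make IDs unique per source before indexing.
namespaced = {src: namespace_ids(src, docs) for src, docs in docs_by_source.items()}
remaining = find_id_conflicts(namespaced)

print(conflicts)
print(remaining)
```

With prefixed IDs, anything that looks documents up by the original numeric page ID would have to be adjusted as well, so this is only a starting point.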

Would it be possible to provide instructions for adding new data sources, such as other wikis (Simple, Species, Wikiversity, etc.), since the live version seems to encompass other sources?

pasky commented 7 years ago

Unfortunately, that'd take quite some effort which we are unable to spend right now. I'd say this is essentially a duplicate of #17. The default YodaQA version also uses only the enwiki data source for text. It just additionally does a Bing search (which is also part of the source).