brmson / yodaqa

A Question Answering system built on top of the Apache UIMA framework.
http://ailao.eu/yodaqa

Setup instructions for local Wikipedia dump needs better guidelines #60

Closed by talentoscope 7 years ago

talentoscope commented 7 years ago

I am following the README.md in data/enwiki. I created the data/example folder and put the dump, the extraction, and the single XML file in it. Solr was extracted into data/example, as data/example/solr-versionnumber. The files in data/enwiki have been copied to data/example as instructed, and /data/example/collection1/conf/data-config.xml has been edited to reflect the dump date I appended to the filename. From /data/example, I tried to run start.jar with "java -Dsolr.solr.home=enwiki -jar start.jar", but I get "WARNING: Nothing to start, exiting ...".

I'm guessing this is due to a folder placement error, but the README gives no breakdown or full explanation of how the folders should be structured, or at least it's difficult to follow.

Any guidance on how you did this would be greatly appreciated.

talentoscope commented 7 years ago

Realised the mistake: all of this should be in data/enwiki itself. Still, starting Solr with java gives "Error: Unable to access jarfile start.jar".

This start.jar does not appear to be in the solr/example directory from the download. It is in the server folder, but I'm not sure that just copying a jar file will solve this, so I am trying the 4.6.0 version from the README instead.

talentoscope commented 7 years ago

Used version 4.6.0, but I am told it is unable to create collection1. This already exists in data/enwiki, which is symlinked into example/. Is Solr supposed to be creating it itself, or is there an undocumented issue?

  3393 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CachingDirectoryFactory – looking to close /home/roy/yodaqa/data/enwiki/collection1/data [CachedDir<>]
  3393 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CachingDirectoryFactory – Closing directory: /home/roy/yodaqa/data/enwiki/collection1/data
  3393 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CachingDirectoryFactory – looking to close /home/roy/yodaqa/data/enwiki/collection1/data/index [CachedDir<>]
  3393 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CachingDirectoryFactory – Closing directory: /home/roy/yodaqa/data/enwiki/collection1/data/index
  3394 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1
  org.apache.solr.common.SolrException: RequestHandler init failure
      at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
      at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
      at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:557)
      at org.apache.solr.core.CoreContainer.create(CoreContainer.java:592)
      at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:271)
      at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:263)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
  Caused by: org.apache.solr.common.SolrException: RequestHandler init failure
      at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:167)
      at org.apache.solr.core.SolrCore.<init>(SolrCore.java:768)
      ... 11 more
  Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:470)
      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:401)
      at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:526)
      at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:599)
      at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:153)
      ... 12 more

pasky commented 7 years ago

To be clear, the example/ subdirectory should be part of solr-4.6.0 and should contain a start.jar. You are expected to symlink yodaqa's data/enwiki/ to enwiki in the example/ subdirectory.
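As a rough sketch of that layout (the two base paths are assumptions for illustration, not taken from the README; adjust them to wherever Solr 4.6.0 and the yodaqa checkout actually live):

```shell
#!/bin/sh
# Hypothetical locations; override via the environment as needed.
SOLR_DIR="${SOLR_DIR:-$HOME/solr-4.6.0}"
YODAQA_DIR="${YODAQA_DIR:-$HOME/yodaqa}"

# link_enwiki: make solr-4.6.0/example/enwiki a symlink to yodaqa's
# data/enwiki, replacing an existing link of the same name if present.
link_enwiki() {
    ln -sfn "$YODAQA_DIR/data/enwiki" "$SOLR_DIR/example/enwiki"
}

# Usage, run so that start.jar is in the current directory:
#   link_enwiki
#   cd "$SOLR_DIR/example" && java -Dsolr.solr.home=enwiki -jar start.jar
```

The java command is the one from the original report; only the working directory differs, since start.jar has to be in the directory it is launched from.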

If you symlink things around, what might break is

  <lib dir="../../../contrib/dataimporthandler/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

in data/enwiki/collection1/conf/solrconfig.xml. I'd try replacing ../../../ with absolute paths into the solr-4.6.0 directory.
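One possible reason the relative path breaks, offered as an assumption rather than something confirmed in this thread: if the path is resolved from the real location of collection1 (the log above prints it under /home/roy/yodaqa/data/enwiki), then three levels up is the yodaqa checkout, not solr-4.6.0. A small sketch to check where a relative path lands once symlinks are followed (`resolve_physical` is a hypothetical helper name):

```shell
#!/bin/sh
# resolve_physical CORE_DIR REL_PATH
# Prints the physical directory REL_PATH points to when resolved from
# CORE_DIR with all symlinks followed. Returns non-zero (and prints
# nothing on stdout) if that directory does not exist.
resolve_physical() {
    ( cd -P "$1" && cd -P "$2" && pwd -P )
}

# Usage (paths are examples only):
#   resolve_physical "$HOME/solr-4.6.0/example/enwiki/collection1" \
#       ../../../contrib/dataimporthandler/lib
```

If this fails or prints a directory outside solr-4.6.0, the `<lib dir=...>` entries are likely pointing at the wrong place, and the printed location of the real solr-4.6.0/contrib/dataimporthandler/lib is a candidate for the absolute path.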

talentoscope commented 7 years ago

Tried explicitly stating ~/yodaqa/... etc., but that just concatenated the ../../../contrib/... etc. to the command. In theory that should have worked, going 3 levels up to the contrib folder, but for some reason it just doesn't like it.

Out of ideas now, so I am trying to start Solr on its own, add the enwiki XML to it using post.jar/post.sh, and then point yodaqa to that instance. It should work, as it's essentially the same thing, and I have copied the collection1 folder contents to the new one. Fingers crossed!

talentoscope commented 7 years ago

This worked. Closing report.