USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Exception while configuring with Solr 6.4 #84

adityardesai closed this issue 7 years ago

adityardesai commented 7 years ago

Hi

As per the wiki instructions, I am configuring Sparkler. When I try to create a core in Apache Solr 6.4, I get the following exception:

org.apache.solr.common.SolrException: Could not load conf for core crawldbSparkler: Error loading solr config from /home/aditya/src/solr-6.4.1/server/solr/crawldbSparkler/conf/solrconfig.xml
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:84)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:888)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:827)
    at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:88)
    at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377)
    at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:379)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:165)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
    at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error loading solr config from /home/aditya/src/solr-6.4.1/server/solr/crawldbSparkler/conf/solrconfig.xml
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:187)
    at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:96)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:77)
    ... 36 more
Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/home/aditya/src/solr-6.4.1/server/solr/crawldbSparkler'
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:406)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:361)
    at org.apache.solr.core.Config.<init>(Config.java:120)
    at org.apache.solr.core.Config.<init>(Config.java:90)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:202)
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:179)
    ... 38 more

The problem persists even after reloading/restarting Solr. Any help is appreciated.

thammegowda commented 7 years ago

@adityardesai

Thanks for reporting.

There are two issues here:

First, you are using crawldbSparkler as the core/collection name. You will have to update crawldb.uri in the sparkler-default.yaml file to reflect this change. Or (what I recommend) use crawldb as the name of your collection/core -- that is the default!
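If you keep a custom core name, the first option comes down to a one-line change in the config. Here is a minimal sketch, assuming the key sits on its own line in the form `crawldb.uri: <url>` and that Solr listens on its default port 8983. The helper name `set_crawldb_uri` is hypothetical, so check your own copy of sparkler-default.yaml before relying on it:

```shell
# Hypothetical helper: rewrite the crawldb.uri line in a Sparkler config file.
# Assumption: the key is written as "crawldb.uri: <url>" at the start of a line.
set_crawldb_uri() {
  conf_file="$1"   # path to sparkler-default.yaml
  new_uri="$2"     # e.g. http://localhost:8983/solr/<core-name>
  # -i.bak edits the file in place and keeps the original as <file>.bak
  sed -i.bak "s|^crawldb\.uri:.*|crawldb.uri: ${new_uri}|" "$conf_file"
}
```

For example, `set_crawldb_uri conf/sparkler-default.yaml http://localhost:8983/solr/crawldbSparkler` would point Sparkler at the core name used in the report above. Using the default name crawldb avoids the edit entirely.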

Second,

Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/home/aditya/src/solr-6.4.1/server/solr/crawldbSparkler'

Solr is unable to find solrconfig.xml in the core directory! In the Sparkler source tree, it is at conf/solr/crawldb/conf/solrconfig.xml
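A quick way to confirm this diagnosis is to check whether the core's instance directory actually contains conf/solrconfig.xml, which is the path Solr reports in the first "Caused by" of the trace. A small sketch follows; the function name `has_solrconfig` is made up for illustration, and the directory you pass in is your own core path:

```shell
# Report whether a Solr core directory contains conf/solrconfig.xml.
# core_dir is the core's instance directory, e.g. <solr>/server/solr/<core-name>.
has_solrconfig() {
  core_dir="$1"
  if [ -f "$core_dir/conf/solrconfig.xml" ]; then
    echo "found: $core_dir/conf/solrconfig.xml"
    return 0
  fi
  echo "missing: $core_dir/conf/solrconfig.xml"
  return 1
}
```

In the case reported here, one would call it as `has_solrconfig /home/aditya/src/solr-6.4.1/server/solr/crawldbSparkler`.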

Please check our Dockerfile to see how Solr is set up: https://github.com/USCDataScience/sparkler/blob/1d92edfd8e53fdd0c1eeb6da31681c4ee76b0075/sparkler-deployment/docker/Dockerfile#L47-L55

# Setup Solr
RUN wget http://archive.apache.org/dist/lucene/solr/6.2.1/solr-6.2.1.tgz -O /data/solr.tgz && \
    cd /data/ && tar xzf /data/solr.tgz

RUN mv /data/solr-* /data/solr && rm /data/solr.tgz
RUN apt-get install lsof
RUN /data/solr/bin/solr start && \
    /data/solr/bin/solr create_core -c crawldb -d /data/sparkler/conf/solr/crawldb/ && \
    /data/solr/bin/solr stop

thammegowda commented 7 years ago

I just noticed: Solr 6.4.x is necessary for Solr cloud mode. Solr 6.2.1 works fine in standalone server mode; however, if you try to use Solr 6.2.1 in cloud mode, it hits buggy code in Solr that affects functionality.

Are you using Solr cloud or standalone solr?

adityardesai commented 7 years ago

Using 6.4.1 in local mode. I shifted to 6.2.1 and the same issue occurred again. To narrow down the cause, I created the core 'crawldb' on the command line, and so far it looks fine. But is it necessary to copy the contents of sparkler/conf/solr/crawldb/ to /home/aditya/src/solr-6.2.1/server/solr/crawldb/conf?

adityardesai commented 7 years ago

I think I got it running. To summarize

  1. We need to create a core 'crawldb' on the command line and then copy all the configuration from sparkler/conf/solr/crawldb to /solr-6.2.1/server/solr/crawldb/conf.
  2. Then reload the core in the UI.

Is that correct?

thammegowda commented 7 years ago

I am glad you got it working! 👍
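For a local (non-Docker) install, the steps discussed in this thread can be sketched as one shell function. It uses the same `solr create_core -c ... -d ...` form as the Dockerfile quoted earlier, which hands Sparkler's config directory to Solr when the core is created, so a separate manual copy may not be needed. It assumes Solr is already running on the default port 8983. The function name, its arguments, and the `DRY_RUN` switch are illustrative, and the reload goes through Solr's CoreAdmin API rather than the UI:

```shell
# Sketch only: create the crawldb core from Sparkler's config, then reload it.
# Set DRY_RUN=1 to print the commands instead of running them.
setup_crawldb_core() {
  solr_home="$1"       # placeholder, e.g. /path/to/solr-6.2.1
  sparkler_home="$2"   # placeholder, e.g. /path/to/sparkler
  run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

  # Create the core from Sparkler's bundled config (same form as the Dockerfile).
  run "$solr_home/bin/solr" create_core -c crawldb -d "$sparkler_home/conf/solr/crawldb/"
  # Reload the core via the CoreAdmin API (assumes the default port 8983).
  run curl -s "http://localhost:8983/solr/admin/cores?action=RELOAD&core=crawldb"
}
```

Running it first with `DRY_RUN=1 setup_crawldb_core /path/to/solr-6.2.1 /path/to/sparkler` shows exactly what would be executed before anything touches the Solr install.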