OpenSextant / SolrTextTagger

A text tagger based on Lucene / Solr, using FST technology
Apache License 2.0

NullPointerException in TaggerRequestHandler.java:199 #47

Closed by xiaohan2012 8 years ago

xiaohan2012 commented 8 years ago

When I do:

curl -XPOST \
  'http://localhost:8983/solr/test/tag?overlaps=NO_SUB&tagsLimit=5000&fl=*' \
  -H 'Content-Type:text/plain' -d @example.txt

The core name is test. An unrelated note: the URL in README.md is <host>:<port>/solr/tag, which returns 404. In my case, <host>:<port>/solr/<core_name>/tag works.

The server returns the following (I extracted the trace from the XML result):

java.lang.NullPointerException
    at org.opensextant.solrtexttagger.TaggerRequestHandler$1.<init>(TaggerRequestHandler.java:199)
    at org.opensextant.solrtexttagger.TaggerRequestHandler.handleRequestBody(TaggerRequestHandler.java:168)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:640)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:436)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)

I am using:

schema.xml:

<schema name="test" version="1.5">
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>

        <fieldType name="tag" class="solr.TextField" positionIncrementGap="100" postingsFormat="Memory"
                           omitTermFreqAndPositions="true" omitNorms="true">
          <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.EnglishPossessiveFilterFactory" />
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory" />

                <filter class="org.opensextant.solrtexttagger.ConcatenateFilterFactory" />
          </analyzer>
          <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.EnglishPossessiveFilterFactory" />
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory" />
          </analyzer>
        </fieldType>

        <field name="_version_" type="long" indexed="true" stored="true"/>
        <field name="surface_name" type="tag" indexed="true" stored="true"/>
        <field name="occurrences" type="int" indexed="false" stored="true"/>
        <field name="log_occurrences" type="double" indexed="false" stored="true"/>
</schema>

Part of solrconfig.xml:

  <requestHandler name="/tag" class="org.opensextant.solrtexttagger.TaggerRequestHandler">
    <lst name="defaults">
      <str name="field">surface_name</str>
      <str name="fq">*:*</str>
    </lst>
  </requestHandler>

dsmiley commented 8 years ago

Hello. The text tagger currently requires a "uniqueKey" field to be defined in the schema. Most Solr schemas do. I will have the tagger throw a better exception to clearly bring this to the user's attention.

That said... I could imagine adding a feature to eliminate this requirement by using Lucene's internal docIDs instead. It would be pretty easy.
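
For readers hitting the same NullPointerException, the uniqueKey requirement described above could probably be met with a schema change along these lines. This is only a sketch: the field name `id` is an assumption (any unique field should work), and it reuses the `string` field type already declared in the schema from the report. Documents indexed before the change would likely need to be re-indexed with an `id` value.

```xml
<!-- Sketch only: add inside <schema>, alongside the existing <field> entries. -->
<!-- "id" is an assumed field name; it is not part of the original schema. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- Tell Solr which field uniquely identifies each document. -->
<uniqueKey>id</uniqueKey>
```

With a uniqueKey declared, the same curl request against <host>:<port>/solr/<core_name>/tag should no longer fail at the point shown in the trace, assuming no other configuration problem is present.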