NLP4L / attic-nlp4l

(deprecated) Please use the new nlp4l instead.
Apache License 2.0

TermsExtractor doesn't work with an index from a previous Lucene version. #31

Open samfukuda opened 9 years ago

samfukuda commented 9 years ago

I hit a NullPointerException (NPE) when I tried to extract terms from existing indexes built by Solr 4.10.4 and 4.8SNAP (LucidWorks edition). Yes, these are not Lucene 5.x indexes.

Is this a bug, is it by design, or is it something I did wrong?

With the 4.10.4 index:

[test@localhost nlp4l]$ java -cp "lib/*" org.nlp4l.extract.TermsExtractor --field body --out terms-sp2.txt /tmp/index-sp2
Exception in thread "main" java.lang.NullPointerException
    at org.nlp4l.lucene.TermsExtractor.execute(TermsExtractor.java:192)
    at org.nlp4l.extract.TermsExtractor$.main(TermsExtractor.scala:93)
    at org.nlp4l.extract.TermsExtractor.main(TermsExtractor.scala)
[test@localhost nlp4l]$ 

With the 4.8SNAP index:

[test@localhost nlp4l]$ java -cp "lib/*" org.nlp4l.extract.TermsExtractor --field body --out terms-sp1.txt /tmp/index-sp1
Exception in thread "main" java.lang.NullPointerException
    at org.nlp4l.lucene.ConcatFreqLRCompoundNounScorer.getConcatenatedNounScore(ConcatFreqLRCompoundNounScorer.java:43)
    at org.nlp4l.lucene.LRCompoundNounScorer.getLeftConcatenatedNounScore(LRCompoundNounScorer.java:66)
    at org.nlp4l.lucene.LRCompoundNounScorer.score(LRCompoundNounScorer.java:60)
    at org.nlp4l.lucene.ConcatFreqDFLRCompoundNounScorer.score(ConcatFreqDFLRCompoundNounScorer.java:58)
    at org.nlp4l.lucene.TermsExtractor.execute(TermsExtractor.java:206)
    at org.nlp4l.extract.TermsExtractor$.main(TermsExtractor.scala:93)
    at org.nlp4l.extract.TermsExtractor.main(TermsExtractor.scala)
[test@localhost nlp4l]$ 
kojisekig commented 9 years ago

Thank you for reporting this problem!

Yes, I would expect this to happen in any environment other than Lucene 5.x.

If you fork NLP4L and try to downgrade it to Lucene 4.x, I think I can help when you run into problems such as compile errors.

Would that work for you? Or is there something else I can do for you?

samfukuda commented 9 years ago

Okay, I understand what you are saying. You mean that fixing this requires digging deep into the code.

I just tried to build with 4.10.4 and found that lucene-backward-codecs.jar does not exist for this version. My guess is that this means some code changes are required. Yes, I understand it will take me a little longer. I will decide later whether to go ahead, because I am not sure yet whether I should or must.

So I will get in touch with you if I get something working.

Thanks for your comments, -su