Closed dweiss closed 13 years ago
Comment by Stanisław Osiński (@stanislawosinski) (migrated from JIRA)
While we're waiting for Lucene 2.9.1 to come out, maybe we would be able to handle this for 3.1.1?
Comment by Dawid Weiss (@dweiss) (migrated from JIRA)
Investigated the possibilities here.
Nutch still has Lucene 2.9.x, whereas we use Lucene 3.0.0. Also, a bunch of other libraries would be required to add Carrot2 3.0+ to Nutch, some of them heavy (Mahout, Google Collections, etc.). I don't know if the Nutch folks will appreciate this much.
What do you think: should we try, or leave Nutch on the 2.x line?
Comment by Stanisław Osiński (@stanislawosinski) (migrated from JIRA)
I think the extra libraries wouldn't be more than 1 or 2 MB together, right? So the biggest problem seems to be Lucene. Maybe we could schedule this for when Lucene is upgraded in Nutch? After all, upgrading from 2.9.x to 3.0.0 is only a matter of fixing deprecations. I don't see a relevant issue in Nutch's JIRA, though.
Comment by Dawid Weiss (@dweiss) (migrated from JIRA)
Older Lucene (2.9) is a show-stopper for this, unfortunately. There are API incompatibilities that cause exceptions at runtime. I'll file an issue with Nutch, perhaps they'll wish to upgrade and then we can proceed.
Comment by Dawid Weiss (@dweiss) (migrated from JIRA)
Equivalent issue in Nutch: https://issues.apache.org/jira/browse/NUTCH-673
Comment by Stanisław Osiński (@stanislawosinski) (migrated from JIRA)
We need to wait until Nutch upgrades to Lucene 3.0. Moving to 3.3.0 for the time being.
Comment by Dawid Weiss (@dweiss) (migrated from JIRA)
Will upgrade after we release 3.4.0.
Comment by Stanisław Osiński (@stanislawosinski) (migrated from JIRA)
Some rough-cut Nutch integration code for Carrot2 3.x that I once prepared for a client.
Comment by Dawid Weiss (@dweiss) (migrated from JIRA)
Nutch doesn't come with a frontend anymore. The clustering plugin has been removed (and there is Solr, which can be used as the sink for Nutch's crawls).
Related issue on Apache JIRA: https://issues.apache.org/jira/browse/NUTCH-673
Issue: CARROT-443 (migrated from JIRA), created by Stanisław Osiński (@stanislawosinski), 2 votes, resolved Jun 21 2011
Attachments: Clusterer.java, HitsClusterAdapter.java, TestClusterer.java
Linked issues:
828