DigitalPebble / behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Output to LucidWorks 2.1 #36

Closed by gsingers 11 years ago

gsingers commented 12 years ago

Hey Julien,

Would you be open to a patch that makes Behemoth work with LucidWorks Enterprise? It's a standalone module (you can see it on my fork under the LWE branch). It only requires Solr dependencies. In other words, it's all open source; the library I use is just the Solr one that ships with LucidWorks. It also pretty much shows how Behemoth should be updated for Solr 4.

The reason I ask is that I'm tired of having to merge.

Thanks, Grant

jnioche commented 12 years ago

Hi Grant, I'll have a look at your branch; I'm not against the idea. Ideally we should be able to have standalone modules that refer to behemoth-core without needing the whole of Behemoth. Your plugin could simply have a dependency on a local or remote copy of behemoth-core. Wouldn't that be easier? I need to flag Behemoth as 0.1 soon and publish the artefacts on Maven so that people can have their own standalone modules. What do you think?

gsingers commented 12 years ago

Yeah, that works too. The lucidworks module really is just Solr, just a specific version.

jnioche commented 12 years ago

See https://github.com/jnioche/behemoth-commoncrawl for a standalone version of the commoncrawl module, which I haven't tested yet. BTW: I'm not impressed with the commoncrawl library; building the jar has been an absolute mission.