DigitalPebble / behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Upgrade to Mahout 0.9 #52 #53

Closed lewismc closed 9 years ago

lewismc commented 9 years ago

Hi @jnioche, this PR

This is not thoroughly tested and I am skeptical about the below proposed change

https://github.com/DigitalPebble/behemoth/compare/master...lewismc:BEHEMOTH-52?expand=1#diff-d524e30136046dca4f2964333463676aL324

jnioche commented 9 years ago

> removes unnecessary elements from each of the child POMs; they already inherit this from the parent pom.xml

see comments

> This is not thoroughly tested and I am skeptical about the below proposed change

you mean Class<? extends Analyzer> analyzerClass = Analyzer.class;?

See SparseVectorsFromSequenceFiles, which our class should mimic as much as possible.
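
For context on why a default of Analyzer.class looks suspicious: Lucene's Analyzer is an abstract class, so a class reference to it cannot be instantiated reflectively. The sketch below shows the general pattern of defaulting to a concrete subclass and overriding it from a class name. It is an illustration only: `Analyzer` and `DefaultAnalyzer` are local stand-ins rather than the Lucene classes, and `resolve` and `instantiate` are hypothetical helpers, not Mahout or Behemoth API.

```java
import java.lang.reflect.Modifier;

public class AnalyzerClassSketch {

    // Local stand-in for an abstract analyzer base class (NOT Lucene's Analyzer).
    public abstract static class Analyzer {
        public abstract String name();
    }

    // Local stand-in for a concrete analyzer used as the default.
    public static class DefaultAnalyzer extends Analyzer {
        @Override
        public String name() {
            return "default";
        }
    }

    // Hypothetical helper: start from a concrete default and override it
    // only when a fully qualified class name is supplied.
    public static Class<? extends Analyzer> resolve(String className)
            throws ClassNotFoundException {
        Class<? extends Analyzer> analyzerClass = DefaultAnalyzer.class;
        if (className != null && !className.isEmpty()) {
            // asSubclass throws ClassCastException if the named class is not an Analyzer.
            analyzerClass = Class.forName(className).asSubclass(Analyzer.class);
        }
        return analyzerClass;
    }

    // Hypothetical helper: instantiate via the no-arg constructor,
    // rejecting abstract classes with a readable message.
    public static Analyzer instantiate(Class<? extends Analyzer> cls)
            throws ReflectiveOperationException {
        if (Modifier.isAbstract(cls.getModifiers())) {
            throw new InstantiationException(cls.getName() + " is abstract");
        }
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The concrete default can be created.
        System.out.println(instantiate(resolve(null)).name());

        // The abstract base class cannot.
        try {
            instantiate(Analyzer.class);
        } catch (InstantiationException e) {
            System.out.println("cannot instantiate: " + e.getMessage());
        }
    }
}
```

Checking instantiability as soon as the class is resolved surfaces a bad default or a bad class name at start-up rather than midway through a job.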

BTW I see that they have released Mahout 0.10; have you tried using that version as a dependency?

lewismc commented 9 years ago

Hi Julien, all of your comments are addressed bar the upgrade to Mahout 0.10. If you want me to try that as well, I can. I'll begin work on it on the same branch and, if it works out, I will send another PR. Thanks

jnioche commented 9 years ago

Thanks @lewismc! Mahout 0.10: entirely up to you. If it is straightforward, then it may be worth doing.