DigitalPebble / behemoth

Behemoth is an open-source platform for large-scale document analysis based on Apache Hadoop.

Write Tutorial on processing Enron corpus with Tika #22

Closed: jnioche closed this issue 13 years ago

jnioche commented 13 years ago

See https://issues.apache.org/jira/browse/TIKA-657?focusedCommentId=13030467#comment-13030467

jnioche commented 13 years ago

http://digitalpebble.blogspot.com/2011/05/processing-enron-dataset-using-behemoth.html