The Lucene tokenisation has been replaced with the annotation type/value taken from the Behemoth documents. It would be good to add the Lucene tokenisation back, as in the original Mahout class, so that users who need Behemoth mostly for converting from Nutch or parsing with Tika don't have to use the GATE or UIMA modules just to get tokens.