Apache OpenNLP functionality has been available in Lucene since v7.3.0. At the request of one of my clients, I have exposed this functionality in Elasticsearch as a plugin. The plugin wraps the existing Lucene tokenizer and two Lucene filters (the part-of-speech tagger and the lemmatizer) as the corresponding Elasticsearch components. I would be happy to convert it into a standard Elasticsearch plugin and open a PR if there is interest in merging something like this into Elasticsearch.
A similar proposal was discussed some time ago in #9041. Back then the decision was not to go ahead with it, mostly because the functionality included named entity recognition, which didn’t quite look like something that belongs in the analysis chain. My proposal is different because it includes only tokenization and lemmatization (POS tagging is required by the lemmatizer). So I think this clearly belongs in the analysis chain and warrants a separate discussion.