elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Adding OpenNLP Analysis Components #104439

Open imotov opened 7 months ago

imotov commented 7 months ago

Description

Apache OpenNLP functionality has been available in Lucene since v7.3.0. At the request of one of my clients, I have exposed this functionality in Elasticsearch in the form of a plugin. The plugin wraps the existing Lucene tokenizer and two Lucene filters (a part-of-speech tagger and a lemmatizer) as corresponding Elasticsearch components. I would be happy to convert it into a standard Elasticsearch plugin and open a PR for it if there is interest in merging something like this into Elasticsearch.

A similar issue was discussed some time ago in #9041, and back then the decision was made not to go ahead with it, mostly because the functionality included named entity recognition, which didn’t quite look like it belonged in the analysis chain. My proposal is different because it includes only tokenization and lemmatization (POS tagging is required by the lemmatizer). So I think this clearly belongs in the analysis chain and warrants a separate discussion.
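
To illustrate why the lemmatizer depends on POS tags, here is a minimal, self-contained Java sketch of the tokenizer, tagger, and lemmatizer order. It is only a toy stand-in: it does not use OpenNLP, Lucene, or the plugin's code, and the class name, the tagging rules, and the two-entry lemma dictionary are all hypothetical, chosen only to show that the same surface form ("saw") can map to different lemmas depending on its tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy illustration only: hypothetical names and rules, not the OpenNLP or Lucene API.
public class PosAwareLemmatizerSketch {

    // A token together with the part-of-speech tag assigned to it.
    static final class TaggedToken {
        final String text;
        final String pos;

        TaggedToken(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    private static final Set<String> DETERMINERS = Set.of("a", "an", "the");
    private static final Set<String> PRONOUNS = Set.of("i", "you", "we", "they");

    // Hypothetical lemma dictionary keyed by "token/POS".
    private static final Map<String, String> LEMMAS = Map.of(
            "saw/VERB", "see",
            "saw/NOUN", "saw");

    // Step 1: tokenization (lowercase, split on whitespace).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 2: toy POS tagging. A word directly after a determiner is
    // tagged NOUN; any other unknown word is tagged VERB.
    static List<TaggedToken> tag(List<String> tokens) {
        List<TaggedToken> tagged = new ArrayList<>();
        String previousPos = "";
        for (String token : tokens) {
            String pos;
            if (DETERMINERS.contains(token)) {
                pos = "DET";
            } else if (PRONOUNS.contains(token)) {
                pos = "PRON";
            } else if (previousPos.equals("DET")) {
                pos = "NOUN";
            } else {
                pos = "VERB";
            }
            tagged.add(new TaggedToken(token, pos));
            previousPos = pos;
        }
        return tagged;
    }

    // Step 3: lemmatization. The lookup key includes the POS tag, so this
    // step cannot run without the tagger's output.
    static List<String> lemmatize(List<TaggedToken> tagged) {
        List<String> lemmas = new ArrayList<>();
        for (TaggedToken token : tagged) {
            lemmas.add(LEMMAS.getOrDefault(token.text + "/" + token.pos, token.text));
        }
        return lemmas;
    }

    public static void main(String[] args) {
        // prints [i, see, the, saw]
        System.out.println(lemmatize(tag(tokenize("I saw the saw"))));
    }
}
```

In this sketch the first "saw" is tagged as a verb and lemmatized to "see", while the second follows a determiner, is tagged as a noun, and is left unchanged. A real tagger and lemmatizer would rely on trained models or full dictionaries rather than hand-written rules.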

elasticsearchmachine commented 7 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)