elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Adding OpenNLP Analysis Components #104439

Open imotov opened 10 months ago

imotov commented 10 months ago

Description

Apache OpenNLP functionality has been available in Lucene since v7.3.0. At the request of one of my clients, I have exposed this functionality in Elasticsearch as a plugin. The plugin wraps the existing Lucene tokenizer and two Lucene filters (a part-of-speech tagger and a lemmatizer) as the corresponding Elasticsearch components. I would be happy to convert it into a standard Elasticsearch plugin and open a PR if there is interest in merging something like this into Elasticsearch.

A similar issue was discussed some time ago in #9041. Back then, the decision was made not to go ahead, mostly because the functionality included named entity recognition, which didn’t quite look like something that belongs in the analysis chain. My proposal is different because it includes only tokenization and lemmatization (POS tagging is included because the lemmatizer requires it). So I think this clearly belongs in the analysis chain and warrants a separate discussion.
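
To make the proposed chain concrete, here is a rough sketch of index settings that would wire the three components into a custom analyzer. The `settings.analysis` layout and the `custom` analyzer type are standard Elasticsearch. Everything else is an assumption made for illustration: the component type names (`opennlp`, `opennlp_pos`, `opennlp_lemmatizer`), the parameter names, and the model file names are hypothetical, since the names the plugin actually registers are not stated here. The point it shows is the ordering: the POS filter has to run before the lemmatizer.

```python
import json

# Hypothetical component type names; the plugin's real names are not given in this issue.
TOKENIZER_TYPE = "opennlp"
POS_FILTER_TYPE = "opennlp_pos"
LEMMATIZER_FILTER_TYPE = "opennlp_lemmatizer"


def build_index_settings():
    """Return index settings that chain tokenizer -> POS tagger -> lemmatizer."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "nlp_tokenizer": {
                        "type": TOKENIZER_TYPE,
                        # Hypothetical parameter names and placeholder model files.
                        "sentence_model": "en-sent.bin",
                        "tokenizer_model": "en-token.bin",
                    }
                },
                "filter": {
                    "nlp_pos": {
                        "type": POS_FILTER_TYPE,
                        "pos_tagger_model": "en-pos.bin",  # placeholder
                    },
                    "nlp_lemmatizer": {
                        "type": LEMMATIZER_FILTER_TYPE,
                        "lemmatizer_model": "en-lemmatizer.bin",  # placeholder
                    },
                },
                "analyzer": {
                    "nlp_lemmas": {
                        "type": "custom",
                        "tokenizer": "nlp_tokenizer",
                        # Order matters: the lemmatizer consumes the POS tags
                        # produced by the filter that runs before it.
                        "filter": ["nlp_pos", "nlp_lemmatizer"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would accompany an index-creation request.
    print(json.dumps(build_index_settings(), indent=2))
```

Named entity recognition is deliberately absent from this chain, in line with the scope described above.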

elasticsearchmachine commented 10 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)