EvidentSolutions / elasticsearch-analysis-voikko

Finnish language analysis for Elasticsearch using Voikko
Apache License 2.0

Support expanding of compound words into separate tokens #4

Open, opened by tituomin 5 years ago

tituomin commented 5 years ago

I have patched our version of the plugin (based on v0.3.0) and added a configuration parameter expandCompounds to optionally support expanding of compound words (yhdyssanat) into separate tokens.

https://github.com/City-of-Helsinki/elasticsearch-analysis-voikko/commit/9a6bd8165e4de3a7d4f8bbe0993ebaec17197f94
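To illustrate what expanding a compound word into separate tokens could look like, here is a minimal Python sketch. It is not the code from the linked commit: the lookup table `COMPOUND_PARTS` and the function `expand_compounds` are hypothetical stand-ins for Voikko's morphological analysis, and emitting the parts at the same position as the full word is an assumption about how such a filter might behave.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table standing in for Voikko's morphological analysis.
# A real filter would obtain the compound boundaries from the analyzer.
COMPOUND_PARTS: Dict[str, List[str]] = {
    "kirjakauppa": ["kirja", "kauppa"],          # bookstore = book + store
    "rautatieasema": ["rauta", "tie", "asema"],  # railway station
}


def expand_compounds(tokens: List[str], expand: bool = True) -> List[Tuple[str, int]]:
    """Return (term, position) pairs for a token stream.

    When `expand` is true (mirroring the idea of the expandCompounds option),
    the parts of a known compound are emitted in addition to the full word.
    Assumption: the parts share the position of the full word.
    """
    output: List[Tuple[str, int]] = []
    for position, token in enumerate(tokens):
        # Always keep the original token.
        output.append((token, position))
        if expand:
            # Add each known part of the compound at the same position.
            for part in COMPOUND_PARTS.get(token, []):
                output.append((part, position))
    return output
```

With expansion enabled in this sketch, a search for "kauppa" could match a document that only contains "kirjakauppa", because the part is indexed as its own token alongside the full word.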

I would like to get this feature into master and upstream, if you find it desirable. I can port it to master myself, but currently we are using 0.3.0.

We have found that extracting the parts of compound words is highly desirable in the index analysis stage, for several reasons:

komu commented 5 years ago

Sounds great. If you open a PR, I look forward to merging it.

tituomin commented 5 years ago

@komu here is my attempt at a PR.