EvidentSolutions / elasticsearch-analysis-voikko

Finnish language analysis for Elasticsearch using Voikko
Apache License 2.0

Implement optional expanding of compound words into separate tokens #5

Open tituomin opened 5 years ago

tituomin commented 5 years ago

Please note that this implementation contains code from https://github.com/NatLibFi/SolrPlugins/tree/master/Voikko, a National Library of Finland project that was kindly relicensed at my request to be compatible with this project.

See issue #4

tituomin commented 5 years ago

Thanks a lot!

I studied the code and came up with some questions and comments. Some of the comments mostly reflect my personal preferences about the style of the codebase, but there are also some real questions about the functionality.

If you can go through my comments and work out how the code really should work, I can take care of any post-merge cleanups, but before merging I'd like to be sure I understand it. The parsing questions towards the end of the review interest me in particular.

Thank you! This is an excellent review. I realize that some work needs to be done to make the implementation mature and easier to reason about. I will try to get back with a new proposal as soon as possible.