Open mschoch opened 10 years ago
We should also consider adding support for the hyphenation-based approaches.
The current state of the art for German decompounding appears to be https://dl.acm.org/citation.cfm?id=1787593, with a brief description in http://www.aclweb.org/anthology/P08-2064
See https://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
This would be useful for German, Swedish, and other languages where compound words are common, so that users can search for the constituent words.