IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Decompounding with compound languages #153

Open danka74 opened 4 years ago

danka74 commented 4 years ago

Dear All,

Compound languages, such as the Germanic languages (German, Dutch, Swedish, Danish, Norwegian, ...) and Finnish, do not benefit from word-start searches as much as non-compound languages such as English, because the later parts of a compound word never appear at the start of a word.

For example, English "alcohol abuse" is Swedish "alkoholmissbruk", which decompounds to "alkohol-miss-bruk".

There are a number of decompounding projects on GitHub that might be reused when creating the description index (https://github.com/search?q=decompounding), although not all of them are actively maintained.
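To make the idea concrete, here is a minimal sketch of brute-force, dictionary-based decompounding in Java. It is not Snowstorm code and not taken from any of the projects above: the class name `DecompoundSketch`, the method `subwords`, and the four-word dictionary are all hypothetical, and a real setup would need a full word list for the language. The approach is similar in spirit to what a dictionary-based decompounder token filter does at index time: every dictionary word found inside a term is emitted as an extra token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DecompoundSketch {

    // Hypothetical mini-dictionary for illustration only;
    // a real one would be a complete Swedish word list.
    static final Set<String> DICTIONARY = Set.of("alkohol", "miss", "bruk", "missbruk");

    /**
     * Returns every dictionary word that occurs as a substring of the term,
     * ordered by start position and then by length. The term is lower-cased first.
     * Substrings shorter than minLength are ignored to limit noise.
     */
    static List<String> subwords(String term, int minLength) {
        String lower = term.toLowerCase();
        List<String> found = new ArrayList<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + minLength; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (DICTIONARY.contains(candidate)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints the subwords that would be indexed alongside the original term.
        System.out.println(subwords("alkoholmissbruk", 4));
    }
}
```

With this toy dictionary, "alkoholmissbruk" yields "alkohol", "miss", "missbruk" and "bruk", so a word-start search for "missbruk" or "bruk" could match the description. Whether overlapping subwords like "miss" should be indexed, or only the longest match kept, is a design choice that would need testing against real SNOMED CT descriptions.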

kaicode commented 4 years ago

Great idea @danka74. The license of the library used is another consideration. Snowstorm is currently licensed under Apache 2.0, so any library would have to be compatible with that. We welcome community collaboration on this.