Compounding languages, such as most Germanic and Nordic languages (German, Dutch, Swedish, Danish, Norwegian, Finnish, ...), do not benefit from word-start searches as much as non-compounding languages such as English.
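For example:
English: "alcohol abuse"
Swedish: "alkoholmissbruk" -> "alkohol-miss-bruk"
A word-start search for "missbruk" or "bruk" does not match the Swedish term, because those parts do not sit at the start of a word.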
There are a number of decompounding projects on GitHub (https://github.com/search?q=decompounding) that might be reused when creating the description index, although not all of them are actively maintained.
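
To make the idea concrete, here is a minimal sketch of dictionary-based decompounding at index time. It is not taken from Snowstorm or from any of the projects linked above: the function names (`decompound`, `index_terms`, `matches_word_start`), the tiny `LEXICON`, and the `min_part` threshold are all assumptions made for illustration. A real implementation would need a proper word list per language and handling for linking elements.

```python
# Hypothetical, hand-picked lexicon; a real one would be a full word list.
LEXICON = {"alkohol", "miss", "bruk"}


def decompound(word, lexicon, min_part=3):
    """Split a word into lexicon entries, trying the longest head first.

    Returns [word] unchanged (lowercased) when no complete split exists.
    """
    word = word.lower()

    def split(rest):
        if not rest:
            return []
        # Try the longest candidate head first, then backtrack to shorter ones.
        for end in range(len(rest), min_part - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = split(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no complete split from this position

    parts = split(word)
    return parts if parts else [word]


def index_terms(description, lexicon):
    """Return the original tokens plus their compound parts (assumed index shape)."""
    terms = []
    for token in description.lower().split():
        terms.append(token)
        parts = decompound(token, lexicon)
        if len(parts) > 1:
            terms.extend(parts)
    return terms


def matches_word_start(query, terms):
    """True when every query token is a prefix of at least one indexed term."""
    return all(
        any(term.startswith(q) for term in terms)
        for q in query.lower().split()
    )
```

With this sketch, `matches_word_start("bruk", ["alkoholmissbruk"])` is false, while the same query against `index_terms("alkoholmissbruk", LEXICON)` matches, because the parts "alkohol", "miss" and "bruk" are indexed alongside the full word.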
Great idea @danka74.
The license of the library used is another consideration. Snowstorm is currently licensed under Apache 2.0, so any library would have to be compatible with it.
We welcome community collaboration on this.