bertvandepoel / snelSLiM

A linguistic set of tools in Go and web interface in PHP to do quick Stable Lexical Marker Analysis
GNU Affero General Public License v3.0

Vector space near neighbour indication (distributional semantics) #24

Closed by bertvandepoel 4 years ago

bertvandepoel commented 4 years ago

Training a model has proven too inaccurate for the corpus sizes snelSLiM typically works with (far fewer than 10 million words, often fewer than 1 million). Shipping pre-trained, high-quality models for every relevant language is not realistic either, and such models would also run into problems with registers, dialects, colloquial language, neologisms, etc.