CentreForDigitalHumanities / I-analyzer

The great textmining tool that obviates all others
https://ianalyzer.hum.uu.nl
MIT License

Investigate Sense Clustering over Time (SCoT) to find synonyms #1649

Closed BeritJanssen closed 1 month ago

BeritJanssen commented 1 month ago

Is your feature request related to a problem? Please describe.
Word embeddings are one way of finding synonyms, but they offer no way of distinguishing synonyms from antonyms other than human inspection. A visiting scholar of the People & Parliament team suggested SCoT as an alternative, see this paper
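To make the limitation concrete, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity. The vectors are hypothetical, hand-picked toy values (not output from any I-analyzer model), chosen only to show how a word that occurs in similar contexts, such as an antonym, can rank about as high as a true synonym. The names `vectors`, `cosine` and `nearest` are made up for this illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, for illustration only.
# "bad" is placed close to "good" because antonyms tend to
# appear in similar contexts ("a good/bad decision").
vectors = {
    "good": [0.9, 0.8, 0.1],
    "great": [0.8, 0.9, 0.2],
    "bad": [0.9, 0.7, -0.2],
    "table": [-0.1, 0.2, 0.9],
}


def nearest(word, k=2):
    """Return the k most similar words to `word`, highest score first."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


for neighbour, score in nearest("good"):
    print(f"{neighbour}: {score:.3f}")
```

With these toy values, both "great" and "bad" come back as close neighbours of "good", and nothing in the similarity score itself says which one is the synonym.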

Current plan
For now, investigate the technique and give feedback to the Jyväskylä team about the feasibility of integrating this method.

BeritJanssen commented 1 month ago

SCoT is not an algorithm but an application with multiple components and a heavy dependency on JoBimText for pre-processing. I don't think it's worth investing time in applying these methods to the current I-Analyzer corpora, especially since the Jyväskylä group is building collaborations with machine learning groups in Turku and Helsinki to investigate LLMs for parliamentary data.