Closed manuelbickel closed 5 years ago
I can easily add a counter of windows to the C++ code. In your example above you use window_size = 2, but in the comments you use a window of length 4. Also note that when the context is symmetric we count cooc(a, b) = cooc(b, a). So in the example above, the windows and cumulative counts look like this (the "central" word comes first in each row):
"a b c b a x x a b c b a"
a b c 1
b c b 1
c b a 1
b a x 2
a x x 2
x x a 2
x a b 2
a b c 3
b c b 3
c b a 3
b a _ 4
a _ _ 4
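The enumeration above can be reproduced with a short Python sketch. It only illustrates the listing and is not text2vec's C++ code. The function names are hypothetical, and every pair is counted with weight 1 (no distance weighting). I am also assuming that the number in each row is the running total of cooc(a, b), and that the window looks ahead window_size = 2 tokens from the central word.

```python
TOKENS = "a b c b a x x a b c b a".split()


def sliding_windows(tokens, window_size, pad="_"):
    """Return one row per token: the central word followed by the next
    `window_size` tokens, padded with `pad` at the end of the text."""
    rows = []
    for i, central in enumerate(tokens):
        context = tokens[i + 1 : i + 1 + window_size]
        context = context + [pad] * (window_size - len(context))
        rows.append([central] + context)
    return rows


def cumulative_cooc(tokens, window_size, term_a, term_b):
    """Running total of cooc(term_a, term_b) after each window.

    The pair is treated as unordered, so cooc(a, b) == cooc(b, a), and
    only pairs that involve the central word of a window are counted.
    """
    counts = []
    total = 0
    for i, central in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window_size]:
            if {central, other} == {term_a, term_b}:
                total += 1
        counts.append(total)
    return counts


rows = sliding_windows(TOKENS, window_size=2)
totals = cumulative_cooc(TOKENS, 2, "a", "b")
for row, total in zip(rows, totals):
    print(" ".join(row), total)
print("number of windows:", len(rows))
```

Under these assumptions there is one window per token, so the 12-token example yields 12 windows, including the two padded ones at the end.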
Hi Dmitry,
with reference to the implementation of coherence metrics (#252, formerly #241): some metrics use term probabilities instead of counts. Therefore, we need the number of skip-gram windows, which represents the number of "virtual documents" used to calculate the counts, so that it can be passed as n_doc_tcm to the current version of the coherence function. Due to my lack of knowledge of C, I did not fully understand the code that generates the tcm and thus could not correctly calculate this number. Below are two approaches that I tried, along with a rough explanation of my intuitive understanding of how co-occurrence counting is done (which is all wrong). Maybe you can find some time to give me a hint on how counting co-occurrences with sliding windows works, or provide a function that counts the number of windows, from which I could update my understanding? Thanks in advance.
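To make the request concrete, here is a rough Python sketch of what I mean by turning window counts into probabilities. Everything in it is an assumption on my side and not how text2vec works internally. The helper names are hypothetical. I assume that one window starts at every token and reaches window_size tokens ahead (truncated at the end of the text), and that each window counts as one "virtual document" in which a term either occurs or does not.

```python
def window_sets(tokens, window_size):
    """One window per token: the token itself plus the next `window_size`
    tokens, stored as a set so each term counts at most once per window."""
    return [set(tokens[i : i + 1 + window_size]) for i in range(len(tokens))]


def window_probability(windows, *terms):
    """Share of windows ("virtual documents") that contain all given terms."""
    hits = sum(1 for window in windows if all(term in window for term in terms))
    return hits / len(windows)


tokens = "a b c b a x x a b c b a".split()
windows = window_sets(tokens, window_size=2)

n_windows = len(windows)  # the value I would like to pass as n_doc_tcm
p_a = window_probability(windows, "a")
p_b = window_probability(windows, "b")
p_ab = window_probability(windows, "a", "b")

print(n_windows, p_a, p_b, p_ab)
```

Whether this window count is the right denominator for the counts stored in the tcm is exactly the part I am unsure about.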