Closed marcburri closed 4 years ago
Hi, thank you for your interest in the details. I needed some time myself to understand how co-occurrence is counted in text2vec. The package uses a sliding window that moves over the text; hence, a window is created for each token in a given text. A general discussion that might help you understand the logic can be found in issue #253. I hope this helps and you can follow the logic.
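To make the "one window per token" idea concrete, here is a minimal sketch in plain Python. It is only one reading of the comment above and is not text2vec's implementation: the helper names `per_token_windows` and `pair_window_counts` are hypothetical, and it assumes that each window starts at a token and is simply cut short at the end of the text.

```python
from collections import Counter
from itertools import combinations


def per_token_windows(tokens, window_size):
    """Start one window at every token; windows near the end are shorter.

    Assumption: this mirrors the 'a window is created for each token'
    description, not the actual text2vec code.
    """
    return [tokens[i:i + window_size] for i in range(len(tokens))]


def pair_window_counts(tokens, window_size):
    """Count, for each unordered token pair, the windows containing both."""
    counts = Counter()
    for window in per_token_windows(tokens, window_size):
        # Binary counting: a pair is counted at most once per window.
        for a, b in combinations(sorted(set(window)), 2):
            counts[(a, b)] += 1
    return counts


tokens = ["a", "b", "c", "d"]
windows = per_token_windows(tokens, 2)
counts = pair_window_counts(tokens, 2)
print(len(windows))  # one window per token
print(dict(counts))
```

Under this assumption a text of n tokens always yields n windows, regardless of the window size.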
Thank you, everything clear now.
On 02.11.2020 at 22:05, Manuel Bickel notifications@github.com wrote:
Hi, thank you for your interest in the details. I needed some time to understand the general logic of how co-occurrence is counted in the context of text2vec. The package uses a sliding window that moves over the text; hence, a window is created for each token in a given text. A general discussion that might help you understand the logic can be found in issue #253 https://github.com/dselivanov/text2vec/issues/253. I hope this helps and you can follow the logic.
Hi,
I was wondering whether the documentation for the coherence() function should be adjusted. Namely, the following line, where you explain the steps to create the TCM for extrinsic measures from an external corpus,
should look like this
since a document of, say, 111 tokens would give us 2 virtual documents with window_size=110 and not 111 as the current version suggests.
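The arithmetic behind this can be checked with a small sketch in plain Python. It assumes the reading used in this comment, where a virtual document is a full run of exactly window_size consecutive tokens, so a text of n tokens yields n - window_size + 1 of them. The helper `full_windows` is hypothetical and is not part of text2vec.

```python
def full_windows(tokens, window_size):
    """Return every contiguous run of exactly window_size tokens.

    Assumption: only full-length windows count as virtual documents.
    """
    n = len(tokens)
    # range() is empty when window_size > n, so the result is then [].
    return [tokens[i:i + window_size] for i in range(n - window_size + 1)]


tokens = [f"t{i}" for i in range(1, 112)]  # a document of 111 tokens

docs_110 = full_windows(tokens, 110)
docs_111 = full_windows(tokens, 111)

print(len(docs_110))  # 111 - 110 + 1 = 2 virtual documents
print(len(docs_111))  # 111 - 111 + 1 = 1 virtual document
```

Under this counting convention, a window size of 110 is what produces two virtual documents from 111 tokens; a window size of 111 would produce only one.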