NobodyWHU / GPUDMM

Topic Modeling for Short Texts with Auxiliary Word Embeddings

some questions #3

Open lounily opened 7 years ago

lounily commented 7 years ago

What is the format of word_similarity.txt and qa_word2id.txt? Thank you very much. @NobodyWHU @duanyu

duanyu commented 7 years ago

First, the format of the word_similarity file is:

s(0,0) s(0,1) ... s(0,|V|-1)
s(1,0) s(1,1) ... s(1,|V|-1)
... ... ... ...
s(|V|-1,0) s(|V|-1,1) ... s(|V|-1,|V|-1)

In detail, s(i,j) is the cosine similarity between the i-th word's embedding and the j-th word's embedding, |V| is the vocabulary size, and the items on each line are separated by a single space.
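
As an illustration of the layout described above, here is a minimal sketch in plain Python that builds the pairwise cosine-similarity rows from a list of embedding vectors. It is not code from this repository: the function names and the toy embeddings are hypothetical, the six-decimal formatting is an arbitrary choice, and it assumes no embedding is the zero vector.

```python
import math

def cosine_similarity_matrix(embeddings):
    """Return the |V| x |V| matrix of pairwise cosine similarities."""
    # Assumes every vector has a non-zero norm.
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    matrix = []
    for i, u in enumerate(embeddings):
        row = []
        for j, v in enumerate(embeddings):
            dot = sum(a * b for a, b in zip(u, v))
            row.append(dot / (norms[i] * norms[j]))
        matrix.append(row)
    return matrix

def format_similarity_rows(matrix):
    """One text line per word; entries separated by a single space."""
    return [" ".join(f"{s:.6f}" for s in row) for row in matrix]

# Toy 2-dimensional embeddings for a 3-word vocabulary (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rows = format_similarity_rows(cosine_similarity_matrix(toy_embeddings))
```

Joining `rows` with newlines and writing the result to a file would give a file in the layout above, where line i holds the similarities for the word with id i.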

Second, the format of the qa_word2id file is:

word,id
... ...
word,id
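
A minimal sketch of reading that `word,id` layout into a dictionary, for anyone preparing their own vocabulary file. The function name and the sample lines are hypothetical, and splitting on the last comma is an assumption made so that a word containing a comma would still parse.

```python
def parse_word2id(lines):
    """Map each word to its integer id from 'word,id' lines."""
    word2id = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma only (assumption: the id is always last).
        word, idx = line.rsplit(",", 1)
        word2id[word] = int(idx)
    return word2id

# Made-up sample lines, not taken from the real qa_word2id.txt.
sample_lines = ["apple,0", "banana,1", "cherry,2"]
word2id = parse_word2id(sample_lines)
```

With a real file, the same function could be called on an open file handle, since iterating over a file yields its lines.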

lounily commented 6 years ago

What do the result files _theta.txt and _assign.txt mean, and which files correspond to the result files of LDA? If I want to compute the perplexity of the model, which file should I use? Thank you very much. @NobodyWHU @duanyu

YaYaCT commented 6 years ago

Hi, I want to ask what the format of _snippet_200iter_initial_status.txt is. Thanks. @NobodyWHU

duanyu commented 6 years ago

In fact, we did not use the initialFile in the experiments for our SIGIR paper; we initialized the topic assignments randomly. That was experimental code that we forgot to delete. We will fix that mistake :) I expect you will see the corrected version on GitHub tomorrow. @YaYaCT

YaYaCT commented 6 years ago

Thank you very much. Looking forward to the corrected version. @NobodyWHU

NobodyWHU commented 6 years ago

@duanyu has fixed the mistake, thank you very much! @YaYaCT

YaYaCT commented 6 years ago

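Hello authors: 1. A question about the word_similarity matrix: each row holds the similarities between a word and its similar words, so does the matrix need to include each word's similarity with itself? The paper says that the similar-word matrix includes the word itself. 2. Are the positions of j and i on line 132 of the code reversed? I am a beginner and do not fully understand it, so please excuse the question. I hope to get a reply from the authors. Thank you!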

NobodyWHU commented 6 years ago

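@YaYaCT word_similarity should be a symmetric matrix, so the values at [i,j] and [j,i] should be the same; writing it as [i,j] may be easier to understand. A word's similarity with itself should be 1.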
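
The two properties mentioned here (symmetry and a diagonal of 1) can be checked before training. Below is a minimal sketch, assuming the matrix has already been loaded as a list of rows of floats; the function name, the tolerance, and the toy matrices are hypothetical and not part of this repository.

```python
import math

def check_similarity_matrix(matrix, tol=1e-6):
    """Return a list of problems found; an empty list means the checks passed."""
    n = len(matrix)
    problems = []
    # The matrix must be square: |V| rows of |V| entries each.
    for i, row in enumerate(matrix):
        if len(row) != n:
            problems.append(f"row {i} has {len(row)} entries, expected {n}")
    if problems:
        return problems
    for i in range(n):
        # A word's similarity with itself should be 1.
        if not math.isclose(matrix[i][i], 1.0, abs_tol=tol):
            problems.append(f"diagonal entry [{i},{i}] is {matrix[i][i]}")
        # Entries [i,j] and [j,i] should match.
        for j in range(i + 1, n):
            if not math.isclose(matrix[i][j], matrix[j][i], abs_tol=tol):
                problems.append(f"[{i},{j}] and [{j},{i}] differ")
    return problems

# Made-up matrices for illustration.
good = [[1.0, 0.3], [0.3, 1.0]]
bad = [[1.0, 0.3], [0.5, 1.0]]
good_problems = check_similarity_matrix(good)
bad_problems = check_similarity_matrix(bad)
```

Because the matrix is symmetric, indexing it as [i,j] or [j,i] gives the same value, which is why either order works in the code.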