facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.
MIT License

Question: In DocSpace, are docs aggregated the same way users are? #282

Open michaelpaulhirsch opened 4 years ago

michaelpaulhirsch commented 4 years ago

Hi, in the GitHub documentation under DocSpace, it says the following:

Model: Each document is represented by a bag-of-words of the document. Each user is represented as a (bag of) the documents that they liked/clicked in the past. At training time, at each step one random document is selected as the label and the rest of the bag of documents are selected as input.

I would like to know whether the same parameter p is used both for aggregating words up to the doc level and for aggregating docs up to the user level. That is to say, are the following representations correct?

doc_1 = (word_vec_1 + word_vec_2 + ... + word_vec_n) / n^p

user_1 = (doc_1 + doc_2 + ... + doc_m) / m^p

In the above example, the same value for p is used to aggregate words to doc-level and docs to user-level. Is this what is going on under the hood in DocSpace?
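
To make the question concrete, here is a minimal Python sketch of the two-level aggregation I have in mind. It only illustrates the hypothesis I am asking about and is not taken from the StarSpace source; the names `aggregate`, `doc_vector`, and `user_vector` are hypothetical, and the use of one shared `p` at both levels is exactly the assumption I would like confirmed.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate(vectors: Sequence[Sequence[float]], p: float) -> Vector:
    """Sum the vectors element-wise, then divide by (number of vectors) ** p."""
    if not vectors:
        raise ValueError("need at least one vector to aggregate")
    dim = len(vectors[0])
    total = [0.0] * dim
    for vec in vectors:
        if len(vec) != dim:
            raise ValueError("all vectors must have the same dimension")
        for i, x in enumerate(vec):
            total[i] += x
    scale = len(vectors) ** p
    return [x / scale for x in total]


def doc_vector(word_vecs: Sequence[Sequence[float]], p: float) -> Vector:
    # Hypothesis: doc = (word_vec_1 + ... + word_vec_n) / n^p
    return aggregate(word_vecs, p)


def user_vector(doc_vecs: Sequence[Sequence[float]], p: float) -> Vector:
    # Hypothesis: user = (doc_1 + ... + doc_m) / m^p, with the SAME p as above
    return aggregate(doc_vecs, p)


if __name__ == "__main__":
    p = 0.5  # illustrative value only
    doc_1 = doc_vector([[1.0, 0.0], [0.0, 1.0]], p)
    doc_2 = doc_vector([[2.0, 2.0]], p)
    print(user_vector([doc_1, doc_2], p))
```

Under this sketch, p = 1 reduces each level to a plain average and p = 0 to a plain sum, so the question is whether the docs-to-user step really applies the same exponent as the words-to-doc step.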

Thank you