gmftbyGMFTBY / Copyisallyouneed

[ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM
https://openreview.net/forum?id=CROlOA9Nd8C&referrer=%5Bthe%20profile%20of%20Tian%20Lan%5D(%2Fprofile%3Fid%3D~Tian_Lan7)
MIT License

Questions about phrase encoder #3

Closed noobimp closed 1 year ago

noobimp commented 1 year ago

Hi, everyone, nice job! After reading the paper, I have a few small questions about the phrase encoder. Regarding "a document of length m": is "m" the number of tokens in the document? Are "s, e" the tokens at the corresponding positions? And is a phrase embedding represented by these two tokens' embeddings?

Thank you for your reply!

gmftbyGMFTBY commented 1 year ago

Thank you for your interest in our work. For your first question, the answer is yes: $m$ denotes the number of tokens in the document. For your second question, the answer is also yes: $s$ and $e$ are the indices of the corresponding start and end tokens in the document, and the phrase embedding is built from the representations of those two tokens.
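To make the exchange concrete, here is a minimal sketch (not the authors' code) of how a phrase spanning token indices $s$ to $e$ can be embedded from its start and end token representations. The names `hidden_states`, `start_proj`, and `end_proj`, and the dimensions, are illustrative assumptions.

```python
import torch

hidden_dim, phrase_dim = 768, 128
m = 10  # document length in tokens

# Per-token outputs of a document encoder (random placeholders here).
hidden_states = torch.randn(m, hidden_dim)

# Separate projections for the start and end token representations
# (hypothetical layer names, for illustration only).
start_proj = torch.nn.Linear(hidden_dim, phrase_dim)
end_proj = torch.nn.Linear(hidden_dim, phrase_dim)

def phrase_embedding(s: int, e: int) -> torch.Tensor:
    """Embed the phrase covering token indices s..e by concatenating
    the projected start-token and end-token representations."""
    return torch.cat(
        [start_proj(hidden_states[s]), end_proj(hidden_states[e])],
        dim=-1,
    )

emb = phrase_embedding(2, 5)
print(emb.shape)  # torch.Size([256])
```

So the phrase is not re-encoded as a whole; only the two boundary tokens' vectors are combined, which keeps the phrase table cheap to build.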