gmftbyGMFTBY / Copyisallyouneed

[ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM
https://openreview.net/forum?id=CROlOA9Nd8C&referrer=%5Bthe%20profile%20of%20Tian%20Lan%5D(%2Fprofile%3Fid%3D~Tian_Lan7)
MIT License

Question about retrieval pool size #10

Open WuTianming opened 1 year ago

WuTianming commented 1 year ago

Hi, thank you for your great work!

I'm trying to decipher some of your experiment notes, specifically this one (data/wikitext103_1024/README_find_mbest_chunk.md).

The second section of this file reads:

Best chunk size fixed at 128; now find the best number of tokens
1. chunk_size: 128, 64: 24.9
2. chunk_size: 128, 128: 29.18
3. chunk_size: 128, 256: 33.53%
4. chunk_size: 128, 512: 37.84%
5. chunk_size: 128, 1024: 41.53%

The mismatch is caused by the chunking

Could you explain what those numbers mean? I'm particularly interested in how the copy ratio scales with the retrieval pool size. Is the second number in each line (64, 128, 256, 512, 1024) the pool size used in that individual experiment?
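To make the question concrete, here is a minimal Python sketch that tabulates the change between consecutive lines of the notes. It assumes, without confirmation, that the second number is a pool size and that the last number on every line is a percentage (the first two lines have no `%` sign). The name `gains_per_doubling` is only illustrative and is not from the repository.

```python
# Values copied from the notes quoted above. Reading the second number as a
# pool size and the last number as a percentage is an assumption.
notes = [(64, 24.9), (128, 29.18), (256, 33.53), (512, 37.84), (1024, 41.53)]


def gains_per_doubling(rows):
    """Return the change in the reported value between consecutive rows."""
    return [round(b[1] - a[1], 2) for a, b in zip(rows, rows[1:])]


gains = gains_per_doubling(notes)
for (size, _), gain in zip(notes[1:], gains):
    # Each row doubles the second number of the previous row.
    print(f"{size // 2} -> {size}: +{gain:.2f}")
```

If that reading is right, each doubling adds roughly 4 points, which would suggest the value grows about linearly in the logarithm of the second number.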