Question about retrieval pool size

Hi, thank you for your great work!

I'm trying to decipher some of your experiment notes, specifically this one (data/wikitext103_1024/README_find_mbest_chunk.md).

The second section of this file reads:

With the best chunk size fixed at 128, find the best number of tokens
1. chunk_size: 128, 64: 24.9
2. chunk_size: 128, 128: 29.18
3. chunk_size: 128, 256: 33.53%
4. chunk_size: 128, 512: 37.84%
5. chunk_size: 128, 1024: 41.53%

The mismatch is caused by chunking

Could you explain what those numbers mean? I'm particularly interested in how the copy ratio scales with the retrieval pool size. Does the second number in each line (64, 128, 256, 512, 1024) refer to the pool size used in each individual experiment, and is the last number the copy ratio as a percentage?
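To make the scaling question concrete, here is a small sketch of how I am reading the numbers. It assumes that the second value on each line is the quantity being varied (pool size or token count) and that the last value is a percentage even where the "%" sign is missing; both are my guesses, not something the README states. The name `gains_per_doubling` is just a helper I made up for illustration.

```python
# Values copied from the README section quoted above.
# Assumption: key = second number on each line (pool size / token count),
# value = last number on each line, read as a percentage.
results = {64: 24.9, 128: 29.18, 256: 33.53, 512: 37.84, 1024: 41.53}


def gains_per_doubling(results):
    """Return the increase in the reported value each time the size doubles."""
    sizes = sorted(results)
    return [round(results[b] - results[a], 2) for a, b in zip(sizes, sizes[1:])]


print(gains_per_doubling(results))
```

Under that reading, each doubling adds roughly 4 points, which would look closer to logarithmic than linear growth.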

gmftbyGMFTBY / Copyisallyouneed
