Closed cjdjr closed 4 months ago
I've never tried it, but you could average the computed position embeddings using the "source" matrix. For each merged token, take the mean of the position embeddings of all the individual tokens that were merged into it, and use that as its new position embedding.
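As a rough illustration of that averaging, here is a minimal, dependency-free sketch. It assumes the "source" matrix is a binary matrix with one row per merged token and one column per original token, and that the position embeddings are available as one vector per original token. The function name `merge_position_embeddings` and the toy values are hypothetical and not part of ToMe; in practice this would be a batched matrix product on tensors.

```python
def merge_position_embeddings(source, pos_emb):
    """Average per-token position embeddings over each merged group.

    source:  list of rows, one per merged token; source[i][j] is 1 if
             original token j was merged into token i, else 0 (assumed layout).
    pos_emb: list of rows, one per original token (each a list of floats).
    """
    dim = len(pos_emb[0])
    merged = []
    for row in source:
        count = sum(row)
        if count == 0:
            raise ValueError("every merged token needs at least one source token")
        summed = [0.0] * dim
        for weight, emb in zip(row, pos_emb):
            if weight:
                for d in range(dim):
                    summed[d] += weight * emb[d]
        # Mean of the embeddings of all tokens merged into this one.
        merged.append([s / count for s in summed])
    return merged


# Four original tokens with 2-dimensional position embeddings (toy values).
pos_emb = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
# Tokens 0 and 1 were merged into one token; tokens 2 and 3 are unchanged.
source = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
merged_pos = merge_position_embeddings(source, pos_emb)
print(merged_pos)  # [[1.0, 2.0], [4.0, 5.0], [6.0, 7.0]]
```

With RoPE, one reading of this suggestion is to average the per-position cos/sin tables in the same way. Note that the mean of several rotations is generally no longer an exact rotation, so this is an approximation and, like the suggestion itself, untested.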
Can ToMe be applied together with RoPE? How should the position embeddings be calculated after tokens are merged?