scandukuri closed this 6 days ago
Hi, thank you for the fix. Would you consider creating a new branch? The change from `u16` to `i32` isn't needed for models with small vocabularies and would unnecessarily increase disk usage.
Yes! I can make a `llama3` branch with the existing changes (DraftRetriever) plus the necessary changes to modeling_llama_kv.py.
That sounds great! Appreciate your effort.
Here we make the necessary changes to read and write suffixes to memory and to file for large tokenizers; the original implementation only supported token IDs up to Rust's u16::MAX (65,535).
Crucially, reading and writing individual token IDs as Rust `i32` (instead of the original `u16`) lets the tool support token IDs up to i32::MAX (2,147,483,647), while still allowing negative placeholder IDs such as -2 for padding.
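For illustration, here is a minimal Rust sketch of the idea; the function names and the little-endian layout are assumptions for the example, not the actual DraftRetriever API. Each token ID is serialized as a 4-byte `i32` rather than a 2-byte `u16`, which doubles per-token storage but covers Llama 3-scale vocabularies (~128k tokens) and keeps negative padding IDs representable:

```rust
use std::io::{self, Read, Write};

/// Hypothetical placeholder ID used for padding; needs a signed type.
const PAD_ID: i32 = -2;

/// Write a slice of token IDs as 4-byte little-endian i32 values.
fn write_token_ids<W: Write>(writer: &mut W, ids: &[i32]) -> io::Result<()> {
    for &id in ids {
        writer.write_all(&id.to_le_bytes())?;
    }
    Ok(())
}

/// Read `count` token IDs back, 4 bytes at a time.
fn read_token_ids<R: Read>(reader: &mut R, count: usize) -> io::Result<Vec<i32>> {
    let mut ids = Vec::with_capacity(count);
    let mut buf = [0u8; 4];
    for _ in 0..count {
        reader.read_exact(&mut buf)?;
        ids.push(i32::from_le_bytes(buf));
    }
    Ok(ids)
}

fn main() -> io::Result<()> {
    // A Llama 3 token ID like 128_000 overflows u16::MAX (65,535),
    // so IDs are stored as i32; -2 remains available as a pad marker.
    let ids = vec![128_000, 42, PAD_ID];
    let mut bytes = Vec::new();
    write_token_ids(&mut bytes, &ids)?;
    assert_eq!(read_token_ids(&mut bytes.as_slice(), ids.len())?, ids);
    Ok(())
}
```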