Closed: zsnoob closed this issue 5 months ago
> Perhaps modeling_eagle is specifically designed for inference using a custom model?
Yes, modeling_eagle can be used to accelerate any model in the transformers library. To use it, you need to slightly modify the model code so that it supports a pre-allocated KV cache and a tree mask; refer to modeling_llama_kv.py and modeling_Mixtral_kv.py for examples.
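To make the two modifications mentioned above more concrete, here is a minimal, dependency-free sketch of what a pre-allocated KV cache and a tree mask might look like. This is not the code from modeling_llama_kv.py or modeling_Mixtral_kv.py; the names `PreallocatedKVCache` and `build_tree_mask` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence


def build_tree_mask(parents: Sequence[int]) -> List[List[int]]:
    """Build an attention mask for a tree of draft tokens.

    parents[i] is the index of node i's parent, or -1 for a root.
    Assumes parents[i] < i, i.e. every parent is listed before its children.
    mask[i][j] == 1 means node i may attend to node j (itself or an ancestor).
    """
    n = len(parents)
    mask = [[0] * n for _ in range(n)]
    for i, p in enumerate(parents):
        if p >= i:
            raise ValueError("parents must be listed before their children")
        if p >= 0:
            # A node sees everything its parent sees...
            mask[i] = list(mask[p])
        # ...plus itself.
        mask[i][i] = 1
    return mask


class PreallocatedKVCache:
    """Fixed-size buffer that is filled in place (hypothetical illustration).

    The buffer is allocated once; appending writes into existing rows and
    rejecting draft tokens only moves the length pointer back.
    """

    def __init__(self, max_len: int, dim: int) -> None:
        self.dim = dim
        self.data = [[0.0] * dim for _ in range(max_len)]
        self.length = 0  # number of valid rows

    def append(self, rows: Sequence[Sequence[float]]) -> None:
        end = self.length + len(rows)
        if end > len(self.data):
            raise ValueError("cache capacity exceeded")
        for offset, row in enumerate(rows):
            if len(row) != self.dim:
                raise ValueError("row has wrong dimension")
            # Overwrite the existing row in place instead of allocating a new one.
            self.data[self.length + offset][:] = row
        self.length = end

    def truncate(self, new_length: int) -> None:
        """Discard rejected draft entries by moving the pointer back."""
        if not 0 <= new_length <= self.length:
            raise ValueError("invalid length")
        self.length = new_length

    def view(self) -> List[List[float]]:
        """Return only the valid prefix of the buffer."""
        return self.data[: self.length]
```

For example, `build_tree_mask([-1, 0, 0, 1])` describes a root with two children, where the first child has one child of its own; each row of the result marks the node itself and its ancestors. In a real model the mask would be combined with the usual causal mask over the already-verified prefix, which this sketch leaves out.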
Thank you for your work! Regarding the project structure, I would like to know the design purpose of the modeling_eagle and ea_model source files. Both appear to describe the structure of the original model and a single decoder layer. Perhaps modeling_eagle is specifically designed for inference using a custom model?