Repository for environment encoder: an attempt to improve reinforcement learning agents' generalisability by learning to act on universal multimodal embeddings generated by a vision-language model.
Right now we're using Hugging Face's standard inference procedure alongside FlashAttention-2, which is wholly inefficient compared to what we could achieve.
Integrating a framework like vLLM would be nice, but sadly it doesn't support extracting a model's `hidden_state`. Have to research around.
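For reference, a minimal sketch of the current approach: plain Hugging Face inference with FlashAttention-2, pulling the last hidden state to use as the embedding. The model ID, prompt format, and mean-pooling strategy are illustrative assumptions, not fixed choices of this repo.

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumption: any HF vision-language model works here; LLaVA-1.5 shown as an example.
MODEL_ID = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
)
model.eval()

@torch.no_grad()
def embed(image, text):
    # Assumption: LLaVA-1.5 prompt format with an <image> placeholder token.
    prompt = f"USER: <image>\n{text} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    outputs = model(**inputs, output_hidden_states=True)
    # hidden_states[-1] has shape (batch, seq_len, hidden_dim);
    # mean-pool over tokens to get one embedding per input.
    return outputs.hidden_states[-1].mean(dim=1)
```

This runs each forward pass through the vanilla `transformers` path, which is exactly the inefficiency noted above: no continuous batching or paged KV cache as an engine like vLLM would provide.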