Efficiently Serving LLMs

Title

Describe your Talk

This talk will cover:

how auto-regressive large language models generate text one token at a time
KV caching
continuous batching
model quantization
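
As a rough illustration of the first topic, here is a minimal sketch of greedy token-by-token generation. The `TOY_MODEL` lookup table and the `generate` function are hypothetical stand-ins written for this example, not part of any library. A real LLM scores the next token from the entire sequence so far, whereas this toy only looks at the last token.

```python
# Toy stand-in for a language model (hypothetical, for illustration only):
# maps the most recent token to scores for each candidate next token.
TOY_MODEL = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}


def generate(prompt, max_new_tokens=10):
    """Greedy auto-regressive decoding: append one token per step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Score candidates given what has been generated so far.
        # Unknown tokens fall back to ending the sequence.
        scores = TOY_MODEL.get(tokens[-1], {"</s>": 1.0})
        # Greedy choice: take the highest-scoring candidate.
        next_token = max(scores, key=scores.get)
        if next_token == "</s>":
            break
        # The new token becomes part of the input for the next step.
        tokens.append(next_token)
    return tokens


print(generate(["<s>"]))
```

Each loop iteration depends on the output of the previous one, so generation is inherently sequential. That is the cost that techniques such as KV caching and continuous batching aim to reduce.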

Pre-requisites & reading material

Basic Python, enough to write non-trivial programs
Familiarity with how transformers work will be helpful

Time required for the talk

40 min

Link to slides/demos

No response

About you

Ankush Chander is a Research Engineer with a focus on Natural Language Processing and Information Retrieval. He completed his Master's degree at DA-IICT, Gandhinagar, Gujarat, India in 2016. After that, he co-founded RAx (now Enago Read), where he also worked as a Research Engineer from 2016 to 2023. Previously, he worked as a Web Developer at MothersonSumi Infotech & Design Ltd. He is also an open-source enthusiast and has contributed to projects like Pytextrank, Argilla, and kglab.

Availability

18/05/2024

Any comments

No response

Gandhinagar-ML-NLP-Group / talks