how auto-regressive large language models generate text one token at a time
KV caching
continuous batching
model quantization
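As a rough illustration of the first two topics, the following minimal sketch generates tokens one at a time and optionally reuses cached keys and values instead of recomputing them at every step. Everything in it is a toy assumption made for illustration (the `embed`, `attend`, `next_token` and `generate` helpers, the `VOCAB` and `DIM` sizes, and the use of raw embeddings as keys and values); it is not the talk's material and not how any particular serving library implements caching.

```python
import math

VOCAB = 8  # toy vocabulary size (assumption)
DIM = 4    # toy embedding width (assumption)


def embed(token):
    # Deterministic stand-in for a learned embedding table (hypothetical).
    return [math.sin((token + 1) * (i + 1)) for i in range(DIM)]


def attend(query, keys, values):
    # Single-head scaled dot-product attention over every position seen so far.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(DIM)]


def next_token(context):
    # Greedy decoding: pick the token whose embedding best matches the context.
    logits = [sum(c * e for c, e in zip(context, embed(t))) for t in range(VOCAB)]
    return max(range(VOCAB), key=lambda t: logits[t])


def generate(prompt, steps, use_cache=True):
    tokens = list(prompt)
    # "Prefill": compute keys and values once for the whole prompt.
    keys = [embed(t) for t in tokens]
    values = [embed(t) for t in tokens]
    for _ in range(steps):
        if not use_cache:
            # Without a cache, every step recomputes keys/values for all tokens.
            keys = [embed(t) for t in tokens]
            values = [embed(t) for t in tokens]
        context = attend(embed(tokens[-1]), keys, values)
        new = next_token(context)
        tokens.append(new)
        if use_cache:
            # With a cache, only the newest token's key/value is computed.
            keys.append(embed(new))
            values.append(embed(new))
    return tokens


cached = generate([1, 2, 3], steps=5, use_cache=True)
uncached = generate([1, 2, 3], steps=5, use_cache=False)
print(cached == uncached)
```

Both paths produce the same tokens; the cached path simply avoids redoing per-token work that cannot change once a token has been generated, which is the basic idea behind KV caching.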
Pre-requisites & reading material
Basic Python, enough to write non-trivial programs.
Knowledge of how transformers work will be helpful.
Time required for the talk
40 min
Link to slides/demos
No response
About you
Ankush Chander is a Research Engineer with a focus on Natural Language Processing and Information Retrieval. He completed his Master's degree at DA-IICT, Gandhinagar, Gujarat, India in 2016. After that, he co-founded RAx (now Enago Read), where he also worked as a Research Engineer from 2016 to 2023. Previously, he worked as a Web Developer at MothersonSumi Infotech & Design Ltd. He is also an open-source enthusiast and has contributed to projects like Pytextrank, Argilla, and kglab.
Title
Efficiently Serving LLMs
Describe your Talk
This talk will cover:
Availability
18/05/2024
Any comments
No response