Gandhinagar-ML-NLP-Group / talks

Talks at Gandhinagar Machine Learning and NLP Group

Efficiently Serving LLMs #12

Closed by Ankush-Chander 1 week ago

Ankush-Chander commented 3 weeks ago

Title

Efficiently Serving LLMs

Describe your Talk

This talk will cover:

  1. how auto-regressive large language models generate text one token at a time
  2. KV caching
  3. continuous batching
  4. model quantization
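
As a rough illustration of the first two topics, here is a minimal sketch of greedy token-by-token generation with and without a KV cache. It assumes a toy single-layer, single-head attention model with random weights and no positional encoding; every name in it (`EMB`, `WQ`, `WK`, `WV`, `WOUT`, `generate_no_cache`, `generate_with_cache`) is hypothetical and is not taken from the talk or from any library.

```python
import math
import random

D = 4      # toy embedding size (assumption for illustration)
VOCAB = 8  # toy vocabulary size (assumption for illustration)
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical random weights standing in for a trained model.
EMB = rand_matrix(VOCAB, D)
WQ, WK, WV = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)
WOUT = rand_matrix(VOCAB, D)

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all keys seen so far.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(D)]

def pick_next(x, keys, values):
    # Greedy decoding: take the highest-scoring token.
    logits = matvec(WOUT, attend(matvec(WQ, x), keys, values))
    return max(range(VOCAB), key=logits.__getitem__)

def generate_no_cache(prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        # Recompute keys and values for the whole sequence at every step.
        xs = [EMB[t] for t in tokens]
        keys = [matvec(WK, x) for x in xs]
        values = [matvec(WV, x) for x in xs]
        tokens.append(pick_next(xs[-1], keys, values))
    return tokens

def generate_with_cache(prompt, n_new):
    tokens = list(prompt)
    # Prefill: compute keys and values once for all prompt tokens but the last.
    keys = [matvec(WK, EMB[t]) for t in tokens[:-1]]
    values = [matvec(WV, EMB[t]) for t in tokens[:-1]]
    for _ in range(n_new):
        # Decode: only the newest token needs a fresh key and value.
        x = EMB[tokens[-1]]
        keys.append(matvec(WK, x))
        values.append(matvec(WV, x))
        tokens.append(pick_next(x, keys, values))
    return tokens

prompt = [1, 2, 3]
assert generate_no_cache(prompt, 5) == generate_with_cache(prompt, 5)
```

Both functions produce the same tokens; the cached version avoids recomputing keys and values for earlier positions, at the cost of memory that grows with sequence length. Continuous batching and quantization are not covered by this sketch.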

Pre-requisites & reading material

  1. Basic Python, enough to write non-trivial programs
  2. Familiarity with how transformers work will be helpful.

Time required for the talk

40 min

Link to slides/demos

No response

About you

Ankush Chander is a Research Engineer with a focus on Natural Language Processing and Information Retrieval. He completed his master's degree at DA-IICT, Gandhinagar, Gujarat, India in 2016. He then co-founded RAx (now Enago Read), where he worked as a Research Engineer from 2016 to 2023. Previously, he worked as a Web Developer at MothersonSumi Infotech & Design Ltd. He is also an open-source enthusiast and has contributed to projects like Pytextrank, Argilla, and kglab.

Availability

18/05/2024

Any comments

No response

Ankush-Chander commented 3 weeks ago

Scheduled for 18 May 2024