foundation-model-stack / fms-extras

Apache License 2.0
20 stars 9 forks source link

fms-extras

This is a repo as part of the foundation-model-stack organization which is used for new features staged to be integrated with foundation-model-stack. This repo is the home for extensions, research and/or in-development work, and fms-based models trained by IBM.

Installation

Local

pip install -e .

Notable Features

  1. MLPSpeculator: a lightweight speculator model that can be used along-side a generative model to speed up inference (currently deployed in IBM TGIS with training in fms-fsdp)
  2. PagedKVCacheManager: an implementation of kv-cache management that provides a user with the proper input to use paged-attention with their own models (currently deployed in IBM TGIS)
  3. PagedLLaMA: a LLaMA implementation that uses paged-attention in Multi-Head Attention. This model is compilable without graph breaks.
  4. speculative generation: a reference implementation of speculative generate using PagedKVCacheManager and MLPSpeculator

Structure and contents of this Repository

This repo follows a similar structure to that of foundation-model-stack

References