OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0

[Roadmap] LMFlow Roadmap #862

Open wheresmyhair opened 1 week ago

wheresmyhair commented 1 week ago

This document lists the features on LMFlow's roadmap. We welcome any discussion of, or contributions to, specific features in the related Issues/PRs. 🤗

Main Features

Usability

Bug fixes

Legacy issues

Documentation

wheresmyhair commented 4 days ago

Note on multi-instance inference: in vLLM inference, the number of attention heads must be divisible by the vLLM tensor parallel size. For a model with 14 attention heads, the viable options for tp are 1 and 2 (tp=7 causes another division-related issue, though I forget exactly which one). Say we have 8 GPUs: to utilize all of them, multi-instance vLLM inference is necessary (tp=1 -> 8 instances, tp=2 -> 4 instances). The same applies to reward model (rm) inference and any other inference pipelines.
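
For context, here is a minimal sketch of what multi-instance vLLM inference could look like: one process per instance, each pinned to its own slice of GPUs via `CUDA_VISIBLE_DEVICES` before vLLM initializes. The model path, prompt sharding, and worker layout are illustrative assumptions, not LMFlow's actual pipeline:

```python
# Minimal multi-instance vLLM inference sketch (illustrative, not LMFlow code).
# Assumes a model whose attention head count limits tensor parallelism to tp in {1, 2}.
import os
import multiprocessing as mp

MODEL = "path/to/model"  # hypothetical model path
NUM_GPUS = 8
TP = 2  # tensor parallel size; must divide the model's attention head count
NUM_INSTANCES = NUM_GPUS // TP  # tp=1 -> 8 instances, tp=2 -> 4 instances

def worker(instance_id: int, prompts: list[str]) -> list[str]:
    # Pin this instance to its own GPU slice before vLLM touches CUDA.
    gpu_ids = range(instance_id * TP, (instance_id + 1) * TP)
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, gpu_ids))
    from vllm import LLM, SamplingParams  # import after setting GPU visibility

    llm = LLM(model=MODEL, tensor_parallel_size=TP)
    outputs = llm.generate(prompts, SamplingParams(max_tokens=128))
    return [o.outputs[0].text for o in outputs]

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(64)]
    # Shard prompts round-robin across instances.
    shards = [prompts[i::NUM_INSTANCES] for i in range(NUM_INSTANCES)]
    ctx = mp.get_context("spawn")  # fresh processes so CUDA init happens per worker
    with ctx.Pool(NUM_INSTANCES) as pool:
        results = pool.starmap(worker, enumerate(shards))
```

The key constraint from above shows up in `TP`: with 14 attention heads, `tensor_parallel_size` can only be 1 or 2, so saturating 8 GPUs requires 8 or 4 independent instances rather than one tp=8 instance.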