Note on multi-instance inference: in vLLM inference, the number of attention heads must be divisible by the vLLM tensor parallel size. For an LLM with 14 attention heads, the viable options for tp are 1 and 2 (7 causes another division issue, though I forget exactly what it is). Say we have 8 GPUs; to utilize all of these devices, multi-instance vLLM inference is necessary (tp=1 -> 8 instances, tp=2 -> 4 instances). The same applies to reward model inference and any other inference pipelines.
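For concreteness, here is a minimal sketch of that arithmetic (`tp_plan` is a hypothetical helper, not part of LMFlow or vLLM):

```python
def tp_plan(num_attn_heads: int, num_gpus: int) -> dict[int, int]:
    """Map each viable tensor parallel size to the number of vLLM
    instances needed to occupy all GPUs.

    A tp size is taken as viable here only if it divides both the number
    of attention heads (vLLM requirement) and the number of GPUs, so the
    instances fill the devices evenly.
    """
    return {
        tp: num_gpus // tp
        for tp in range(1, num_gpus + 1)
        if num_attn_heads % tp == 0 and num_gpus % tp == 0
    }

# For a 14-head model on 8 GPUs this prints {1: 8, 2: 4}.
# (tp=7 divides the head count but not the GPU count, and it also hits a
# separate division issue in practice, as noted above.)
print(tp_plan(num_attn_heads=14, num_gpus=8))
```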
This document lists the features on LMFlow's roadmap. We welcome discussion of, and contributions to, specific features in the related Issues/PRs. 🤗
Main Features
Usability
- `vllm` package optional (one possible import-guard pattern is sketched below this list)
- `hf_model_mixin`
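A common way to make a dependency optional is an import guard that only raises when the backend is actually requested. A minimal sketch, with illustrative names rather than LMFlow's actual module layout:

```python
# Sketch of an optional-dependency guard; names are illustrative,
# not LMFlow's actual code.
try:
    from vllm import LLM, SamplingParams
    _VLLM_AVAILABLE = True
except ImportError:
    _VLLM_AVAILABLE = False


def vllm_generate(model_name: str, prompts: list[str]) -> list[str]:
    """Generate with vLLM, erroring only if vLLM is actually requested."""
    if not _VLLM_AVAILABLE:
        raise ImportError(
            "vllm is not installed; install it with `pip install vllm` "
            "or use the HF backend instead."
        )
    llm = LLM(model=model_name)
    outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
    return [out.outputs[0].text for out in outputs]
```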
Bug fixes
- `model.generate()` with dsz3 (DeepSpeed ZeRO-3) #861
- `merge_lora`: LoRA merging with an absolute adapter path (a PEFT-based sketch follows this list)
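For reference, merging a LoRA adapter given by an absolute path can be sketched with PEFT's public API; the paths and model name below are placeholders, and this is not necessarily how LMFlow's `merge_lora` script works:

```python
# Sketch of merging LoRA weights into a base model with PEFT; paths and
# model names are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model-name")
# An absolute adapter path like this is the case the bullet above refers to.
model = PeftModel.from_pretrained(base, "/abs/path/to/lora-adapter")
merged = model.merge_and_unload()  # folds LoRA deltas into the base weights
merged.save_pretrained("/abs/path/to/merged-model")
```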
Issues left over from history
- `use_accelerator` -> `use_accelerate` typo fix (with Accelerate support PR)
- `model_args.use_lora` leads to truncation of the sequence, mentioned in #867

Documentation