This is a tentative roadmap for major improvements to Fast-LLM. It includes big features and potential breaking changes, but excludes minor features and additions.
It goes in several parts, with the following milestones:
- v0.1 (2024-10-11): First open-source version
- v0.2 (~Q4 2024): Follow-up to address technical debt on checkpoints and configs, with several breaking changes held back until this point.
- v0.3 (~Q1 2025): Further generalization to enable other models, ex. multimodal, with limited breaking changes
Config and checkpoints (v0.2)
Structured configuration (done https://github.com/ServiceNow/Fast-LLM-Internal/pull/304, https://github.com/ServiceNow/Fast-LLM-Internal/pull/308, https://github.com/ServiceNow/Fast-LLM-Internal/pull/315, https://github.com/ServiceNow/Fast-LLM-Internal/pull/316)
Replace the flat argparse format with a nested one using a `.` separator, and allow configuring from a yaml file.
Rename config parameters (partially done #1, #6, etc.)
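As an illustration of the nested format (a sketch only, not the actual Fast-LLM config code; the function name is hypothetical), flat `.`-separated keys can be expanded into the same nested structure a yaml file would produce:

```python
def expand_dotted(flat: dict) -> dict:
    """Expand {'pretrained.checkpoint_path': x} into {'pretrained': {'checkpoint_path': x}}."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

# The same structure could come from a yaml file such as:
#   pretrained:
#     checkpoint_path: /path/to/checkpoint
print(expand_dotted({"pretrained.checkpoint_path": "/path/to/checkpoint"}))
```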
With structured configs, many parameter names become redundant, ex. `pretrained.pretrained_checkpoint_path` can be unambiguously simplified to `pretrained.checkpoint_path`. This is also a good opportunity to clean things up, make names more consistent, etc.
Checkpoint improvements (partially done https://github.com/ServiceNow/Fast-LLM-Internal/pull/308 #6 #18 #22)
We need to make some breaking changes to the checkpoints:
- Generalize checkpointing formats, make them modular (done #18 #22)
- Review the distributed checkpoint format: save the shard breakdown in the metadata? Drop safetensors (proposal #26)? Enforce a strict version match for loading? Etc.
- Review checkpoint metadata: use better versioning, save more info, etc.
- Config metadata (partially done https://github.com/ServiceNow/Fast-LLM-Internal/pull/291)
- `--help`
Open-sourcing follow-up
Documentation, publications, benchmarks, etc.
Model generalization (v0.3)
Enable custom models and trainers (done https://github.com/ServiceNow/Fast-LLM-Internal/pull/319)
Generalize data #25
The data class is currently hard-coded to a GPT dataset. We need to make it more easily adaptable to other data formats and data-loading schemes.
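One possible shape for such a generalization (a sketch only, not the design in #25; all names are hypothetical) is an abstract sampled-dataset interface, with the current GPT-style dataset becoming just one implementation:

```python
import abc

class SampledDataset(abc.ABC):
    """Minimal abstract interface a data format would implement (hypothetical)."""

    @abc.abstractmethod
    def __len__(self) -> int: ...

    @abc.abstractmethod
    def get_sample(self, index: int):
        """Return one training sample, whatever its modality."""

class TokenDataset(SampledDataset):
    # A GPT-style token dataset as one implementation among others.
    def __init__(self, token_ids: list, sequence_length: int):
        self._tokens = token_ids
        self._sequence_length = sequence_length

    def __len__(self) -> int:
        # Number of (overlapping) fixed-length windows this sketch exposes.
        return max(0, len(self._tokens) - self._sequence_length)

    def get_sample(self, index: int):
        return self._tokens[index : index + self._sequence_length]
```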
Implement a non-trivial second model example
We want to demonstrate generalizability with another model, ex. by wrapping a PyTorch and/or Hugging Face model as a (poorly optimized) Fast-LLM model.
Generalize/rethink batch config and schedule #115
Generalize trainer and metric logging #115
There is already a generic trainer class, but it still has some non-generic components, especially when it comes to logging.
Developer documentation for adding a new model/feature (partially done https://github.com/ServiceNow/Fast-LLM-Internal/pull/319)
Long-term features (v0.4+)
These would be great additions, but are not yet on a clear roadmap.
Document Fast-LLM best practices for performance
Implement staged training
Generalize optimizer
We are mostly hard-coded to Adam.
Generalize `Schedule`
Triton optimizer
A Triton implementation of multi-tensor Adam would give a small performance boost, avoid some explicit CPU-GPU synchronizations, and remove our dependence on third-party kernels. The multi-tensor part could be challenging.
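For reference, the per-element math such a fused kernel would compute is the standard Adam update; a plain-Python version follows (without the Triton or multi-tensor batching parts, which are the hard bit):

```python
import math

def adam_step(param, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on plain lists. A fused multi-tensor kernel would run
    this for many tensors in one launch, without host-side synchronization."""
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    for i, g in enumerate(grad):
        # First and second moment estimates (exponential moving averages).
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        denom = math.sqrt(exp_avg_sq[i] / bias_correction2) + eps
        param[i] -= lr * (exp_avg[i] / bias_correction1) / denom
    return param
```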
Optimize for inference
- KV cache pre-allocation
- CPU optimizations / CUDA graphs?
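A minimal sketch of the KV cache pre-allocation idea (plain Python lists standing in for GPU tensors; the class and shapes are hypothetical): allocate full-length buffers once up front, then write each decoding step into the next slot instead of concatenating tensors every step:

```python
class KVCache:
    def __init__(self, max_seq_len: int, head_dim: int):
        # Pre-allocate once: no per-step reallocation or concatenation.
        # With torch, these would be zero-initialized GPU tensors.
        self.keys = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.values = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.length = 0

    def append(self, key, value):
        # Each decoding step writes into the next pre-allocated slot.
        self.keys[self.length] = key
        self.values[self.length] = value
        self.length += 1

    def view(self):
        # Attention reads only the filled prefix of the buffers.
        return self.keys[: self.length], self.values[: self.length]
```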
Support non-NVIDIA GPUs
Blockers: distributed (NCCL), the Apex optimizer, flash attention? Triton and PyTorch kernels should be OK, but this needs to be verified.
Technical debt (v0.x)
These will eventually cause trouble but aren't urgent yet; they are listed here to indicate what is likely to change in the future.
Rework logging
Rework `Distributed`
- Generalize, make distributed dims flexible
- Separate out the non-distributed parts (random state, data types)
Rework `Run` (partially done #1)
We probably want to get rid of most of it.
Factor out `core`
This legacy module doesn't really make sense anymore.
Refactor `functional`
The distinction between Triton and other implementations is no longer relevant.
Rethink the model `input_`, `kwargs`
The unstructured format of the model input (ex. `Layer.forward(self, input_: torch.Tensor, kwargs: dict, ...)`) is already confusing and error-prone, and things will keep getting worse. We'll want to add more structure.
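One possible direction for that structure (purely illustrative, not a settled design; the class and field names are made up) is replacing the untyped dict with a typed context object that documents what layers expect:

```python
import dataclasses
import typing

@dataclasses.dataclass
class ForwardContext:
    """Hypothetical structured replacement for the kwargs dict."""
    sequence_length: int
    attention_mask: typing.Optional[list] = None
    # Escape hatch for layer-specific extras while migrating.
    extras: dict = dataclasses.field(default_factory=dict)

class Layer:
    def forward(self, input_, context: ForwardContext):
        # Typed attribute access fails loudly on typos and makes the
        # expected inputs visible in the signature, unlike a bare dict.
        assert context.sequence_length >= 0
        return input_
```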