This is a tentative roadmap for major improvements to Fast-LLM. It includes big features and potential breaking changes, but excludes minor features and additions.
It goes in several parts, with the following milestones:
- v0.1 (2024-10-11): First open-source version
- v0.2 (~Q4 2024): Follow-up to address technical debt on checkpoints and configs, with several breaking changes held back until this point.
- v0.3 (~Q1 2025): Further generalization to enable other models, ex. multimodal, with limited breaking changes
Config and checkpoints (v0.2)
Structured configuration (done https://github.com/ServiceNow/Fast-LLM-Internal/pull/304, https://github.com/ServiceNow/Fast-LLM-Internal/pull/308, https://github.com/ServiceNow/Fast-LLM-Internal/pull/315, https://github.com/ServiceNow/Fast-LLM-Internal/pull/316)
Replace the flat argparse format with a nested one using a `.` separator, and allow configuring from a yaml file.
Rename config parameters (partially done #1, #6, etc.)
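As an illustration of the nested format (a sketch only, not the actual Fast-LLM config code; the function name is hypothetical), flat `.`-separated keys can be expanded into the same nested structure a yaml file would produce:

```python
def expand_dotted(flat: dict) -> dict:
    """Expand {'pretrained.checkpoint_path': x} into {'pretrained': {'checkpoint_path': x}}."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

# The same structure could come from a yaml file such as:
#   pretrained:
#     checkpoint_path: /path/to/checkpoint
print(expand_dotted({"pretrained.checkpoint_path": "/path/to/checkpoint"}))
```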
With structured configs, many parameter names become redundant, ex. `pretrained.pretrained_checkpoint_path` can be unambiguously simplified to `pretrained.checkpoint_path`. This is also a good opportunity to clean things up, make names more consistent, etc.
Checkpoint improvements (partially done https://github.com/ServiceNow/Fast-LLM-Internal/pull/308 #6 #18 #22)
We need to make some breaking changes to the checkpoints:
- Generalize checkpointing formats, make them modular (done #18 #22)
- Review the distributed checkpoint format: save the shard breakdown in the metadata? Drop safetensors (proposal #26)? Enforce a strict version match for loading? Etc.
- Review checkpoint metadata: use better versioning, save more info, etc.
- Config metadata (partially done https://github.com/ServiceNow/Fast-LLM-Internal/pull/291)
- `--help`
Open-sourcing follow-up
Documentation, publications, benchmarks, etc.
Model generalization (v0.3)
Enable custom models and trainers (done https://github.com/ServiceNow/Fast-LLM-Internal/pull/319)
Generalize data #25
The data class is currently hard-coded to a GPT dataset. We need to make it more easily adaptable to other data formats and data-loading schemes.
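One possible shape for such a generalization (a sketch only, not the design in #25; all names are hypothetical) is an abstract sampled-dataset interface, with the current GPT-style dataset becoming just one implementation:

```python
import abc

class SampledDataset(abc.ABC):
    """Minimal abstract interface a data format would implement (hypothetical)."""

    @abc.abstractmethod
    def __len__(self) -> int: ...

    @abc.abstractmethod
    def get_sample(self, index: int):
        """Return one training sample, whatever its modality."""

class TokenDataset(SampledDataset):
    # A GPT-style token dataset as one implementation among others.
    def __init__(self, token_ids: list, sequence_length: int):
        self._tokens = token_ids
        self._sequence_length = sequence_length

    def __len__(self) -> int:
        # Number of (overlapping) fixed-length windows this sketch exposes.
        return max(0, len(self._tokens) - self._sequence_length)

    def get_sample(self, index: int):
        return self._tokens[index : index + self._sequence_length]
```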
Implement a non-trivial second model example
We want to demonstrate generalizability with another model, ex. by wrapping a PyTorch and/or Hugging Face model as a (poorly optimized) Fast-LLM model.
Generalize/rethink batch config and schedule #115
Generalize trainer and metric logging #115
There is already a generic trainer class, but it still has some non-generic components, especially when it comes to logging.
Developer documentation for adding a new model/feature (partially done https://github.com/ServiceNow/Fast-LLM-Internal/pull/319)
Long-term features (v0.4+)
These would be great additions, but are not yet on a clear roadmap.
Document Fast-LLM best practices for performance
Implement staged training
Generalize optimizer
We are mostly hard-coded to Adam.
Generalize `Schedule`
Triton optimizer
A Triton implementation of multi-tensor Adam would give a small performance boost, avoid some explicit CPU-GPU synchronizations, and remove our dependence on third-party kernels. The multi-tensor part could be challenging.
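For reference, the per-element math such a fused kernel would compute is the standard Adam update; a plain-Python version follows (without the Triton or multi-tensor batching parts, which are the hard bit):

```python
import math

def adam_step(param, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on plain lists. A fused multi-tensor kernel would run
    this for many tensors in one launch, without host-side synchronization."""
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    for i, g in enumerate(grad):
        # First and second moment estimates (exponential moving averages).
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        denom = math.sqrt(exp_avg_sq[i] / bias_correction2) + eps
        param[i] -= lr * (exp_avg[i] / bias_correction1) / denom
    return param
```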
Optimize for inference
- KV cache pre-allocation
- CPU optimizations / CUDA graphs?
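A minimal sketch of the KV cache pre-allocation idea (plain Python lists standing in for GPU tensors; the class and shapes are hypothetical): allocate full-length buffers once up front, then write each decoding step into the next slot instead of concatenating tensors every step:

```python
class KVCache:
    def __init__(self, max_seq_len: int, head_dim: int):
        # Pre-allocate once: no per-step reallocation or concatenation.
        # With torch, these would be zero-initialized GPU tensors.
        self.keys = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.values = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.length = 0

    def append(self, key, value):
        # Each decoding step writes into the next pre-allocated slot.
        self.keys[self.length] = key
        self.values[self.length] = value
        self.length += 1

    def view(self):
        # Attention reads only the filled prefix of the buffers.
        return self.keys[: self.length], self.values[: self.length]
```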
Support non-NVIDIA GPUs
Blockers: distributed (NCCL), the Apex optimizer, flash attention? Triton and PyTorch kernels should be OK, but this needs to be verified.
Technical debt (v0.x)
These will eventually cause trouble but aren't urgent yet; they are listed here to indicate what is likely to change in the future.
Rework logging
Rework `Distributed`
- Generalize, make distributed dims flexible
- Separate out the non-distributed parts (random state, data types)
Rework `Run` (partially done #1)
We probably want to get rid of most of it.
Factor out `core`
This legacy module doesn't really make sense anymore.
Refactor `functional`
The distinction between Triton and other implementations is no longer relevant.
Rethink the model `input_`, `kwargs`
The unstructured format of the model input (ex. `Layer.forward(self, input_: torch.Tensor, kwargs: dict, ...)`) is already confusing and error-prone, and things will keep getting worse. We'll want to add more structure.
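One possible direction for that structure (purely illustrative, not a settled design; the class and field names are made up) is replacing the untyped dict with a typed context object that documents what layers expect:

```python
import dataclasses
import typing

@dataclasses.dataclass
class ForwardContext:
    """Hypothetical structured replacement for the kwargs dict."""
    sequence_length: int
    attention_mask: typing.Optional[list] = None
    # Escape hatch for layer-specific extras while migrating.
    extras: dict = dataclasses.field(default_factory=dict)

class Layer:
    def forward(self, input_, context: ForwardContext):
        # Typed attribute access fails loudly on typos and makes the
        # expected inputs visible in the signature, unlike a bare dict.
        assert context.sequence_length >= 0
        return input_
```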