eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Torch 2.0 compile model #283

Open andrecharneca opened 11 months ago

andrecharneca commented 11 months ago

Are there any plans to add torch.compile speed-ups to LMQL Transformers models? Thanks

lbeurerkellner commented 11 months ago

Hi there Andre, can you recommend any resources on how torch.compile improves inference speed with, e.g., transformers?

In general I am definitely not opposed to adding it.

andrecharneca commented 11 months ago

For example: https://huggingface.co/docs/transformers/main/perf_torch_compile . Although that benchmark uses Vision Transformers, the results should be similar for LLMs. From my own experimentation with torch.compile, compilation of LLMs can take quite a while, so the performance gains really depend on the specific use case. It would still be a nice feature to add, since it's so simple.
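For reference, the basic usage pattern looks like the sketch below, using a tiny stand-in module (a Hugging Face model loaded via `from_pretrained` would be wrapped the same way). The `backend="eager"` argument is only so the sketch runs without any codegen toolchain; the actual speedups come from the default `"inductor"` backend.

```python
import torch

# Tiny stand-in for an LLM; a transformers model is wrapped identically.
class TinyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.proj(x))

model = TinyLM().eval()

# torch.compile returns a wrapped module. Compilation is lazy: the first
# call pays the (potentially long) compile cost mentioned above, while
# subsequent calls can run faster. "eager" skips codegen for portability.
compiled = torch.compile(model, backend="eager")

x = torch.randn(2, 16)
with torch.no_grad():
    out_eager = model(x)
    out_compiled = compiled(x)
```

The compiled module is a drop-in replacement: same inputs, same outputs, same `nn.Module` interface.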

lbeurerkellner commented 8 months ago

Marking this as a good first issue.

The feature can be added to https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/backends/transformers_model.py, where an optional lmql serve-model argument can be set so that compilation is done before model serving begins.
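One possible shape for the change, sketched below. The `use_torch_compile` flag name and the `load_model` helper are hypothetical, not existing lmql options; the idea is just to compile once at load time, so the one-off compilation cost is paid before serving begins rather than on the first request.

```python
import torch

def load_model(model_ctor, **kwargs):
    # Hypothetical sketch of how the transformers backend could opt in.
    # The kwargs dict mimics how extra serve-model arguments could be
    # forwarded; "use_torch_compile" is illustrative only.
    use_compile = kwargs.pop("use_torch_compile", False)
    model = model_ctor(**kwargs)
    if use_compile:
        # Wrap the model; actual compilation still happens lazily on the
        # first forward pass, before the server accepts requests.
        model = torch.compile(model)
    return model

# Usage with a stand-in constructor (a real backend would pass
# AutoModelForCausalLM.from_pretrained here):
model = load_model(lambda **kw: torch.nn.Linear(8, 8), use_torch_compile=True)
```

Since `torch.compile` returns an `nn.Module`-compatible wrapper, the rest of the serving path would not need to change.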