This is a WIP PR that proposes the following changes:
- [x] Refactor the `lmwrapper.BACKEND_wrapper` pattern to `lmwrapper.BACKEND.wrapper`. This means each backend and its code lives in its own module/folder. By defining imports in `__init__.py`, users can do `from lmwrapper.huggingface import get_huggingface_lm` or `from lmwrapper.openai import get_openai_lm` (see the sketch after this list).
- [x] Add a vLLM backend.
- [x] Add an ExLlama backend.
- [x] Introduce `HuggingFaceModelInfo` -> this may be a bad design choice, and I'm happy to drop it.
- [ ] ~~Rename `is_chat` to `is_dialog`. This is probably unnecessary, but I think "dialog" makes it clearer that the model/interaction pattern expects special prompt formatting, whereas "chat" is vaguer. Happy to drop this.~~
- [x] `lmwrapper/huggingface/stopping_criteria.py` can probably be removed.
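For reference, a minimal sketch of the `__init__.py` re-export pattern that makes these imports work (the `wrapper` submodule name is illustrative; the actual file layout may differ):

```python
# lmwrapper/huggingface/__init__.py -- illustrative sketch only
# Re-export the public entry point so users can write:
#   from lmwrapper.huggingface import get_huggingface_lm
from lmwrapper.huggingface.wrapper import get_huggingface_lm  # "wrapper" module name is assumed

__all__ = ["get_huggingface_lm"]
```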
**In a future PR**
- [ ] Remove the concept of a `Runtime`. The pattern is not flexible enough.
- [ ] Remove the never-finished ONNX/TensorRT runtimes.
- [ ] Get TinyLlama working for tests using ExLlama.
**Notes**
- ExLlama is untested and unused so far.
- vLLM only works on CUDA, so I'm not able to test it on CI or locally.
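For reviewers unfamiliar with vLLM, this is roughly the call the new backend wraps (a sketch, not the backend's actual code; the model name is just an example, and a CUDA GPU is required):

```python
# Minimal vLLM usage (CUDA-only); sketch of what the backend wraps.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # loads the model onto the GPU
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)  # text of the first completion
```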