astramind-ai / BitMat

An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
Apache License 2.0
153 stars 9 forks

Create a Triton backend #2

Closed by EwoutH 5 months ago

EwoutH commented 5 months ago

Would it be possible to create a Triton backend from this implementation?

A Triton backend is the implementation that executes a model. A backend can be a wrapper around a deep-learning framework, like PyTorch, TensorFlow, TensorRT or ONNX Runtime. Or a backend can be custom C/C++ logic performing any operation (for example, image pre-processing).

mlinmg commented 5 months ago

I don't have direct experience with Triton Inference Server. I'll look into it in the next few days.

michaelfeil commented 5 months ago

@EwoutH I think you confused OpenAI Triton (the language) with NVIDIA Triton Inference Server (an API server written in C++).

EwoutH commented 5 months ago

Right, I didn't gather that from the README. Thanks for clearing that up!