Closed: sherzod-hakimov closed this issue 5 months ago.
Clarify the title? In what sense do we want to switch? Haven't we committed to running local models directly (i.e., the backend is the transformers object)?
As far as I'm aware, there is a separate issue about providing a locally running, "free" API for exploratory work. But the idea is not (necessarily) for the clembench runs to use this service, e.g., to run Llamas. Or is it?
Looks like we've decided on FastChat? Close?
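For reference, FastChat exposes an OpenAI-compatible REST API (served via `python -m fastchat.serve.openai_api_server`), so a backend could talk to it over plain HTTP. A minimal sketch, assuming the server runs on the default port 8000 and that a model like `vicuna-7b-v1.5` is loaded (both the URL and model name here are assumptions, not settled choices):

```python
import json
from urllib import request

# Assumed local endpoint of FastChat's OpenAI-compatible server
# (default port 8000; adjust if the deployment differs).
API_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(model: str, user_message: str, temperature: float = 0.0) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }


def query(model: str, user_message: str) -> str:
    """POST the payload to the local server and return the reply text."""
    data = json.dumps(build_payload(model, user_message)).encode("utf-8")
    req = request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running FastChat server with the model loaded.
    print(query("vicuna-7b-v1.5", "Say hello."))
```

Because the request/response shape follows the OpenAI chat format, the same client code would also work against other OpenAI-compatible local servers, which keeps the backend decision reversible.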
Potential libraries: