Closed by rchan26 4 months ago
The Ollama API and Huggingface TGI API have now been added as options for running inference on locally hosted models. We should only implement async models, so at some point we ought to remove the sync BaseModel classes, as they are now redundant.
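For context, a minimal sketch of what an async query against the Ollama API might look like (the endpoint, payload, and default port follow Ollama's documented `/api/generate` route; the model name and wrapper function are illustrative, not this repo's actual classes):

```python
import asyncio

import httpx


async def ollama_generate(prompt: str, model: str = "llama2") -> str:
    # Ollama listens on localhost:11434 by default; /api/generate is its
    # documented completion endpoint. stream=False returns one JSON object.
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
        )
        response.raise_for_status()
        return response.json()["response"]


if __name__ == "__main__":
    print(asyncio.run(ollama_generate("Why is the sky blue?")))
```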
Closing this, as #37 and #47 implement simple Quart endpoints for hosting Huggingface models via `transformers.pipeline`.
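A minimal sketch of such a Quart endpoint, for reference (the route and model name are assumptions for illustration, not the actual code from #37/#47):

```python
from quart import Quart, jsonify, request
from transformers import pipeline

app = Quart(__name__)

# Load the pipeline once at startup; "gpt2" is just a stand-in model here.
generator = pipeline("text-generation", model="gpt2")


@app.route("/generate", methods=["POST"])
async def generate():
    data = await request.get_json()
    # transformers.pipeline is synchronous, so this call blocks the event
    # loop: fine for a simple local endpoint, not for heavy concurrency.
    outputs = generator(data["prompt"], max_new_tokens=100)
    return jsonify({"text": outputs[0]["generated_text"]})


if __name__ == "__main__":
    app.run(port=5000)
```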
Locally hosted models will perform differently and won't necessarily have a strict rate limit. Maybe in this setting, we send requests sequentially rather than asynchronously, i.e. utilise the `query` method of the model class, not `async_query`.
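In other words, something along these lines (a sketch: `query`/`async_query` follow the naming above, but the model class itself is hypothetical):

```python
import asyncio


def run_sequential(model, prompts):
    # Sequential: fine for a locally hosted model with no rate limit to
    # exploit -- requests queue behind one another and the local hardware
    # processes them one at a time anyway.
    return [model.query(p) for p in prompts]


async def run_concurrent(model, prompts):
    # Concurrent: worth it against rate-limited hosted APIs, where
    # overlapping requests lets us use more of the allowed throughput.
    return await asyncio.gather(*(model.async_query(p) for p in prompts))
```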