b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

[New Feature] Add new route to dllama api for embedding models #96

Open testing0mon21 opened 3 days ago

testing0mon21 commented 3 days ago
```cpp
std::vector<Route> routes = {
    {
        "/v1/chat/completions",
        HttpMethod::METHOD_POST,
        std::bind(&handleCompletionsRequest, std::placeholders::_1, &api)
    },
    {
        "/v1/models",
        HttpMethod::METHOD_GET,
        std::bind(&handleModelsRequest, std::placeholders::_1)
    }
};
```

In the dllama api on the master branch we have only two routes, /v1/chat/completions and /v1/models, but some models, such as llama3:8b, also have embedding functionality. Can you add a new route for /api/embeddings?
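For illustration, here is a minimal compilable sketch of what registering such a route could look like, mirroring the snippet above. The types and the `handleEmbeddingsRequest` handler are stand-ins invented for this example, not the actual distributed-llama code:

```cpp
// Hypothetical sketch: Route, HttpMethod, HttpRequest, Api and the handler
// names below are stand-ins mirroring the snippet above, not the real
// distributed-llama types.
#include <functional>
#include <string>
#include <vector>

struct HttpRequest { std::string body; };
struct Api {};
enum class HttpMethod { METHOD_GET, METHOD_POST };

struct Route {
    std::string path;
    HttpMethod method;
    std::function<void(HttpRequest*)> handler;
};

void handleCompletionsRequest(HttpRequest* request, Api* api) { /* existing */ }
void handleModelsRequest(HttpRequest* request) { /* existing */ }

// Proposed handler: would run a forward pass and return the prompt's
// embedding vector instead of sampled tokens.
void handleEmbeddingsRequest(HttpRequest* request, Api* api) { /* TODO */ }

int main() {
    Api api;
    std::vector<Route> routes = {
        { "/v1/chat/completions", HttpMethod::METHOD_POST,
          std::bind(&handleCompletionsRequest, std::placeholders::_1, &api) },
        { "/v1/models", HttpMethod::METHOD_GET,
          std::bind(&handleModelsRequest, std::placeholders::_1) },
        // New route proposed in this issue:
        { "/api/embeddings", HttpMethod::METHOD_POST,
          std::bind(&handleEmbeddingsRequest, std::placeholders::_1, &api) },
    };
    (void) routes; // the real server would dispatch incoming requests here
    return 0;
}
```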
testing0mon21 commented 3 days ago

@b4rtaz what do you think about adding a new route?

testing0mon21 commented 1 day ago

I mean this for the dllama api:

Generate Embeddings

POST /api/embeddings

Generate embeddings from a model
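This appears to be the Ollama-style embeddings endpoint, where the request body carries a model name and a prompt, and the response carries a single "embedding" array of floats. As a rough illustration of the response shape (not distributed-llama's actual serialization code), building such a body in C++ could look like this:

```cpp
// Hypothetical sketch: serializes an embedding vector into the JSON body
// an /api/embeddings response could return, e.g. {"embedding":[0.5,...]}.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::string buildEmbeddingsResponse(const std::vector<float>& embedding) {
    std::ostringstream json;
    json << "{\"embedding\":[";
    for (size_t i = 0; i < embedding.size(); i++) {
        if (i > 0) json << ",";
        json << embedding[i];
    }
    json << "]}";
    return json.str();
}

int main() {
    // Example usage with a dummy 4-dimensional embedding.
    std::cout << buildEmbeddingsResponse({0.5f, 0.25f, -0.1f, 0.0f}) << "\n";
    return 0;
}
```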

testing0mon21 commented 8 hours ago

I hope that explains it :) @b4rtaz