JBGruber / rollama

https://jbgruber.github.io/rollama/
GNU General Public License v3.0

Add load balancing #17

Status: Open. JBGruber opened this issue 5 months ago.

JBGruber commented 5 months ago

The same approach implemented in #16 could also be used to send requests to multiple Ollama servers at once and process them in parallel. There are at least two approaches we could follow:

  1. naive: we distribute requests equally among servers and wait for all responses.
  2. advanced: we send a few requests to each server and then poll to see which instances have returned responses. As soon as a server has fewer than x open requests in its queue, we send it more.
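A minimal sketch of the naive approach (1), assuming requests are identified only by their index: each request is assigned to a server round-robin, which yields equally sized batches. The helper name `split_naive()` is hypothetical, and nothing is actually sent to the servers here.

```r
# Hypothetical helper for approach 1: split request indices into equal
# batches, one per server (round-robin: request i goes to server ((i - 1) %% k) + 1)
split_naive <- function(n_requests, servers) {
  server_id <- rep_len(seq_along(servers), n_requests)
  batches <- split(seq_len(n_requests),
                   factor(server_id, levels = seq_along(servers)))
  names(batches) <- servers
  batches
}

servers <- c("http://localhost:11434/", "http://192.168.2.45:11434/")
batches <- split_naive(10, servers)
lengths(batches)  # 5 requests per server
```

Each batch would then be sent to its server, and the caller waits until every batch has returned.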

With approach 1, the total run time is determined by the slowest instance. Approach 2 would be much more efficient when fast and slow machines are mixed, but it is also harder to implement.
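To illustrate the difference, here is a toy simulation under stated assumptions: each server has a fixed, hypothetical latency per request (1 s for a fast machine, 4 s for a slow one), requests are independent, and the advanced approach is reduced to its simplest form with a queue depth of one, i.e. a server gets the next request the moment it finishes its current one. No HTTP calls are made; only the total run time (makespan) is computed.

```r
# Hypothetical per-request latencies in seconds: one fast and one slow server
latency <- c(fast = 1, slow = 4)

# Approach 1: requests split equally (round-robin); total time is set by the
# server that finishes its batch last
naive_makespan <- function(n, latency) {
  counts <- tabulate(rep_len(seq_along(latency), n), nbins = length(latency))
  max(counts * latency)
}

# Approach 2 (simplified, queue depth 1): each request goes to whichever
# server becomes free first
dynamic_makespan <- function(n, latency) {
  busy_until <- numeric(length(latency))  # time at which each server is idle again
  for (i in seq_len(n)) {
    s <- which.min(busy_until)            # first server to become free
    busy_until[s] <- busy_until[s] + latency[[s]]
  }
  max(busy_until)
}

naive_makespan(100, latency)    # 200: the slow server handles 50 requests * 4 s
dynamic_makespan(100, latency)  # 80: the fast server ends up handling 80 of the 100
```

With equal latencies both functions return the same value, which matches the point that the naive split only costs time when the machines differ in speed.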

JBGruber commented 1 week ago

This now works in the `output` branch. I opted for something between the naive and the advanced approach. When you supply a vector of servers, you can name each element with the share of requests that server should handle. So c("0.6" = "http://localhost:11434/", "0.4" = "http://192.168.2.45:11434/") hands 60% of requests to localhost and 40% to the remote computer. It's pretty quick:

```r
library(rollama)
library(tidyverse)

# Sample 500 reviews from a public e-commerce review dataset
reviews_df <- read_csv("https://raw.githubusercontent.com/AFAgarap/ecommerce-reviews-analysis/master/Womens%20Clothing%20E-Commerce%20Reviews.csv",
                       show_col_types = FALSE) |> 
  sample_n(500)
#> New names:
#> • `` -> `...1`

# Build a chat-style query (system prompt + user text) for one review
make_query <- function(t) {
  tribble(
    ~role,    ~content,
    "system", "You assign texts into categories. Answer with just the correct category, which is either {positive}, {neutral} or {negative}.",
    "user", t
  )
}

start <- Sys.time()
reviews_df_annotated <- reviews_df |> 
  mutate(query = map(`Review Text`, make_query),
         # rollama's query() applied to the new `query` column; the names of
         # `server` set each server's share: 60% localhost, 40% remote machine
         category = query(query, screen = FALSE,
                          model = "llama3.2:3b-instruct-q8_0", 
                          server = c("0.6" = "http://localhost:11434/", 
                                     "0.4" = "http://192.168.2.45:11434/"), 
                          output = "text"))
stop <- Sys.time()
stop - start
#> Time difference of 18.19546 secs
```

Created on 2024-10-18 with reprex v2.1.0
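How named shares like these might be turned into per-server request counts can be sketched as follows. `assign_servers()` is a hypothetical helper, not the code in the `output` branch: it parses the names as numeric weights, normalises them so they need not sum to 1, floors each server's quota, and hands leftover requests to the servers with the largest fractional remainders (the largest-remainder method), so every request is assigned exactly once.

```r
# Hypothetical helper: map each of n requests to a server URL according to
# the numeric shares stored in the names of `servers`
assign_servers <- function(n, servers) {
  shares <- as.numeric(names(servers))
  stopifnot(!anyNA(shares), all(shares >= 0), sum(shares) > 0)
  shares <- shares / sum(shares)          # normalise the weights
  quota <- shares * n
  counts <- floor(quota)                  # whole requests per server
  leftover <- n - sum(counts)             # requests lost to flooring
  if (leftover > 0) {
    # one extra request for each of the servers with the largest remainders
    extra <- order(quota - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[extra] <- counts[extra] + 1
  }
  rep(unname(servers), times = counts)
}

servers <- c("0.6" = "http://localhost:11434/",
             "0.4" = "http://192.168.2.45:11434/")
table(assign_servers(500, servers))  # 300 for localhost, 200 for the remote server
table(assign_servers(7, servers))    # 4 for localhost, 3 for the remote server
```

Because the split is fixed up front, the run time still depends on how well the chosen shares match the machines' actual speeds; `rep()` also returns contiguous blocks, which only matters if requests are dispatched sequentially rather than concurrently.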