Turning off the GPU should significantly slow down answering, but it doesn't:
library(rollama)
res <- bench::mark(
  cpu = query("why is the sky blue?",
              model_params = list(num_gpu = 0)),
  gpu = query("why is the sky blue?"),
  check = FALSE
)
summary(res)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 cpu            4.9s     4.9s     0.204    4.98MB    0
#> 2 gpu           6.09s    6.09s     0.164  545.33KB    0.164
Created on 2024-01-23 with reprex v2.0.2
Using
The time Ollama takes goes up to 50s, which makes more sense. I assume the parameters are not being translated to JSON correctly.
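One way to check the JSON assumption is to look at what a request body serializes to. The following is only a minimal sketch: it uses jsonlite directly (which may not be what rollama does internally), assumes the Ollama API expects model parameters nested under an `options` field, and uses a placeholder model name.

```r
library(jsonlite)

# Hypothetical request body; "llama2" is a placeholder model name.
body <- list(
  model = "llama2",
  prompt = "why is the sky blue?",
  options = list(num_gpu = 0)
)

# With auto_unbox = TRUE, length-one vectors become JSON scalars.
unboxed <- toJSON(body, auto_unbox = TRUE)

# Without it, every length-one vector is serialized as a one-element array.
boxed <- toJSON(body)

cat(unboxed, "\n")
cat(boxed, "\n")
```

If the body that actually reaches the server looks like the second form, or if `num_gpu` ends up outside `options`, that could explain why the setting appears to be ignored. This is a guess, not a confirmed cause.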