Hello, when I run the openbmb/MiniCPM-V-2_6-gguf model, I find that llama-cpp-python running as a server is slower than llama.cpp's minicpmv-cli example.
The difference I found is that llama-cpp-python's _embed_image_bytes func passes n_threads_batch to llava_image_embed_make_with_bytes, while llama.cpp's minicpmv-cli example passes n_threads (whose value is cpu_cores / 2) to the same function. Using n_threads makes the image processing more efficient and less time-consuming.
For example, on my CPU (56 cores), image processing takes more than three times as long with the current behavior.
This parameter affects how long the following function takes:
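For reference, here is a rough Python sketch of the call in question (the helper embed_image and the variable clip_ctx are only for illustration; the actual call sites are llama-cpp-python's _embed_image_bytes and the minicpmv-cli example):

```python
import ctypes
import os

from llama_cpp import llava_cpp  # ships with llama-cpp-python


def embed_image(clip_ctx, image_bytes: bytes, n_threads: int):
    """Build a llava image embedding with an explicit thread count.

    n_threads is forwarded to llava_image_embed_make_with_bytes and controls
    how many CPU threads the CLIP image encode uses.
    """
    data = (ctypes.c_uint8 * len(image_bytes)).from_buffer(bytearray(image_bytes))
    return llava_cpp.llava_image_embed_make_with_bytes(
        clip_ctx, n_threads, data, len(image_bytes)
    )


# llama-cpp-python server today: the thread count comes from n_threads_batch,
# which is small on my setup, so the image encode runs on few threads:
# embed = embed_image(clip_ctx, image_bytes, llama.context_params.n_threads_batch)

# minicpmv-cli: the thread count is n_threads (about half the cores by default),
# so the same encode finishes more than 3x faster on my 56-core machine:
# embed = embed_image(clip_ctx, image_bytes, os.cpu_count() // 2)
```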
Best wishes.