codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0

Setting the default approach doesn't work #69

Closed ErykCh closed 2 weeks ago

ErykCh commented 1 month ago

Hi,

my configuration for mcts is as follows:

in docker-compose.yml

services:
  optillm:
    container_name: optillm-proxy
    image: optillm:1.0
    ports:
      - 81:8000
    restart: unless-stopped
    environment:
      - OPENAI_API_KEY=no_key
    command: --log debug --approach mcts --n 5 --simulations 3 --depth 3 --return-full-response true --base-url http://vllm:8000/v1

I would expect that if I do not specify a slug in the model name or in the prompt, mcts will be selected, because I have --approach mcts.

Starting logs:

2024-10-21 10:09:24,593 - INFO - Starting server with approach: mcts
2024-10-21 10:09:24,594 - INFO - Server configuration: {'approach': 'mcts', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'n': 5, 'base_url': 'http://vllm:8000/v1', 'optillm_api_key': '', 'return_full_response': True, 'port': 8000, 'log': 'debug', 'simulations': 3, 'exploration': 0.2, 'depth': 3}

Prompt:

2024-10-21 10:09:44,412 - INFO - Received request to /v1/chat/completions
2024-10-21 10:09:44,412 - DEBUG - Intercepted Bearer Token: my_key
2024-10-21 10:09:44,412 - DEBUG - Request data: {'model': '', 'user': '66f282757cce4320b5e6bfa1', 'stream': True, 'messages': [{'role': 'user', 'content': ''}]}
2024-10-21 10:09:44,472 - INFO - Using approach(es) ['bon'], operation SINGLE, with model

but it should be using mcts.
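
For reference, this is roughly the client call I am making. There is no approach slug in the model name and none in the prompt, so I expect the server default (--approach mcts) to apply. The endpoint and model name below are placeholders for my setup:

from openai import OpenAI

# Client pointed at the optillm proxy (host port 81 maps to the container's 8000).
client = OpenAI(api_key="my_key", base_url="http://localhost:81/v1")

# No approach slug anywhere, so the --approach mcts default should be used.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model served by vLLM
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)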

codelion commented 1 month ago

Thanks for checking this out, it is fixed in https://github.com/codelion/optillm/commit/83b53419dcdd38c87240d2fbcc399f1bcc500f09

ErykCh commented 1 month ago

But now the --approach flag takes precedence permanently.

Setting --approach mcts and adding in the prompt:

moa

causes mcts to be called
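
For completeness, this is roughly what the request looks like. I am assuming the <optillm_approach> tag from the README is the way to set the approach inside the prompt; the model name is a placeholder:

from openai import OpenAI

client = OpenAI(api_key="my_key", base_url="http://localhost:81/v1")

# Approach requested per-prompt via the <optillm_approach> tag.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    messages=[{
        "role": "user",
        "content": "<optillm_approach>moa</optillm_approach> What is the capital of France?",
    }],
)
# With --approach mcts still set on the server, mcts gets used instead of moa.
print(response.choices[0].message.content)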

ErykCh commented 1 month ago

Even worse:

Removing --approach mcts and adding in the prompt:

moa

results in an error:

ERROR - Error processing request: Error code: 404 - {'object': 'error', 'message': 'The model auto-XYZ does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}

codelion commented 1 month ago

Thanks for checking again. Fixed it in https://github.com/codelion/optillm/commit/94fad7846e82cd24f4603a4da7019ba242f40be3

The order of preference for the approach is as follows:

ErykCh commented 1 month ago

OK, now I understand the doc:

Please note that the convention described above works only when the optillm server has been started with inference approach set to auto. Otherwise, the model attribute in the client request must be set with the model name only.

But now I don't understand how auto works.

From my quick tests, it always triggers bon. How is the logic that chooses the best method for a given question triggered? Shouldn't there first be a query to the LLM to determine which method to use for that question?

codelion commented 1 month ago

auto just means that the approach has to be set by the user, either in the model name, in the request's extra body, or in the messages. If it is still not set, it defaults to bon.
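
A minimal sketch of those client-side options (the model names are placeholders; the extra body key and prompt tag follow the optillm_approach convention from the README):

from openai import OpenAI

client = OpenAI(api_key="my_key", base_url="http://localhost:81/v1")

# 1. Approach as a prefix of the model name, e.g. moa-<model>.
client.chat.completions.create(
    model="moa-meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)

# 2. Approach in the request's extra body via the optillm_approach key.
client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    extra_body={"optillm_approach": "moa"},
)

# 3. Approach embedded in the message content with the <optillm_approach> tag.
client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "<optillm_approach>moa</optillm_approach> What is 2 + 2?"}],
)

# If none of these is present, the proxy falls back to bon.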

How is the logic triggered to choose the best method to choose for a given question?

This is implemented in the router plugin - https://github.com/codelion/optillm/blob/main/optillm/plugins/router_plugin.py. Use router-<model-name> as the model name to use it. It uses a BERT-style classifier model to route to the appropriate approach. This is the model that is used - https://huggingface.co/codelion/optillm-bert-uncased
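
A minimal usage sketch, assuming the router slug is prefixed to the model name just like the approach slugs (the base model name is a placeholder):

from openai import OpenAI

client = OpenAI(api_key="my_key", base_url="http://localhost:81/v1")

# The router plugin classifies the prompt with codelion/optillm-bert-uncased
# and dispatches the request to whichever approach the classifier predicts.
response = client.chat.completions.create(
    model="router-meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)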

It was trained on data generated from Arena-Hard-Auto using this script - https://github.com/codelion/optillm/blob/main/scripts/gen_optillm_dataset.py. The code for training is here: https://github.com/codelion/optillm/blob/main/scripts/train_optillm_classifier.py