defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0

API server updates #141

Closed · wongjingping closed this 4 months ago

wongjingping commented 4 months ago

Problem: we were implicitly adding another BOS token inside the vLLM engine, on top of the one already present in the prompt. This produced double BOS tokens for our prompts during evaluation and a slight degradation in results. A quick demonstration of the behaviour is below.
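For illustration, the double BOS can be reproduced with a Hugging Face tokenizer (which vLLM uses internally); the model name here is just an example:

```python
from transformers import AutoTokenizer

# Example only: any Llama-style tokenizer that defines a BOS token works here.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Our prompts already start with the BOS token "<s>".
prompt = "<s> Generate a SQL query that ..."

# With add_special_tokens=True (the default), the tokenizer prepends
# another BOS, so the encoded sequence starts with two BOS ids.
ids = tok.encode(prompt)
print(ids[:2])  # e.g. [1, 1] for Llama-2, where 1 is the BOS token id
```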

Solution: modify our api_server.py so that we never create double BOS tokens: encode the prompt without adding special tokens (BOS/EOS), then prepend the BOS token only if it is missing (see the sketch below). We preferred making the change in api_server.py to avoid having to specify the tokenizer for API server runs and customize the prompt based on the model/tokenizer revision.
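A minimal sketch of that BOS-handling logic, assuming a Hugging Face tokenizer object is available in the server (the function name and signature are illustrative, not the exact diff):

```python
def encode_prompt(tokenizer, prompt: str) -> list[int]:
    # Encode without letting the tokenizer insert special tokens (BOS/EOS),
    # so we control exactly how many BOS tokens appear in the sequence.
    token_ids = tokenizer.encode(prompt, add_special_tokens=False)
    # Prepend the BOS token only if the prompt did not already include one.
    if tokenizer.bos_token_id is not None and (
        not token_ids or token_ids[0] != tokenizer.bos_token_id
    ):
        token_ids = [tokenizer.bos_token_id] + token_ids
    return token_ids
```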

Alternative considered: modify the API runner to pass prompt_token_ids in directly. However, this would not be compatible with the OpenAI server format or with the vanilla API server from vLLM, since those only expect a prompt of type string, not token IDs. The request shapes below illustrate the constraint.
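For illustration (the URL, model name, and payload values are assumptions, not from this PR):

```python
import requests

# OpenAI-compatible completions endpoint: the servers we target here
# expect "prompt" as a string, so token ids cannot be passed in.
requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "defog/sqlcoder", "prompt": "SELECT ...", "max_tokens": 256},
)

# vLLM's vanilla api_server exposes /generate with the same constraint:
requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "SELECT ...", "max_tokens": 256},
)

# Passing prompt_token_ids would require a custom request body, e.g.
# {"prompt_token_ids": [1, 3432, ...]}, which neither server accepts.
```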