defog-ai / sql-eval

Evaluate the accuracy of LLM-generated outputs
Apache License 2.0

Standardize prompt formatting for vllm #140

Closed wongjingping closed 2 months ago

wongjingping commented 2 months ago

Standardize prompt formatting in vllm by tokenizing the prompt completely first, checking whether the BOS token is already at the start, and adding it if not. This avoids adding the BOS token twice (or not at all). I also learned that vllm lets us pre-tokenize and pass in the token IDs instead of the raw string, which gives us more consistent control over special tokens on our side. Tested on a bunch of checkpoints; accuracy is generally equal or slightly better.
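
For reference, a minimal sketch of the approach described above. The checkpoint name, prompt, and sampling settings are placeholders (the PR doesn't name the checkpoints it was tested on), and the `prompt_token_ids` keyword follows the `LLM.generate` signature from vLLM versions around the time of this PR; newer releases express the same thing via `TokensPrompt` inputs:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Hypothetical checkpoint for illustration only.
model_name = "defog/sqlcoder-7b-2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name)

prompt = "### Task\nGenerate a SQL query to answer the question ..."

# Tokenize the full prompt first. Depending on the tokenizer config, this
# may or may not prepend a BOS token, and the prompt template itself may
# already contain one.
token_ids = tokenizer.encode(prompt)

# Ensure exactly one BOS token at the start: add it only if it is missing.
if tokenizer.bos_token_id is not None and token_ids[0] != tokenizer.bos_token_id:
    token_ids = [tokenizer.bos_token_id] + token_ids

sampling_params = SamplingParams(temperature=0, max_tokens=600)

# Pass the pre-tokenized ids instead of the raw string, so vllm does not
# re-tokenize and possibly insert special tokens a second time.
outputs = llm.generate(
    prompt_token_ids=[token_ids],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```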