Yiannis128 opened this issue 11 months ago
@Yiannis128 I'm the maintainer of LiteLLM. We let you increase your throughput by load balancing between multiple deployments (Azure, OpenAI). I'd love your feedback, especially if this does not solve your problem.
Here's how to use it (docs: https://docs.litellm.ai/docs/routing):
import os
from litellm import Router

model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}]
)
print(response)
Hi, thanks for the suggestion. Before I look at this, I would like to ask whether you support Hugging Face models, since I already have Hugging Face model support.
I will still look at it if you don't, but if you do, it will be much easier to implement.
Yes, we support Hugging Face LLMs. Are you trying to load balance between Hugging Face endpoints?
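For context, a Hugging Face deployment can be added to the Router's model_list in the same way as the Azure deployments above. The following is a minimal sketch based on the routing docs, assuming the huggingface/ provider prefix; the model id, alias, and endpoint URL are placeholders, not real deployments:

import os
from litellm import Router

# Sketch: one Hugging Face deployment in the Router's model_list.
# "huggingface/<model-id>" selects the Hugging Face provider; api_base points
# at your own inference endpoint (e.g. a text-generation-inference server).
model_list = [{
    "model_name": "my-hf-model",  # alias used when calling router.completion()
    "litellm_params": {
        "model": "huggingface/bigcode/starcoder",           # placeholder model id
        "api_key": os.getenv("HUGGINGFACE_API_KEY"),
        "api_base": "https://my-tgi-endpoint.example.com"   # placeholder endpoint URL
    }
}]

router = Router(model_list=model_list)
response = router.completion(
    model="my-hf-model",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)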
No, I only ask because I already have an interface for adding text-generation-inference-compatible models through Hugging Face. If you support them too, that's great! Is this an alternative to LangChain? Could you tell me what advantages it offers?
LangChain had some limitations when I implemented it (not sure about now), so I'm weighing the cost/benefit of switching :)
When ESBMC produces an output (counterexample) that is too big, the token count is too large for it to be passed to the LLM. Currently, since the switch to LangChain, we do not measure the token count or check whether the model's limit has been exceeded. When the error occurs, LangChain reports it; see the example below.
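As an illustration only (not the project's actual code), a pre-flight check of the kind that is currently missing could count tokens before the call is made. This sketch assumes tiktoken and the gpt-3.5-turbo encoding; the limit value is hypothetical and would depend on the model in use:

import tiktoken

MAX_CONTEXT_TOKENS = 4096  # hypothetical limit, depends on the model in use

def fits_in_context(prompt: str, counterexample: str,
                    limit: int = MAX_CONTEXT_TOKENS) -> bool:
    """Return True if prompt + counterexample fit within the token limit."""
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    used = len(enc.encode(prompt)) + len(enc.encode(counterexample))
    return used <= limit

Today no such check runs, so the request is sent regardless and LangChain raises the error shown in the example below.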
Example (from the FormAI dataset, FormAI_92991.c):