A modular and comprehensive solution to deploy a Multi-LLM and Multi-RAG powered chatbot (Amazon Bedrock, Anthropic, HuggingFace, OpenAI, Meta, AI21, Cohere, Mistral) using AWS CDK on AWS
Issue #, if available:

Description of changes:
The internal calls that stream response chunks (LLM response streaming) back to the end user were hitting the WAF throttling limit introduced in #581.
To prevent this, this change excludes the VPC's IP addresses from the throttling rule, so internal streaming traffic is no longer rate-limited.
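As a rough illustration (not the project's actual code), the exclusion can be expressed in CDK as a rate-based WAF rule whose scope-down statement skips requests coming from the VPC's IP range. The construct names, rate limit, and CIDR below are placeholders:

```typescript
import { aws_wafv2 as wafv2 } from "aws-cdk-lib";
import { Construct } from "constructs";

// Sketch: rate-based throttling rule that does not count requests
// originating from the VPC CIDR (placeholder names and values).
export function addThrottlingRule(scope: Construct, vpcCidr: string): wafv2.CfnWebACL {
  // IP set holding the VPC CIDR so it can be referenced by the rule.
  const vpcIpSet = new wafv2.CfnIPSet(scope, "VpcIpSet", {
    addresses: [vpcCidr],
    ipAddressVersion: "IPV4",
    scope: "REGIONAL",
  });

  return new wafv2.CfnWebACL(scope, "WebAcl", {
    defaultAction: { allow: {} },
    scope: "REGIONAL",
    visibilityConfig: {
      cloudWatchMetricsEnabled: true,
      metricName: "WebAcl",
      sampledRequestsEnabled: true,
    },
    rules: [
      {
        name: "ThrottleExternalRequests",
        priority: 0,
        action: { block: {} },
        statement: {
          rateBasedStatement: {
            limit: 500, // illustrative limit; the real value comes from the stack config
            aggregateKeyType: "IP",
            // Only count requests that do NOT come from the VPC IP range.
            scopeDownStatement: {
              notStatement: {
                statement: {
                  ipSetReferenceStatement: { arn: vpcIpSet.attrArn },
                },
              },
            },
          },
        },
        visibilityConfig: {
          cloudWatchMetricsEnabled: true,
          metricName: "ThrottleExternalRequests",
          sampledRequestsEnabled: true,
        },
      },
    ],
  });
}
```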
Testing
Checked the WAF metrics and requested a large output from the model with streaming enabled.
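One way to check the WAF metrics during this kind of test is to query the rule's BlockedRequests metric in CloudWatch. The sketch below uses the AWS SDK for JavaScript v3; the WebACL name, rule name, and region are placeholders and would need to match the deployed stack:

```typescript
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({});

// Sum of requests blocked by the throttling rule over the last hour
// (placeholder dimension values).
async function blockedRequestCount(): Promise<number> {
  const end = new Date();
  const start = new Date(end.getTime() - 60 * 60 * 1000);
  const { Datapoints } = await cloudwatch.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/WAFV2",
      MetricName: "BlockedRequests",
      Dimensions: [
        { Name: "WebACL", Value: "WebAcl" },                  // placeholder
        { Name: "Rule", Value: "ThrottleExternalRequests" },  // placeholder
        { Name: "Region", Value: "us-east-1" },               // placeholder
      ],
      StartTime: start,
      EndTime: end,
      Period: 300,
      Statistics: ["Sum"],
    })
  );
  return (Datapoints ?? []).reduce((sum, d) => sum + (d.Sum ?? 0), 0);
}
```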
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.