Made the number of vCPUs per ECS instance configurable
Made the number of ECS instances configurable
Added a "Benchmark mode" which, if enabled, creates a fake Bedrock server that just waits 4 seconds before returning. This helps with benchmark testing, because Bedrock's rate limits in most accounts are too low to test high loads
Added ECS autoscaling based on CPU utilization
Increased the number of threads allocated to FastAPI, as the default caused performance issues and CPU underutilization
Added caching of auth, API key name, default model access, and default quota access
Fixed some caching bugs
Increased the number of connections for the DynamoDB and Bedrock clients so they can handle the required requests per second
Brought synchronous Bedrock usage up to feature parity with streaming Bedrock usage
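
For reviewers, the caching of auth and API key lookups listed above could look roughly like the sketch below. This is an illustration only, not the code in this change: the `TTLCache` class, its `get_or_load` method, the 60-second TTL, and the loader function are all hypothetical names and values chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value for key, calling loader only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)  # e.g. a DynamoDB lookup in the real service (assumption)
        self._entries[key] = (now + self._ttl, value)
        return value


def load_api_key_record(api_key: Hashable) -> dict:
    """Stand-in for the real lookup; the returned fields are made up for the example."""
    return {"api_key": api_key, "key_name": "demo"}


auth_cache = TTLCache(ttl_seconds=60)
record = auth_cache.get_or_load("example-key", load_api_key_record)
```

A short TTL trades a small window of stale permissions for far fewer backing-store reads per request; the right value depends on how quickly revoked keys need to take effect.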