- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Expected/desired behavior
- Token-based rate limiting should use the new `<azure-openai-token-limit>` policy.
- Usage reporting should use the new `<azure-openai-emit-token-metric>` policy.
- Load balancing should use the new backend pool options where applicable (simple cases).
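
A minimal sketch of what the requested policy section could look like, offered only as an illustration of the three points above. The counter key, the `5000` tokens-per-minute quota, the `openai` metric namespace, the chosen dimensions, and the `openai-backend-pool` backend ID are all placeholder assumptions, not values taken from this repository.

```xml
<policies>
    <inbound>
        <base />
        <!-- Assumption: route to a backend pool resource; "openai-backend-pool" is a hypothetical ID -->
        <set-backend-service backend-id="openai-backend-pool" />
        <!-- Assumption: limit per subscription; the quota value is illustrative -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="false"
            remaining-tokens-variable-name="remainingTokens" />
        <!-- Assumption: namespace and dimensions are illustrative -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="Subscription ID" />
            <dimension name="API ID" />
        </azure-openai-emit-token-metric>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

The backend pool itself would be defined as a separate API Management backend resource; the policy only references it by ID.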