Hello, I have some jobs that have zero time constraints but are cost sensitive, so I'm wondering how to integrate the batch endpoint(s) from OpenAI. Since there is already a lot of async waiting for model output, maybe this could make sense for some if not all requests?
Batch API -- Higher rate limits & 50% discounted Tokens
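For reference, a minimal sketch of what that integration might look like with the standard OpenAI Python SDK flow (write the deferred requests to a JSONL file, upload it, create a batch, poll for completion). File names, prompts, and the polling interval below are illustrative only, not part of any existing code here:

```python
# Minimal sketch of the Batch API flow (openai>=1.x). Names/intervals are illustrative.
import json
import time

from openai import OpenAI

client = OpenAI()

# 1. Write the deferred requests as JSONL, one request object per line.
requests = [
    {
        "custom_id": f"job-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["prompt one", "prompt two"])
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# 2. Upload the file and create the batch with a 24h completion window.
input_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal state, then fetch the results file.
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    for line in output.text.splitlines():
        result = json.loads(line)
        print(result["custom_id"],
              result["response"]["body"]["choices"][0]["message"]["content"])
```

Since results only need to arrive within 24 hours, the polling loop could just as well live in a scheduled job rather than blocking a worker.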
Edit: more work, but again in the cost-optimization bracket, could be integrating the new caching endpoint(s)?