Closed · davies-w closed this 9 months ago
Hi Winton!
For this sample, cost mainly comes from two factors: (1) generating embeddings and (2) generating LLM responses.
For 1, this sample uses the Titan Embeddings model through Amazon Bedrock to generate embeddings at a cost of $0.0001 per 1,000 input tokens (roughly 200 words). You can look at a sample document of yours to make an estimate.
For 2, this sample uses Anthropic Claude 2 to generate LLM responses, priced at $0.01102 per 1,000 input tokens and $0.03268 per 1,000 output tokens. So in this case, the cost depends on the length of the user question (plus retrieved context) and the length of the response generated by the LLM. You can think of a typical set of questions and answers that you expect in your application and use these for a rough estimate.
You can find the on-demand model pricing for all models in Bedrock here: https://aws.amazon.com/bedrock/pricing/ (discounted pricing is available via Provisioned Throughput)
There are other infrastructure costs involved as well, but these should be the two main cost factors.
Anthropic just announced an improved Claude 2.1, which also comes with reduced pricing. I expect this to be available through Amazon Bedrock soon as well.
Also, take a look at this more comprehensive sample from us that offers more options in terms of model choice, vector databases, and more: https://github.com/aws-samples/aws-genai-llm-chatbot
Let me know if this answers your question.
Hi,
We're just prototyping something right now, and want to move our RAG approach into the cloud. One motivation is cost, the other is speed, with OpenAI's RAG assistants forcing the use of GPT-4 rather than 3.5, and the latency being astoundingly bad.
Could you give a rough estimate of how much this would cost per query? We're still paying OpenAI for their tokens (at 3.5 rates, of course).
Also, could you give a sense of the latency for retrieving 10 or so paragraphs from 100 PDFs?
Thanks in advance, W