aws-samples / serverless-pdf-chat

LLM-powered document chat using Amazon Bedrock and AWS Serverless
https://aws.amazon.com/blogs/compute/building-a-serverless-document-chat-with-aws-lambda-and-amazon-bedrock/
MIT No Attribution

Cost? #27

Closed davies-w closed 9 months ago

davies-w commented 9 months ago

Hi,

We're just prototyping something right now, and want to move our RAG approach into the cloud. One motivation is cost, the other is speed: OpenAI's RAG assistants force the use of GPT-4 rather than 3.5, and the latency is astoundingly bad.

Could you give a rough estimate of how much this would cost per query? We're still paying OpenAI for their tokens (at GPT-3.5 rates, of course).

Also, could you give a sense of the latency for retrieving 10 or so paragraphs from 100 PDFs?

Thanks in advance, W

pbv0 commented 9 months ago

Hi Winton!

For this sample, cost mainly comes from:

  1. Generating embeddings for the documents and the user queries using an embedding model.
  2. Inference cost for the LLM itself.

For 1, this sample uses the Titan Embeddings model through Amazon Bedrock to generate the embeddings, at a cost of $0.0001 per 1,000 input tokens (roughly 750 words of English text). You can look at a sample document of yours to make an estimate.
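
As a rough back-of-the-envelope sketch (the ~0.75 words-per-token ratio and the document sizes below are assumptions for illustration, not measurements):

```python
# Rough estimate of the one-time embedding cost for a document set.
# Assumes ~0.75 words per token (a common approximation for English text)
# and the Titan Embeddings on-demand price of $0.0001 per 1,000 input tokens.

WORDS_PER_TOKEN = 0.75
EMBED_PRICE_PER_1K_TOKENS = 0.0001  # USD

def embedding_cost(total_words: int) -> float:
    tokens = total_words / WORDS_PER_TOKEN
    return tokens / 1000 * EMBED_PRICE_PER_1K_TOKENS

# Example: 100 PDFs of ~5,000 words each
print(f"${embedding_cost(100 * 5_000):.2f}")  # ~$0.07
```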

For 2, this sample uses Anthropic Claude 2 to generate the LLM responses, priced at $0.01102 per 1,000 input tokens and $0.03268 per 1,000 output tokens. In this case, the cost depends on the length of the user question (plus the retrieved context passed to the model) and the length of the response generated by the LLM. You can think of a typical set of questions and answers that you expect in your application and use those for a rough estimate.
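
A similarly hedged sketch for the per-query inference cost (the token counts below are illustrative placeholders; your actual question, retrieved context, and answer lengths will vary):

```python
# Rough per-query inference cost for Claude 2 at the on-demand rates above.
CLAUDE_INPUT_PER_1K = 0.01102   # USD per 1,000 input tokens
CLAUDE_OUTPUT_PER_1K = 0.03268  # USD per 1,000 output tokens

def query_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000 * CLAUDE_INPUT_PER_1K
            + output_tokens / 1000 * CLAUDE_OUTPUT_PER_1K)

# Illustrative example: ~2,000 input tokens (question + retrieved passages)
# and ~300 output tokens for the answer.
print(f"${query_cost(2_000, 300):.4f}")  # ~$0.0318 per query
```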

You can find the on-demand pricing for all Bedrock models here: https://aws.amazon.com/bedrock/pricing/ (discounted pricing is available via Provisioned Throughput).

There are other infrastructure costs involved as well, but these should be the two main cost factors.

Anthropic just announced an improved Claude version 2.1, which also comes with reduced pricing. I expect it to be available through Amazon Bedrock soon as well.

Also, take a look at this more comprehensive sample of ours, which offers additional options for model choice, vector databases, and more: https://github.com/aws-samples/aws-genai-llm-chatbot

Let me know if this answers your question.