Add Pricing Information to Leaderboard Report

As a user evaluating different LLMs, pricing information is key to deciding which model is the best fit: some models are very accurate but too expensive, while I might be willing to pick a pricier model if the uplift in accuracy is high enough. Therefore, it would be useful to see both accuracy metrics and pricing on the same report.
For the first release, I suggest:
- Allow users to statically assign prices when defining the models in JSON (models_dict in the example ipynb). These prices will be included in the main leaderboard report table.
- For Bedrock models, automatically fetch prices from AWS's APIs (maybe relevant: def price_information in here).
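As a rough sketch of the first suggestion, statically assigned prices could live alongside each model entry. The `pricing` block and its field names below are hypothetical, not an existing schema, and the numbers are illustrative:

```python
# Hypothetical extension of models_dict: each entry gains an optional
# "pricing" block with separate input/output token prices (USD per 1K tokens).
models_dict = {
    "anthropic.claude-v2": {
        "endpoint_type": "bedrock",
        "pricing": {
            "input_per_1k_tokens": 0.008,   # illustrative numbers, not real prices
            "output_per_1k_tokens": 0.024,
        },
    },
    "my-jumpstart-llm": {
        "endpoint_type": "jumpstart",
        # No token pricing: Jumpstart endpoints are billed by uptime,
        # so their pricing is deferred to a future release (see below).
    },
}

def get_token_prices(model_name):
    """Return (input, output) price per 1K tokens, or None if not token-priced."""
    pricing = models_dict.get(model_name, {}).get("pricing")
    if pricing is None:
        return None
    return pricing["input_per_1k_tokens"], pricing["output_per_1k_tokens"]
```

The leaderboard report could then look up prices with `get_token_prices`, leaving models without a `pricing` block blank in the pricing columns.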
Notes:
- Pricing can be token-based (like Bedrock On-Demand, or the OpenAI API), or it can be based on uptime (Jumpstart endpoints, or Bedrock provisioned throughput).
- Pricing differs for input and output tokens, so it's two values.
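A single record covering both billing modes noted above might look like this; the class and field names are assumptions for illustration, not part of the existing codebase:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelPricing:
    """Hypothetical pricing record covering both billing modes."""
    # Token-based billing (e.g. Bedrock On-Demand, OpenAI API): two values,
    # since input and output tokens are priced differently.
    input_price_per_1k: Optional[float] = None   # USD per 1K input tokens
    output_price_per_1k: Optional[float] = None  # USD per 1K output tokens
    # Uptime-based billing (e.g. Jumpstart endpoints, provisioned throughput).
    price_per_hour: Optional[float] = None

    @property
    def is_token_based(self) -> bool:
        return self.input_price_per_1k is not None
```

Keeping both modes in one record lets the report render whichever columns apply to each model.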
For future releases:
- Add an effective cost that takes into consideration the price and the size of the test set's input and output in tokens.
- Add pricing for Jumpstart models or Bedrock provisioned throughput, based on some throughput calculation.
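For token-priced models, the effective cost above could be as simple as the following sketch (the function name and the per-1K-token convention are assumptions):

```python
def effective_cost(input_tokens, output_tokens,
                   input_price_per_1k, output_price_per_1k):
    """Approximate USD cost of running a test set through a token-priced model,
    given the test set's total input/output size in tokens."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# e.g. a test set with 500K input and 100K output tokens,
# at illustrative prices of $0.008 / $0.024 per 1K tokens:
cost = effective_cost(500_000, 100_000, 0.008, 0.024)
```

Uptime-billed models would instead need an estimated run duration times the hourly rate, which is where the throughput calculation mentioned above comes in.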