
Support replaying prediction traffic logs #1055

Closed: parano closed this issue 3 years ago

parano commented 4 years ago

Is your feature request related to a problem? Please describe.

When a new version of a model is produced, it is common practice to compare its performance and behavior against the previous version. BentoML produces prediction logs while the API server is serving production traffic, and each prediction log record contains both the inference request input and the inference result. If data scientists could easily take these prediction logs and replay them against a BentoService in a development or CI environment, they could get feedback and compare differences between model versions more efficiently.

See related discussion on online shadow deployment: https://github.com/bentoml/BentoML/discussions/1051
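For illustration, a minimal replay loop against a running API server could look like the sketch below. This is not the proposed `prediction_log_replayer` implementation, just an assumption-laden sketch: the JSON-lines log format, the `request`/`result` field names, and the `/predict` endpoint are all hypothetical, not BentoML's actual log schema.

```python
# Hypothetical sketch: replay JSON-lines prediction logs against a live
# endpoint and count where the candidate model's output differs from the
# logged result. The log schema ("request"/"result") and the endpoint URL
# are assumptions for illustration only.
import json
from concurrent.futures import ThreadPoolExecutor

import requests

def replay_record(record, url):
    # Re-send the logged request payload to the candidate API server
    response = requests.post(url, json=record["request"])
    response.raise_for_status()
    return record["result"], response.json()

def replay_log(log_file, url, max_concurrent_requests=8):
    with open(log_file) as f:
        records = [json.loads(line) for line in f]
    mismatches = 0
    # Bound in-flight requests, mirroring the proposed max_concurrent_requests
    with ThreadPoolExecutor(max_workers=max_concurrent_requests) as pool:
        for old_result, new_result in pool.map(
            lambda r: replay_record(r, url), records
        ):
            if old_result != new_result:
                mismatches += 1
    print(f"{mismatches}/{len(records)} predictions changed")

replay_log("prediction.log", "http://localhost:5000/predict")
```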

Describe the solution you'd like

from bentoml import prediction_log_replayer
# for replay against an API server
prediction_log_replayer.replay(log_file, log_directory, url, max_concurrent_requests)
# for replay locally against a loaded BentoService instance
prediction_log_replayer.replay(log_file, log_directory, bento_service, batch_size)
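By way of comparison, the local variant could load a saved bundle and run the logged inputs through it in batches, roughly as below. Again a sketch under stated assumptions: the `predict` inference API name and the log record layout are hypothetical, and `bentoml.load` is used in its 0.x bundle-loading sense.

```python
# Hypothetical sketch of the local replay path: load a saved BentoService
# bundle and feed logged inputs through it in batches. The `predict` API
# name and the log record layout are assumptions, not BentoML's schema.
import json

import bentoml

def replay_locally(log_file, bundle_path, batch_size=32):
    service = bentoml.load(bundle_path)  # assumes a 0.x-style saved bundle
    with open(log_file) as f:
        records = [json.loads(line) for line in f]
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        inputs = [r["request"] for r in batch]
        new_results = service.predict(inputs)  # assumed inference API name
        for record, new_result in zip(batch, new_results):
            if record["result"] != new_result:
                print("changed:", record["request"])

replay_locally("prediction.log", "./my_bento_bundle")
```

Running locally against a loaded BentoService avoids network overhead and lets the replay batch inputs, which is why the proposed local signature takes a `batch_size` rather than a concurrency limit.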

Describe alternatives you've considered

Suggestions are welcome!

Additional context

This depends on the Input/Output Adapter refactoring (#1002).

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


parano commented 3 years ago

This needs further investigation; closing it for now.