When serving models / LLMs, latency is often critical but can be difficult to attribute, especially in pipelines that span multiple steps and systems.
Logs are helpful for troubleshooting, but collating them across multiple systems is tedious and raw logs are hard for humans to digest. Tracing is often a better alternative: tools like Jaeger produce Gantt-chart views of a request out of the box, even when the request crosses multiple systems.
This issue tracks adding an example of how to implement tracing for Databricks ML serving applications.
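As a starting point, below is a minimal sketch of what such instrumentation could look like using the OpenTelemetry Python SDK. The span names, the `predict` handler, and the stand-in model logic are illustrative assumptions rather than Databricks-specific code; in practice the spans would be exported to a backend like Jaeger via an OTLP exporter instead of the console.

```python
# Minimal sketch (assumptions, not the Databricks-specific setup):
# instrument a model serving request path with OpenTelemetry so each
# step becomes a span. ConsoleSpanExporter is used here so the example
# runs without a collector; swap in an OTLP exporter to send to Jaeger.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("model_serving_example")


def predict(request: dict) -> dict:
    """Hypothetical request handler; step names are illustrative."""
    with tracer.start_as_current_span("predict") as span:
        span.set_attribute("model.name", "example-model")
        with tracer.start_as_current_span("preprocess"):
            features = [float(x) for x in request["inputs"]]
        with tracer.start_as_current_span("model.inference"):
            prediction = sum(features)  # stand-in for model.predict(features)
        with tracer.start_as_current_span("postprocess"):
            return {"prediction": prediction}


if __name__ == "__main__":
    print(predict({"inputs": [1, 2, 3]}))
```

Because each step is wrapped in its own span, the per-step latency shows up directly in the trace timeline, which is what makes attribution easier than stitching timestamps out of logs.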