ExpediaDotCom / haystack

Top level repository for Haystack, containing documentation and deployment scripts
http://expediadotcom.github.io/haystack
Apache License 2.0

[Feature Request] Log number of bytes sent and received #785

Open smyrick opened 5 years ago

smyrick commented 5 years ago

Spans already include the start time and duration for each call. Developers could add a custom field to their logs to include the bytes sent, but that would require every service using Haystack to update its logs to a uniform field. Instead, it would be great if this could be standardized in Haystack.
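For illustration, here is a rough sketch of what the per-service workaround looks like today with an OpenTracing-compatible client. The tag names `bytes.sent` and `bytes.received` are just placeholders I made up, which is exactly the problem: without a standard, every team would pick different ones.

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class ByteCountTagging {

    // Hypothetical tag names; Haystack does not standardize these today,
    // which is what this issue is asking for.
    private static final String BYTES_SENT = "bytes.sent";
    private static final String BYTES_RECEIVED = "bytes.received";

    // Record only the counts, not the payloads themselves (see the NOTE below).
    static void recordByteCounts(Span span, long requestBytes, long responseBytes) {
        span.setTag(BYTES_SENT, requestBytes);
        span.setTag(BYTES_RECEIVED, responseBytes);
    }

    public static void main(String[] args) {
        Tracer tracer = GlobalTracer.get();
        Span span = tracer.buildSpan("downstream-call").start();
        try {
            // ... perform the call, measuring the serialized request/response sizes ...
            recordByteCounts(span, 512, 48_000);
        } finally {
            span.finish();
        }
    }
}
```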

Our team thinks this would be an amazing feature for analyzing "data loss" in our tech stack and pointing out spots with a large percentage of loss, which may be a signal that the APIs could be trimmed down or better optimized for the clients using them.

NOTE: This is not about logging the blob of the entire response, just the byte count, which should be scalable to add to every span.

Example

[Screenshot: Screen Shot 2019-10-04 at 9 42 13 AM]

jamesgust commented 5 years ago

The ability to capture and visualize this information in Haystack would be enormously powerful.

Some background:

We've designed the BEX-API stack to leverage Haystack by default, and combined with the GraphQL-specific tracing tool Apollo Graph Manager we have complete visibility into the behavior of both the services and the client applications, and how they interact. An ongoing benefit (symptom?) of GraphQL is that clients request and receive only the data they need to render an experience, which typically results in smaller message sizes than they would otherwise see from our "legacy" APIs. As this continues, we want to be able to pinpoint significant mismatches in message size upstream to help streamline our entire stack. The more our teams do this, the more we improve the performance of our services while reducing cloud and maintenance costs in the long term.

Today this is not easy to do; it involves digging through various logs to find this data, if it exists at all. Being able to capture and visualize it in a single tool would make these optimization opportunities obvious and easy to measure.