Transaction level metric collection

scgray commented 3 years ago

In the Record Layer, we collect I/O metrics for a transaction by simply counting or timing individual transaction calls. For example: https://github.com/FoundationDB/fdb-record-layer/blob/master/fdb-record-layer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/InstrumentedReadTransaction.java#L99

The problem is that this doesn't reflect the true cost of certain operations. For example, we count a range read as a single read when, in actuality, satisfying it may have required many round trips to the server. Or it may have required no requests to the server at all because of the RYW (read-your-writes) cache. We have no way of knowing.
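To make the gap concrete, here is a minimal, self-contained sketch. It does not use the real FoundationDB bindings: `RangeSource`, `PagedSource`, and `CountingSource` are hypothetical stand-ins, and the page size of 3 is arbitrary. The wrapper counts one read per `getRange` call, while the simulated server needs several round trips to serve it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction's range-read call; not the FDB API.
interface RangeSource {
    List<Integer> getRange(int begin, int end);
}

// Simulated server that returns at most pageSize keys per round trip.
class PagedSource implements RangeSource {
    final int pageSize;
    int roundTrips = 0;

    PagedSource(int pageSize) { this.pageSize = pageSize; }

    @Override
    public List<Integer> getRange(int begin, int end) {
        List<Integer> out = new ArrayList<>();
        int next = begin;
        while (next < end) {
            roundTrips++; // one simulated request per page
            int stop = Math.min(next + pageSize, end);
            for (int k = next; k < stop; k++) out.add(k);
            next = stop;
        }
        return out;
    }
}

// Wrapper-style instrumentation: counts calls, cannot see round trips.
class CountingSource implements RangeSource {
    final RangeSource delegate;
    int reads = 0;

    CountingSource(RangeSource delegate) { this.delegate = delegate; }

    @Override
    public List<Integer> getRange(int begin, int end) {
        reads++;
        return delegate.getRange(begin, end);
    }
}

class MetricGapDemo {
    public static void main(String[] args) {
        PagedSource server = new PagedSource(3);
        CountingSource tr = new CountingSource(server);
        List<Integer> keys = tr.getRange(0, 10);
        System.out.println("keys=" + keys.size()
            + " countedReads=" + tr.reads
            + " roundTrips=" + server.roundTrips);
    }
}
```

With these numbers the wrapper reports 1 read for 10 keys, while the simulated server sees 4 round trips. Only the layer that actually issues the requests can report the second number.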

It would be useful if the client driver could track and expose metrics within a transaction such that we can track the real cost of the work done within the transaction.

apkar commented 3 years ago

It may be useful to expose NetworkMetrics as well, since that is an important metric for seeing when the network thread is being saturated.

Question: Do we want to pick specific metrics (or trace events) and expose them, or expose all cached trace events?

sfc-gh-abeamon commented 3 years ago

> Do we want to pick specific metrics (or trace events) and expose them or expose all cached trace events?

If we go the route of allowing clients to query trace events directly, it might be rather difficult to support that with our existing API versioning mechanism. Either we have to accept that trace events aren't fixed at an API version or we have to introduce new versioning semantics for trace events (or at least for cached ones). Doing the former doesn't seem great given that it basically breaks API versioning if you use this functionality.

I think a good option to consider for this is exposing various metrics through the special key-space.
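As a rough illustration of that idea, the sketch below models a transaction that publishes its own counters as read-only keys under a reserved prefix, so that callers fetch them with ordinary point or range reads. Everything here is an assumption made for illustration: the class, the `record` hook, the prefix, and the metric names `round_trips` and `bytes_read` are hypothetical and are not part of the FoundationDB API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a transaction that publishes its own counters
// as read-only keys under a reserved prefix. Not the FDB API.
class MetricsKeySpaceSketch {
    // Assumed prefix for illustration; the real key layout is undecided.
    static final String PREFIX = "\u00ff\u00ff/metrics/transaction/";

    private final TreeMap<String, Long> counters = new TreeMap<>();

    // Called by the (simulated) client internals as work happens.
    void record(String name, long delta) {
        counters.merge(PREFIX + name, delta, Long::sum);
    }

    // A point read of one metric key; null if the key was never recorded.
    Long get(String key) {
        return counters.get(key);
    }

    // A range read over metric keys: begin inclusive, end exclusive.
    Map<String, Long> getRange(String begin, String end) {
        return new TreeMap<>(counters.subMap(begin, end));
    }

    public static void main(String[] args) {
        MetricsKeySpaceSketch tr = new MetricsKeySpaceSketch();
        tr.record("round_trips", 4);
        tr.record("bytes_read", 1200);
        tr.record("round_trips", 1);
        // '0' is the character after '/', so this range covers the whole prefix.
        String end = "\u00ff\u00ff/metrics/transaction0";
        System.out.println(tr.getRange(PREFIX, end));
    }
}
```

The appeal of this shape is that metrics become keys like any others: the set of exposed keys could be pinned per API version, and no new client-side query interface for trace events would be needed.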

apple / foundationdb

Transaction level metric collection #4156