Open gmaz42 opened 3 years ago
@khaf please let me know what you think about this, I might start a PR.
Sorry I thought I had made a response, but apparently I hadn't posted it. Historically, we have been hesitant to add instrumentation to the client since it can affect performance. We are open to ideas, but at this moment we can not promise anything concrete until we make sure we can do it in a way that does not incur any performance penalties.
I understand; I would also not be happy with writing an implementation that brings performance penalties.
I think this could be done in 2 ways:
I do not think that (2) would incur in (measurable) performance penalties because the measurement is already happening (each write/read on a connection returns the number of bytes), it is simply not accounted for.
The broader question would be how to "neatly" provide an API for accessing this information.
Some ideas I have had so far:
Another important bit of information when doing performance analysis is: which node did reply to my request? This can help spot degraded nodes.
By the way, I have not been able to analyze traffic in the cluster I am currently using to detect what could possibly cause slow replies and I am falling back to capturing traffic with tcpdump
for later analysis; @khaf do you know about a better approach? I would be interested in knowing what happens during one of these GetObject
/ PutObject
calls from server perspective.
@gmaz42 Sorry it seems this message flew under my radar. Both your approaches have major downsides: first one can lead to race conditions, and the second one needs a major API overhaul. The only way I see we can do something in that vein would be to include the stats with the record. Problem being that if the transaction hits an error mid transaction, you will lose the stats (since the returned record will be nil.)
I still do not know what could be best approach. At the moment we are considering exporting the stats outside the client, but even that comes with its own set of issues. We are still looking into it though.
No problem, thanks for your reply @khaf!
first one can lead to race conditions
You could use the object extracted from context only if it satisfies a specific interface, and leave to caller to correctly implement that interface in a goroutine-safe fashion. This way you would support also the use-case that caller provides a single-use object before each call (that has no race condition concern). The client library would not even need to provide any solid implementation of this stats object, could use a private one for its tests alone.
I still do not know what could be best approach. At the moment we are considering exporting the stats outside the client, but even that comes with its own set of issues. We are still looking into it though.
I understand; thanks. Meanwhile my troubleshooting has gone a few levels lower in the stack and I am currently checking under which conditions timeouts happen (from Aerospike client PoV) because the process is not pulling out TCP packets fast enough from the TCP stack (this is a kubernetes scenario, so multiple processes and kernel scale internals are at play).
The callback approach for instrumentation sounds nice. Let me think about it for a bit and see where it takes me.
I am noticing a problem with reading objects that might be large (spanning 1-2 megabytes) when setting a deadline on the time allowed to retrieve them.
Unfortunately they are an exception rather than the rule, so I was wondering: would it be possible to get information about the read/written connection bytes from a
Get*
operation?I think it is not currently possible but if you are willing to accept a PR for it I could add a new set of methods like
GetObjectWithStats
?