Open minhnguyenvan95 opened 4 years ago
While tracing connection opening makes perfect sense, tracing ResultSet#next
may be way too much. What if the query yields 10000 rows? Do we want a span for each?
Maybe in the future we can think of collecting the total duration of cursor manipulation actions and report those when the statement is closed (which is not guaranteed) or something of that sort, but this is not trivial.
I suggest keeping this issue for connection opening only for now and deal with cursor actions through a different issue if you think you have a good solution for that. @minhnguyenvan95 what do you think?
As a side note- connection creation may be done in the background on a dedicated thread by the connection pool, in which case we are not guaranteed to see spans with real connection issues as those would not occur within traced transactions.
Thank you for your fully answer . With connection opening span , would you please help me how to implement this . My legacy project is not using connection pool .
Without looking too much into this, it seems that java.sql.Driver#connect
would be a native candidate for that.
NOTE: by instrumenting java.sql.Driver#connect
we may eliminate the possibly-active java.sql.Connection#getMetaData
usage, getting the URL from the first argument, the user from the second (java.util.Properties
) argument and the vendor based on the Driver class name.
Found the complementary issue for fetches - #907
@minhnguyenvan95 we DO intend to get to that at some point, but not immediately, as there are other things we currently do with higher priority. So we think this issue is valuable. Do you have any specific objection that we keep it open?
I see you refer another issue https://github.com/elastic/apm-agent-java/issues/907 so I closed this issue. Let me reopen it .
@eyalkoren i need a feature about tracing connection creation, including JDBC or redis connection. so is there anyone working on this feature? if no, i will make it.
Nobody is working on it now, please go ahead, thanks! Although we normally lean towards external plugins, this case will be suitable as an extension for the existing plugins as it requires access to internal structures and metadata.
For JDBC, it seems that the best candidate is java.sql.Driver#connect()
. A few guidelines:
OnMethodEnter
advice, but only fill it with details at the OnMethodExit
advice, where you would use co.elastic.apm.agent.jdbc.helper.JdbcHelper#getConnectionMetaData
to create and cache metadata for the returned connection and use this info for the span details. For now, we will start with the current implementation that invokes Connection#getMetaData
, in the future we may consider avoiding that by using the URL argument of the instrumented method.type
and subtype
to existing casesaction
, I think connect
would fit. @SylvainJuge - we don't need to align with other agents on the action, right?CONNECT to <db-vendor> at <db-host>(:<db-port>)(/<db-instance>)
, where all db data comes from the created co.elastic.apm.agent.jdbc.helper.ConnectionMetaData
, port is added only if it is >0 and instance is only added if it is not null
and not empty.For Redis or other plugins - I have no advice other than separating PRs such that each PR enhances a single plugin.
@felixbarny any objection to this suggestion?
@eyalkoren javax.sql.DataSource#getConnection
is my favor, as getting a connection from a datasource is a common usage. Sometimes , getting a connection from a datasource would be time consuming, for example, the connection pool is exhausted, or a condition race with other thread.
so i prefer an instrumentation for javax.sql.DataSource#getConnection
, rather than the network connect Driver#connect()
so i prefer an instrumentation for javax.sql.DataSource#getConnection, rather than the network connect Driver#connect()
But this will only support JDBC usages through DataSource
, which is very partial. The basic implementation for this feature needs to capture exits.
You may, in addition to Driver
, instrument DataSource#getConnection()
as well (and make sure you capture only one span in case calls are nested), but we need to think if this is the right thing to do anyway - I am not sure whether latency related to depleted connection pool or contention should be attributed to the DB tier.
This reminds me of another implementation guideline: you must make sure you only create one span for one action. In many cases, these calls will be nested, in which case you can create and activate a span at the entry of the outermost and probably fill span info at the exit of the innermost (not sure if that's the best, only based on intuition). You should use AbstractSpan#createExitSpan
for span creation.
However, hold on with that, let us discuss if we actually want this addition. The problem of attributing latency to DB even if it's not related to actual communication with the DB can be a problem if instrumenting only Driver
. We may want to skip this altogether, or use a different span type
(e.g. dp-pool
).
The issue I see with instrumenting DataSource#getConnection()
is that these will often be served from a connection pool and creating a span for these isn't that interesting. What's more interesting is tracing when a new connection gets established. However, it's hard to differ between the two cases.
Further complications/open questions
DataSource
implementations required to call a Driver
to get a Connection
? It does not seem that there's a guarantee.Driver
implementations that also implement connection pooling, for example org.apache.commons.dbcp.PoolingDriver
.use a different span type (e.g. dp-pool).
I agree with you, the latency of getting a connection from a pool should not be attributed to the DB tier.
I plan to use a different span type pool
, and subtype jdbc
jedis``redission
, and separating PRs such that each PR enhances a single plugin.
Is it ok?
use a different span type (e.g. dp-pool).
I agree with you, the latency of getting a connection from a pool should not be attributed to the DB tier.
I plan to use a different span type pool
, and subtype jdbc
jedis``redission
, and separating PRs such that each PR enhances a single plugin.
Is it ok?
Are DataSource implementations required to call a Driver to get a Connection? It does not seem that there's a guarantee.
I think it's fair to assume the connection establishment is not implemented twice within a driver. I debugged all our co.elastic.apm.agent.jdbc.JdbcDbIT
drivers and this assumption seems to hold for all of them.
There are Driver implementations that also implement connection pooling, for example org.apache.commons.dbcp.PoolingDriver
Agreed, that's what I meant in my comment above, which is why I asked @CoderLan0668 to wait even with the Driver
instrumentation. I think it's a good time for us to process this a bit and decide what we want to do.
these will often be served from a connection pool and creating a span for these isn't that interesting. What's more interesting is tracing when a new connection gets established.
Actually, the tracing DataSource#getConnection
had been used for a long time(i fork and modify the agent),and it's helpful to show me the performance problem of the connection pool.
There are Driver implementations that also implement connection pooling, for example org.apache.commons.dbcp.PoolingDriver
i think there is no conflict about the driver or the connection pool, actually, they are the same as a pool, some one would use
org.apache.commons.dbcp.PoolingDriver
, and some use connection pool (for example hikari, dbcp and so on )+ driver(for example h2 driver ,mysql driver)
@eyalkoren @felixbarny What are your opinion about tracing getting connection from a pool?
I have just created a really simple plugin to instrument DataSource#getConnection
like @CoderLan0668 and attached it to one of my real-world applications. After running a very simple performance test, I can see that in some traces, the getConnection
calls account for over 50% of the transaction time.
Of course, my example might not be representative about the elastic community. But I definitely wanted to voice my support for this feature - I definitely see value in this instrumentation and would welcome to see something in the elastic-apm-agent in the future out-of-the box.
Thanks for the valuable feedback! After discussing this a bit, here's what we propose:
DataSource#getConnection
as follows:
OnMethodEnter
advice: only create and activate a span, set it's type, subtype and action, and return it from the adviceOnMethodExit
advice: fill the span received from the enter advice with all info obtained through co.elastic.apm.agent.jdbc.helper.JdbcHelper#getConnectionMetaData
(this will create and cache metadata for the returned connection). db
as span type, pool
as the subtype and getConnection
as the action, so that these spans don't affect actual SQL metrics.getConnection to <db-vendor> at <db-host>(:<db-port>)(/<db-instance>)
, where all db data comes from the created co.elastic.apm.agent.jdbc.helper.ConnectionMetaData
, port is added only if it is >0 and instance is only added if it is not null
and not empty.span_min_duration_db
, that will behave similarly to span_min_duration
- all spans faster than this threshold will be discarded. It shouls have a default higher than 0, let's start with 5ms
.Similarly, we can instrument java.sql.Driver#connect()
as well, with some minor differences:
db
as span type, connect
as the subtype and connect
as the action.OnMethodEnter
advice: first thing to do is check whether the currently active span is of type db
. If it is - return without doing anything (so that we don't create nested spans). If not - do the same as above - create a span and return it to be used in the exit advice.Makes sense?
- Introduce a new internal config option that will allow us to discard fast spans (as most will likely be such) called
span_min_duration_db
, that will behave similarly tospan_min_duration
- all spans faster than this threshold will be discarded. It shouls have a default higher than 0, let's start with5ms
.
Just to comment on that aspect: In our application, most statements take less than 3ms to execute. The new config option (especially if it’s an internal one that cannot be overridden) should definitely not lead to other db spans being dropped - that would be quite unfortunate!
Really nice would be a generic (documented) configuration option span_mindurationtype where type simply refers to the span type. Then, span_min_duration_app, span_min_duration_db and others would be provided implicitly. And maybe it could optionally even include the subtype and action - this would allow fine-tuning the min duration quite extensively.
@eyalkoren The Postgres JDBC implementation for DataSource#getConnection
calls DriverManager#getConnection
which calls Driver#connect()
. I don't think creating two spans for the connection creation is valuable, so maybe the instrumentation for Driver#connect()
should only create a span if the active span was not created by the instrumentation for DriverManager.getConnection
?
maybe the instrumentation for
Driver#connect()
should only create a span if the active span was not created by the instrumentation forDriverManager.getConnection
?
That's how I understood the proposed solution:
OnMethodEnter
advice: first thing to do is check whether the currently active span is not of typedb
. If it is - don't do anything. If not - do the same as above.
If Driver.connect is being called (directly or indirectly) by DataSource.getConnection, then it would already have an active span of type db on enter.
I think that approach is also being used by the statement instrumentation: If there's already an active span of type db, no additional span is created.
BTW: This also makes sure that connection health checks (some application servers check connections when returning them from the pool for their health by executing a dummy statement) - with the proposed solution, these health checks would be attributed to the getConnection span - which is only logical, in my opinion.
Just to comment on that aspect: In our application, most statements take less than 3ms to execute. The new config option (especially if it’s an internal one that cannot be overridden) should definitely not lead to other db spans being dropped - that would be quite unfortunate!
Sorry for not being clear enough - an internal config is still a config option, so you can configure it exactly like any other configs. The only difference is that it is not documented. We don't want to overload our users with too many fine-grained config options, so when we add a config option for a very specific aspect and we think the default will be good enough at ~100% of the times, we use a hidden one.
The one I propose to add is not related to any DB span related to actual query. We don't drop "exit spans" unless the transaction is not sampled, the transaction created too many spans or similar spans are compressed. This new threshold is only for the purpose of not creating non-interesting getConnection
spans, such that just get a connection from the pool without any delay, which is expected to be the norm.
@eyalkoren The Postgres JDBC implementation for DataSource#getConnection calls DriverManager#getConnection which calls Driver#connect(). I don't think creating two spans for the connection creation is valuable, so maybe the instrumentation for Driver#connect() should only create a span if the active span was not created by the instrumentation for DriverManager.getConnection?
Not only Postgres, all other drivers I tested are doing the same. What you propose is exactly what I proposed, but I was probably not clear enough on this one as well. I edited my comment above now, so I hope it makes more sense now.
Sorry, I missed that
JDBC open connection and resultSet.next() is not logging in span
Transaction show just the select span and there is no logging for open connection time and resultSet retrieve time