elastic / apm-agent-java

https://www.elastic.co/guide/en/apm/agent/java/current/index.html
Apache License 2.0
566 stars 320 forks source link

Tracing JDBC connection creation #941

Open minhnguyenvan95 opened 4 years ago

minhnguyenvan95 commented 4 years ago

JDBC open connection and resultSet.next() is not logging in span

    Context initContext = new InitialContext();
    Context envContext = (Context)initContext.lookup("java:/comp/env");
    DataSource ds = (DataSource)envContext.lookup(dsName);
    conn=ds.getConnection();
    stmt=conn.createStatement();
    rset=stmt.executeQuery("select a.* from editor.cw_news a");
    while (rset.next()){
        out.print("<br>"+ rset.getString("newstitle"));
    }

Transaction show just the select span and there is no logging for open connection time and resultSet retrieve time

eyalkoren commented 4 years ago

While tracing connection opening makes perfect sense, tracing ResultSet#next may be way too much. What if the query yields 10000 rows? Do we want a span for each? Maybe in the future we can think of collecting the total duration of cursor manipulation actions and report those when the statement is closed (which is not guaranteed) or something of that sort, but this is not trivial.

I suggest keeping this issue for connection opening only for now and deal with cursor actions through a different issue if you think you have a good solution for that. @minhnguyenvan95 what do you think?

As a side note- connection creation may be done in the background on a dedicated thread by the connection pool, in which case we are not guaranteed to see spans with real connection issues as those would not occur within traced transactions.

minhnguyenvan95 commented 4 years ago

Thank you for your fully answer . With connection opening span , would you please help me how to implement this . My legacy project is not using connection pool .

eyalkoren commented 4 years ago

Without looking too much into this, it seems that java.sql.Driver#connect would be a native candidate for that.

NOTE: by instrumenting java.sql.Driver#connect we may eliminate the possibly-active java.sql.Connection#getMetaData usage, getting the URL from the first argument, the user from the second (java.util.Properties) argument and the vendor based on the Driver class name.

eyalkoren commented 4 years ago

Found the complementary issue for fetches - #907

eyalkoren commented 4 years ago

@minhnguyenvan95 we DO intend to get to that at some point, but not immediately, as there are other things we currently do with higher priority. So we think this issue is valuable. Do you have any specific objection that we keep it open?

minhnguyenvan95 commented 4 years ago

I see you refer another issue https://github.com/elastic/apm-agent-java/issues/907 so I closed this issue. Let me reopen it .

CoderLan0668 commented 3 years ago

@eyalkoren i need a feature about tracing connection creation, including JDBC or redis connection. so is there anyone working on this feature? if no, i will make it.

eyalkoren commented 3 years ago

Nobody is working on it now, please go ahead, thanks! Although we normally lean towards external plugins, this case will be suitable as an extension for the existing plugins as it requires access to internal structures and metadata.

For JDBC, it seems that the best candidate is java.sql.Driver#connect(). A few guidelines:

For Redis or other plugins - I have no advice other than separating PRs such that each PR enhances a single plugin.

@felixbarny any objection to this suggestion?

CoderLan0668 commented 3 years ago

@eyalkoren javax.sql.DataSource#getConnection is my favor, as getting a connection from a datasource is a common usage. Sometimes , getting a connection from a datasource would be time consuming, for example, the connection pool is exhausted, or a condition race with other thread.

so i prefer an instrumentation for javax.sql.DataSource#getConnection, rather than the network connect Driver#connect()

eyalkoren commented 3 years ago

so i prefer an instrumentation for javax.sql.DataSource#getConnection, rather than the network connect Driver#connect()

But this will only support JDBC usages through DataSource, which is very partial. The basic implementation for this feature needs to capture exits. You may, in addition to Driver, instrument DataSource#getConnection() as well (and make sure you capture only one span in case calls are nested), but we need to think if this is the right thing to do anyway - I am not sure whether latency related to depleted connection pool or contention should be attributed to the DB tier.

This reminds me of another implementation guideline: you must make sure you only create one span for one action. In many cases, these calls will be nested, in which case you can create and activate a span at the entry of the outermost and probably fill span info at the exit of the innermost (not sure if that's the best, only based on intuition). You should use AbstractSpan#createExitSpan for span creation.

However, hold on with that, let us discuss if we actually want this addition. The problem of attributing latency to DB even if it's not related to actual communication with the DB can be a problem if instrumenting only Driver. We may want to skip this altogether, or use a different span type (e.g. dp-pool).

felixbarny commented 3 years ago

The issue I see with instrumenting DataSource#getConnection() is that these will often be served from a connection pool and creating a span for these isn't that interesting. What's more interesting is tracing when a new connection gets established. However, it's hard to differ between the two cases.

Further complications/open questions

CoderLan0668 commented 3 years ago

use a different span type (e.g. dp-pool).

I agree with you, the latency of getting a connection from a pool should not be attributed to the DB tier.

I plan to use a different span type pool, and subtype jdbc jedis``redission, and separating PRs such that each PR enhances a single plugin.

Is it ok?

CoderLan0668 commented 3 years ago

use a different span type (e.g. dp-pool).

I agree with you, the latency of getting a connection from a pool should not be attributed to the DB tier.

I plan to use a different span type pool, and subtype jdbc jedis``redission, and separating PRs such that each PR enhances a single plugin.

Is it ok?

eyalkoren commented 3 years ago

Are DataSource implementations required to call a Driver to get a Connection? It does not seem that there's a guarantee.

I think it's fair to assume the connection establishment is not implemented twice within a driver. I debugged all our co.elastic.apm.agent.jdbc.JdbcDbIT drivers and this assumption seems to hold for all of them.

There are Driver implementations that also implement connection pooling, for example org.apache.commons.dbcp.PoolingDriver

Agreed, that's what I meant in my comment above, which is why I asked @CoderLan0668 to wait even with the Driver instrumentation. I think it's a good time for us to process this a bit and decide what we want to do.

CoderLan0668 commented 3 years ago

these will often be served from a connection pool and creating a span for these isn't that interesting. What's more interesting is tracing when a new connection gets established.

Actually, the tracing DataSource#getConnection had been used for a long time(i fork and modify the agent),and it's helpful to show me the performance problem of the connection pool.

There are Driver implementations that also implement connection pooling, for example org.apache.commons.dbcp.PoolingDriver

i think there is no conflict about the driver or the connection pool, actually, they are the same as a pool, some one would use org.apache.commons.dbcp.PoolingDriver, and some use connection pool (for example hikari, dbcp and so on )+ driver(for example h2 driver ,mysql driver)

CoderLan0668 commented 3 years ago

@eyalkoren @felixbarny What are your opinion about tracing getting connection from a pool?

matthiaswelz commented 3 years ago

I have just created a really simple plugin to instrument DataSource#getConnection like @CoderLan0668 and attached it to one of my real-world applications. After running a very simple performance test, I can see that in some traces, the getConnection calls account for over 50% of the transaction time.

Of course, my example might not be representative about the elastic community. But I definitely wanted to voice my support for this feature - I definitely see value in this instrumentation and would welcome to see something in the elastic-apm-agent in the future out-of-the box.

eyalkoren commented 3 years ago

Thanks for the valuable feedback! After discussing this a bit, here's what we propose:

Similarly, we can instrument java.sql.Driver#connect() as well, with some minor differences:

Makes sense?

matthiaswelz commented 3 years ago
  • Introduce a new internal config option that will allow us to discard fast spans (as most will likely be such) called span_min_duration_db, that will behave similarly to span_min_duration - all spans faster than this threshold will be discarded. It shouls have a default higher than 0, let's start with 5ms.

Just to comment on that aspect: In our application, most statements take less than 3ms to execute. The new config option (especially if it’s an internal one that cannot be overridden) should definitely not lead to other db spans being dropped - that would be quite unfortunate!

Really nice would be a generic (documented) configuration option span_mindurationtype where type simply refers to the span type. Then, span_min_duration_app, span_min_duration_db and others would be provided implicitly. And maybe it could optionally even include the subtype and action - this would allow fine-tuning the min duration quite extensively.

tobiasstadler commented 3 years ago

@eyalkoren The Postgres JDBC implementation for DataSource#getConnection calls DriverManager#getConnection which calls Driver#connect(). I don't think creating two spans for the connection creation is valuable, so maybe the instrumentation for Driver#connect() should only create a span if the active span was not created by the instrumentation for DriverManager.getConnection?

matthiaswelz commented 3 years ago

maybe the instrumentation for Driver#connect() should only create a span if the active span was not created by the instrumentation for DriverManager.getConnection?

That's how I understood the proposed solution:

  • OnMethodEnter advice: first thing to do is check whether the currently active span is not of type db. If it is - don't do anything. If not - do the same as above.

If Driver.connect is being called (directly or indirectly) by DataSource.getConnection, then it would already have an active span of type db on enter.

I think that approach is also being used by the statement instrumentation: If there's already an active span of type db, no additional span is created.

BTW: This also makes sure that connection health checks (some application servers check connections when returning them from the pool for their health by executing a dummy statement) - with the proposed solution, these health checks would be attributed to the getConnection span - which is only logical, in my opinion.

eyalkoren commented 3 years ago

Just to comment on that aspect: In our application, most statements take less than 3ms to execute. The new config option (especially if it’s an internal one that cannot be overridden) should definitely not lead to other db spans being dropped - that would be quite unfortunate!

Sorry for not being clear enough - an internal config is still a config option, so you can configure it exactly like any other configs. The only difference is that it is not documented. We don't want to overload our users with too many fine-grained config options, so when we add a config option for a very specific aspect and we think the default will be good enough at ~100% of the times, we use a hidden one. The one I propose to add is not related to any DB span related to actual query. We don't drop "exit spans" unless the transaction is not sampled, the transaction created too many spans or similar spans are compressed. This new threshold is only for the purpose of not creating non-interesting getConnection spans, such that just get a connection from the pool without any delay, which is expected to be the norm.

@eyalkoren The Postgres JDBC implementation for DataSource#getConnection calls DriverManager#getConnection which calls Driver#connect(). I don't think creating two spans for the connection creation is valuable, so maybe the instrumentation for Driver#connect() should only create a span if the active span was not created by the instrumentation for DriverManager.getConnection?

Not only Postgres, all other drivers I tested are doing the same. What you propose is exactly what I proposed, but I was probably not clear enough on this one as well. I edited my comment above now, so I hope it makes more sense now.

tobiasstadler commented 3 years ago

Sorry, I missed that