Possible N+1 query problem when fetching job status.

chrisknoll commented 1 year ago

Expected behavior

We should see only periodic queries against the WebAPI DB for job execution status.

Actual behavior

When logging is enabled, we see a constant barrage of SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD.... in logs.

Steps to reproduce behavior

Set up a maven profile that will enabled hibernate and jdbc debugging like so:

    <profile>
      <id>jdbc-debug</id>
      <properties>
        <logging.level.root>info</logging.level.root>
        <logging.level.org.springframework.orm>debug</logging.level.org.springframework.orm>
        <logging.level.org.springframework.jdbc>debug</logging.level.org.springframework.jdbc>
        <logging.level.org.hibernate.SQL>debug</logging.level.org.hibernate.SQL>
        <logging.level.org.springframework.jdbc.core.JdbcTemplate>debug</logging.level.org.springframework.jdbc.core.JdbcTemplate>
        <spring.jpa.show-sql>true</spring.jpa.show-sql>
      </properties> 
    </profile>

Make sure this profile is active, and you recompile your WAR. Then launch the app. You will see this flooding the console:

2023-02-21 12:16:02.252 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.282 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.282 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.321 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.321 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.326 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.326 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.364 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.364 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.366 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.366 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.399 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.399 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.400 DEBUG taskExecutor-1 org.springframework.orm.jpa.JpaTransactionManager - [] - Exposing JPA transaction as JDBC transaction [org.springframework.orm.jpa.vendor.HibernateJpaDialect$HibernateConnectionHandle@242ceea6]
2023-02-21 12:16:02.400 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.400 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]

2023-02-21 12:16:02.436 DEBUG http-nio-8080-exec-4 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]
2023-02-21 12:16:02.437 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL query
2023-02-21 12:16:02.437 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing prepared SQL statement [SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from webapi.BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?]

One specific line that I saw:

2023-02-21 12:24:23.976 DEBUG http-nio-8080-exec-10 org.springframework.jdbc.core.JdbcTemplate - [] - Executing SQL query [SELECT JOB_EXECUTION_ID FROM webapi.BATCH_JOB_EXECUTION E, webapi.BATCH_JOB_INSTANCE I WHERE E.JOB_INSTANCE_ID=I.JOB_INSTANCE_ID ORDER BY JOB_EXECUTION_ID DESC LIMIT 1 OFFSET 16599]

The LIMIT 1 OFFSET 16599 is odd here as well. Why would we select 1 job after the 16599th?

Note the individual queries looking for JOB_EXECUTIONID = ?. The issue is: why is it querying for specific job executions when the only thing happening is getting a list of job statuses (presumably)?

You will need to have many jobs in your spring-batch tables: we have over 30,000 jobs.

chrisknoll commented 1 year ago

As I've been diagnosing different performance problems, I'm watching our db activity. I've been testing polling intervals, and other interactions, and i'm seeing very high 'toples out' and 'block I/O' on our server:

If I'm reading this chart right, there was a blast of 6 Million tuples out in that 1s interval. This doesn't seem right.

anthonysena commented 1 year ago

One item from discussion of this issue: we should publish a script that is useful for cleaning out older jobs to limit the size of the job table to help performance.

anthonysena commented 1 year ago

Another thought here from discussion on the Atlas WG - @chrisknoll observed that sometimes the 1st query to get notification status does not finish and a 2nd query is fired off. We could change the client-side polling such that subsequent requests for getting the job status (via the notifications) waits for the original query to finish.

anthonysena commented 1 year ago

It is possible the newer versions of Spring Batch will have already resolved this performance problem so linking this to #2244.

anthonysena commented 7 months ago

From discussion on Atlas WG, there may be potential issues w/ notifications via Atlas whereby calls to get notifications are queued while the job status query executes.

OHDSI / WebAPI