apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] [flink] Interrupting the JDBC connection cannot stop the engine executing SQL. #6519

Open Pandas886 opened 2 months ago

Pandas886 commented 2 months ago

Describe the bug

The Kyuubi engine share level is CONNECTION, so each JDBC connection gets its own engine.

When the Flink engine is idle (no query or DML task running), disconnecting the JDBC connection shuts the engine down automatically. When the Flink engine is executing a query or DML task, disconnecting the JDBC connection does not shut the engine down.

The JobManager logs contain no entry for a "close session" request, which suggests that the request was never received.

I debugged the Kyuubi Flink engine with 'table.dml-sync' = 'true' set. While a SQL task was still executing, interrupting the JDBC connection never hit the breakpoint. With no SQL task running, the same interruption did hit the breakpoint.

It looks as though the executing SQL is blocking TCloseSessionReq from being handled.
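
To make the suspected mechanism concrete, here is a minimal toy model in Python. It is not Kyuubi or Flink code. It assumes, purely for illustration, that the statement and the close request are handled by the same single worker, so a synchronous statement holds up everything queued behind it. The names `simulate_blocked_close`, `execute_statement`, and `close_session` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def simulate_blocked_close(wait_for_close: float = 0.2) -> dict:
    """Toy model: one worker handles both the SQL statement and the close request."""
    release_sql = threading.Event()  # set when the simulated SQL job may finish
    sql_started = threading.Event()  # set once the worker has picked up the SQL
    events = []                      # order in which the worker completed requests

    def execute_statement():
        # Stands in for a statement that blocks until the job ends
        # (the assumed effect of 'table.dml-sync' = 'true').
        sql_started.set()
        release_sql.wait()
        events.append("execute_statement finished")

    def close_session():
        # Stands in for handling a close-session request.
        events.append("close_session handled")

    with ThreadPoolExecutor(max_workers=1) as handler:
        sql_future = handler.submit(execute_statement)
        sql_started.wait()
        close_future = handler.submit(close_session)

        # While the SQL is running, the close request only sits in the queue.
        done, _ = wait([close_future], timeout=wait_for_close)
        closed_while_running = close_future in done

        # Let the SQL finish; only now can the close request be processed.
        release_sql.set()
        sql_future.result()
        close_future.result()

    return {"closed_while_running": closed_while_running, "events": events}


if __name__ == "__main__":
    print(simulate_blocked_close())
```

In this model the close request is handled only after the statement returns, which matches the observed symptom. Whether the real engine serializes requests this way is exactly what would need to be confirmed in the debugger.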

The SQL I executed is as follows.

DML task:

insert overwrite q18v1 
SELECT
  o_year,
  SUM(CASE
    WHEN nation = 'BRAZIL' THEN volume
    ELSE 0
  END) / SUM(volume) AS mkt_share
FROM
  (
    SELECT
      EXTRACT(YEAR FROM o_orderdate) AS o_year,
      l_extendedprice * (1 - l_discount) AS volume,
      n2.n_name AS nation
    FROM
      part,
      supplier,
      lineitem,
      orders,
      customer,
      nation n1,
      nation n2,
      region
    WHERE
      p_partkey = l_partkey
      AND s_suppkey = l_suppkey
      AND l_orderkey = o_orderkey
      AND o_custkey = c_custkey
      AND c_nationkey = n1.n_nationkey
      AND n1.n_regionkey = r_regionkey
      AND r_name = 'AMERICA'
      AND s_nationkey = n2.n_nationkey
      AND o_orderdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
      AND p_type = 'ECONOMY ANODIZED STEEL'
  ) AS all_nations
GROUP BY
  o_year 
ORDER BY
  o_year 

Query task:

SELECT
  o_year,
  SUM(CASE
    WHEN nation = 'BRAZIL' THEN volume
    ELSE 0
  END) / SUM(volume) AS mkt_share
FROM
  (
    SELECT
      EXTRACT(YEAR FROM o_orderdate) AS o_year,
      l_extendedprice * (1 - l_discount) AS volume,
      n2.n_name AS nation
    FROM
      part,
      supplier,
      lineitem,
      orders,
      customer,
      nation n1,
      nation n2,
      region
    WHERE
      p_partkey = l_partkey
      AND s_suppkey = l_suppkey
      AND l_orderkey = o_orderkey
      AND o_custkey = c_custkey
      AND c_nationkey = n1.n_nationkey
      AND n1.n_regionkey = r_regionkey
      AND r_name = 'AMERICA'
      AND s_nationkey = n2.n_nationkey
      AND o_orderdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
      AND p_type = 'ECONOMY ANODIZED STEEL'
  ) AS all_nations
GROUP BY
  o_year 
ORDER BY
  o_year 

Affects Version(s)

1.9.2-snapshot

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

flink.execution.target=yarn-application

kyuubi.engine.session.initialize.sql=SET 'table.dml-sync' = 'true';SET 'execution.runtime-mode' = 'batch';

Kyuubi Engine Configurations

No response

Additional context

flink 1.18

github-actions[bot] commented 2 months ago

Hello @Pandas886, thanks for taking the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

SteNicholas commented 1 month ago

@Pandas886, are you interested in fixing this?