StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0
8.74k stars 1.75k forks source link

Can specify custom_query_id and kill query by custom_query_id #47699

Closed kaijianding closed 1 month ago

kaijianding commented 3 months ago

Feature request

Want to specify a custom query id and kill query by this id. This feature is a frequently asked about.

Describe the solution you'd like

set custom_query_id=xx;
select ...;

show proc '/current_queries';
kill query 'xx';

this custom_query_id is temporary, it will be removed from session after select sql is finished

alvin-celerdata commented 3 months ago

@kaijianding

Because you are introducing a new statement, it would need to be discussed more before we merge it into the code base. StarRocks already has query_id for each query, what if a user wants to kill a query by query_id? If killing by query_id is supported, when the custom_query_id is needed?

So this is why I'd like to split this feature into two separated parts. One is to introduce a new statement to cancel query by query_id, and the other is to introduce custom_query_id. And we can discuss them one by one.

Also, StarRocks has a statement KILL QUERY conn_id, what is the difference between KILL QUERY and KILL RUNNING QUERY. One statement uses conn_id, and the other uses custom_query_id, it is a little confusing for users.

Even for the KILL RUNNING QUERY statement, there is no corresponding statement to fetch all running queries.

kaijianding commented 3 months ago
  1. query_id can be found using show processlist and show proc '/current_queries' when query IS running. When user can issue show processlist or show proc '/current_queries', he can just do KILL QUERY conn_id|query_id, there is no need kill running query 'query_id'. A query_id is generated by starrocks(even different query_id would be generated during retry), user can't specify this query_id before a query is issued, that's why user only can specifiy custom_query_id which is not changed during starrocks's internal retry.

  2. The difference between KILL QUERY and KILL RUNNING QUERY is that KILL RUNNING QUERY will try to kill query in all FE because user don't know which FE is actually handling this query when user uses SLB/VIP to connect to FE cluster.

  3. Yes, user can't fetch all running queries, though show proc '/current_queries can find query is one FE. But user uses custom_query_id to kill this query if the query is running and taking too much time, there is no need to find all running queries in this scenario. Of cource, we can add another command to list all running queries accross all FE like show proc '/all_current_queries' @alvin-celerdata

alvin-celerdata commented 3 months ago

KILL QUERY conn_id|query_id

Can you confirm that StarRocks has already supported it now? AFAIK, it hasn't supported kill query by query_id. As you can see in the KILL doc, user can just kill query by conn_id.

A query_id is generated by starrocks(even different query_id would be generated during retry)

If the query_id is unique for different executions, IMO we need to introduce an execution_id for each execution rather than changing the query_id.

user can't specify this query_id before a query is issued

Why do users need to know query_id before issuing it? They can get query id by select last_query_id().

that's why user only can specifiy custom_query_id which is not changed during starrocks's internal retry.

That's another thing that I want to talk about introducing custom_query_id. Because this field is set by users, if this is not changed, queries will share the same custom_query_id. It will be dangerous to give the burden to users to guarantee the uniqueness of this field. If there is no guarantee for uniqueness, how can users kill the right query they want to kill?

The difference between KILL QUERY and KILL RUNNING QUERY is that KILL RUNNING QUERY will try to kill query in all FE because user don't know which FE is actually handling this query when user uses SLB/VIP to connect to FE cluster.

For this feature, I think we need to talk 2 things.

  1. the syntax of this statement. I prefer KILL GLOBAL QUERY query_id or KILL STATEMENT query_id, which can differ it from KILL QUERY. Or we can support KILL QUERY query_id, if the query_id is string, we try to kill query by query_id, if query_id is integer, we try to kill query by conn_id.
  2. the semantics of this statement. I think we need the query_id to be the internal query id that is assigned by StarRocks automatically. Because not all users will use custom_query_id, but we need to keep our users as simple as possible. So we need to let them use this statement to kill queries globally even if they know nothing about custome_query_id.

Yes, user can't fetch all running queries, though show proc '/current_queries can find query is one FE. But user uses custom_query_id to kill this query if the query is running and taking too much time, there is no need to find all running queries in this scenario. Of cource, we can add another command to list all running queries accross all FE like show proc '/all_current_queries'

I think SHOW PROC is not a good way to scale, we can add a new system function to get all queries or a system view to fetch all queries.

kaijianding commented 3 months ago
  1. KILL QUERY query_id is not supported now. I meant conn_id that can kill a query from show processlist.
  2. ExecutionId and QueryId are both existed now, and are assigned by starrocks.
  3. Because user want to kill a query from his own code if a query is taking too long time or user just want to cancel the query due to wrong where condition. A user can't find his exact query from show procceslist or show proc '/current_queries', if there are lots of queries are running. In this senario, user can specify a custom_query_id and kill query by this id. select last_query_id() can only return query id after query is FINISHED. Basically, user knows custom_query_id and know nothing about query_id that is assigned by starrocks.
  4. Yes, user SHOULD guarantee the custom_query_id be unique, which is easy for use doing this. User is already doing this when using Clickhouse. At same time, we should check that this user is authorized to kill THE query.
  5. KILL [GLOBAL] QUERY query_id looks good to me. This statement can kill query with query_id=xx or custom_query_id=xx, we can check both two id if xx is string
  6. We can add a way later to display all queries globally.
alvin-celerdata commented 3 months ago

Because user want to kill a query from his own code if a query is taking too long time or user just want to cancel the query due to wrong where condition. A user can't find his exact query from show procceslist or show proc '/current_queries', if there are lots of queries are running. In this senario, user can specify a custom_query_id and kill query by this id. select last_query_id() can only return query id after query is FINISHED. Basically, user knows custom_query_id and know nothing about query_id that is assigned by starrocks.

For the example you given above, if users can get all query IDs from a command/statement, and then they can find the longest-running query and its ID. Then then can use KILL QUERY query-id to cancel it. As you mentioned before, users can't not know how long the query will run before executing it. It is too late for them to execucte set custome_query_id, while the query is running there.

Yes, user SHOULD guarantee the custom_query_id be unique, which is easy for use doing this. User is already doing this when using Clickhouse. At same time, we should check that this user is authorized to kill THE query.

This will be easy for experts but not friendly for most of users. Suppose that users are sending ad-hoc queries interactively, they are required to think of a globally unique id before issuing any ad-hoc query. That's not a good user experience. However, I admit that this feature can be used for senior users who want to control the system better. It is not a NO to add this feature, I just want to emphasize that it should come after supporting query_id

KILL [GLOBAL] QUERY query_id looks good to me. This statement can kill query with query_id=xx or custom_query_id=xx, we can check both two id if xx is string

Good.

Can we get some agreements now?

  1. both of us agree to support KILL QUERY query-id to kill query globally by query-id. Even though we need more discussion on the details of this new statement.
  2. We haven't achieved agreement on how to support custom_query_id.

And I will suggest we go with KILL QUERY query-id first and in the meantime we can keep discussing custom_query_id. And to make things clearer, I suggest we split this discussion into 2 separated issues. One is to discuss KILL QUERY query_id, and the other is about supporting custom_query_id, which we can continue the discussion.

kaijianding commented 3 months ago

Yes, this feature is for expert, usually a platform user who is building a product on starrocks. This product will set custom_query_id for each query and guarantee the custom_query_id is globally unique. And custom_query_id is not required for ad-hoc query, it is optional.

Sure, I will support kill query by query id first, and let's discuss how to set custom_query_id later.