apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Bug] Cancel Schema change job which is useless in BE #16623

Open Lchangliang opened 1 year ago

Lchangliang commented 1 year ago

Version

1.2, and also master

What's Wrong?

When I have a long-running schema change job, I want to cancel it and create a new schema change job. The new job sometimes fails.

Example: I have a large table with many columns. I choose column A to add a bitmap index, but then I change my mind and want to add the bitmap index on column B instead. So I cancel the job and create a new schema change job, and the new job fails.

LOG:

W0206 17:32:49.308645 1579747 schema_change.cpp:1751] failed to obtain schema change lock. base_tablet=12198
W0206 17:32:49.354110 1579747 task_worker_pool.cpp:558] failed to alter tablet|signature=12450|base_tablet_id=12198|new_tablet_id=12450|error=Internal error(error -216):
    @ 0x55e5ed6fb0a4 doris::Status::ConstructErrorStatus()
    @ 0x55e5ed6fb565 doris::Status::OLAPInternalError()
    @ 0x55e5edb0821c doris::SchemaChangeHandler::process_alter_tablet_v2()
    @ 0x55e5edf1df00 doris::EngineAlterTabletTask::execute()
    @ 0x55e5eda9c019 doris::StorageEngine::execute_task()
    @ 0x55e5edab9612 doris::TaskWorkerPool::_alter_tablet()
    @ 0x55e5edac6819 doris::TaskWorkerPool::_alter_tablet_worker_thread_callback()
    @ 0x55e5ee2a5805 doris::ThreadPool::dispatch_thread()
    @ 0x55e5ee29bbef doris::Thread::supervise_thread()
    @ 0x7f6fa610b17a start_thread
    @ 0x7f6fa641fdc3 __GI___clone
    @ (nil) (unknown)

The cancel logic is in fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java:CancelImpl. From it we can see that the cancel only takes effect in FE. It does not stop the workflow in BE, so BE keeps running the canceled job until it finishes. This leads to confusing behavior: I cancel the job, but then I fail to add a new job.
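To make the interaction concrete, here is a minimal toy model in Python. It is not Doris code: `Backend`, `Frontend`, and the job names are hypothetical, and it assumes, as the "failed to obtain schema change lock" warning suggests, that BE takes a per-tablet lock without blocking and releases it only when the task finishes.

```python
import threading


class Backend:
    """Toy stand-in for a BE node: one schema change lock per base tablet."""

    def __init__(self):
        self._locks = {}

    def _lock_for(self, base_tablet_id):
        return self._locks.setdefault(base_tablet_id, threading.Lock())

    def start_alter(self, base_tablet_id):
        # Non-blocking acquire: returns False if an earlier task still holds the lock.
        return self._lock_for(base_tablet_id).acquire(blocking=False)

    def finish_alter(self, base_tablet_id):
        # The lock is released only when the task itself finishes.
        self._lock_for(base_tablet_id).release()


class Frontend:
    """Toy stand-in for FE: tracks job state and forwards new jobs to the backend."""

    def __init__(self, backend):
        self.backend = backend
        self.job_state = {}

    def submit(self, job_id, base_tablet_id):
        ok = self.backend.start_alter(base_tablet_id)
        self.job_state[job_id] = "RUNNING" if ok else "FAILED"
        return ok

    def cancel(self, job_id):
        # Only the FE-side state changes; the backend is never told to stop.
        self.job_state[job_id] = "CANCELLED"


be = Backend()
fe = Frontend(be)
fe.submit("index_on_a", 12198)   # first job takes the tablet lock
fe.cancel("index_on_a")          # FE marks it cancelled, BE keeps working
fe.submit("index_on_b", 12198)   # cannot obtain the lock, so the job fails
be.finish_alter(12198)           # old task finally finishes and releases the lock
fe.submit("index_on_c", 12198)   # now a new job can start
```

In this model the second job fails even though the first one already shows as cancelled, which matches the symptom described above.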

What You Expected?

When I issue the cancel command, the schema change job should be canceled completely, in BE as well as in FE.
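One possible shape for this behavior is cooperative cancellation: the running task checks a cancel flag between units of work and releases the tablet lock as soon as it stops. The sketch below is a hypothetical Python illustration only. `run_alter`, `cancel_requested`, and the batch callables are invented names, and nothing here reflects how Doris does or should implement it.

```python
import threading


def run_alter(lock, cancel_requested, batches):
    """Run batches while holding the tablet lock; stop early if cancelled."""
    if not lock.acquire(blocking=False):
        return "LOCK_BUSY"
    try:
        for batch in batches:
            # Check the flag before each unit of work so a cancel takes effect quickly.
            if cancel_requested.is_set():
                return "CANCELLED"
            batch()  # hypothetical unit of conversion work
        return "FINISHED"
    finally:
        # The lock is released on every exit path, including cancellation.
        lock.release()


tablet_lock = threading.Lock()
cancel_requested = threading.Event()
converted = []

status = run_alter(
    tablet_lock,
    cancel_requested,
    [
        lambda: converted.append("batch-1"),
        cancel_requested.set,               # simulates a cancel arriving mid-job
        lambda: converted.append("batch-2"),
    ],
)
```

Here the task stops before the last batch and the lock is free again, so a follow-up job on the same tablet would be able to start immediately.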

How to Reproduce?

No response

Anything Else?

No response

isHuangXin commented 1 year ago

Can you give a more detailed test case? For example, the SQL to create a table, create a bitmap index on column A, and cancel the bitmap index job.

Lchangliang commented 1 year ago

I think we can use the SSB case generated by the tools directory (tools/ssb-tools). Generate 100 GB of data for the lineorder table and choose two columns to reproduce the problem.

Lchangliang commented 1 year ago

https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-BITMAP

Lchangliang commented 1 year ago

CANCEL ALTER TABLE COLUMN FROM tbl_name;

isHuangXin commented 1 year ago

plz assign it to me~ I will have a try.

isHuangXin commented 1 year ago

plz, unassign me, sry for not completing it.

TangSiyang2001 commented 1 year ago

Let me have a try.