StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

compaction stuck when using cloud native index #53227

Closed yangrong688 closed 2 days ago

yangrong688 commented 3 days ago

Compaction gets stuck when using a cloud-native index, causing a backlog of txns. The results of show proc '/transactions/xxx/running/'; are shown in the attached screenshot.
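For reference, a minimal sketch of how the backlog can be inspected, assuming the standard SHOW PROC paths (the database ID below is a placeholder, matching the xxx in the command above):

```sql
-- Find the database ID first (DbId column).
SHOW PROC '/dbs';

-- List transactions still in the running state for that database;
-- a steadily growing list here suggests publish/compaction is not making progress.
SHOW PROC '/transactions/<db_id>/running';
```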

Related BE logs:

W20241127 01:42:53.770809 139248869025536 lake_service.cpp:225] Fail to publish version: Corruption: not an sstable (bad magic number)
be/src/storage/lake/persistent_index_sstable.cpp:35 sstable::Table::Open(options, rf.get(), sstable_pb.filesize(), &table)
be/src/storage/lake/lake_persistent_index.cpp:520 sstable->init(std::move(rf), sstable_pb, block_cache->cache())
be/src/storage/lake/update_manager.cpp:834 index.apply_opcompaction(metadata, op_compaction)
be/src/storage/lake/txn_log_applier.cpp:122 check_and_recover([&]() { return apply_compaction_log(log.op_compaction(), log.txn_id()); }).
tablet_id=3283768 txn_ids=txn_id: 7574654 commit_time: 1732617636 combined_txn_log: false txn_type: TXN_NORMAL force_publish: false

Steps to reproduce the behavior (Required)

This problem happens occasionally. The scenario is simple: we use the Flink StarRocks connector to load data into a primary key (PK) table with a cloud-native index. After running for several days, some tables hit this problem.
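For context, a minimal sketch of the kind of table involved, assuming a shared-data cluster with the cloud-native persistent index enabled; the table name, columns, and load configuration below are illustrative only and are not taken from the report:

```sql
-- Illustrative primary key table with the cloud-native persistent index.
CREATE TABLE demo_pk_table (
    id BIGINT NOT NULL,
    event_time DATETIME,
    payload STRING
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH (id)
PROPERTIES (
    "enable_persistent_index" = "true",
    "persistent_index_type" = "CLOUD_NATIVE"
);
```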

Expected behavior (Required)

Compaction should not get stuck.

Real behavior (Required)

Compaction gets stuck.

StarRocks version (Required)

yangrong688 commented 3 days ago

@tracymacding @luohaha