StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Tombstone tables occupy lake storage #50230

Open sergeyshaykhullin opened 2 weeks ago

sergeyshaykhullin commented 2 weeks ago

Unable to drop a storage volume because some very old tables from a deleted database still exist in the metadata and are still storing data in S3.

Steps to reproduce the behavior (Required)

I don't really know why or when this happened.

  1. Create storage volume
  2. Create db
  3. Create table
  4. Insert something
  5. Drop table
  6. Drop db

Expected behavior (Required)

DROP STORAGE VOLUME succeeds.

Real behavior (Required)

drop storage volume REDACTED;
SQL Error [1064] [42000]: Unexpected exception: Storage volume 'REDACTED' is referenced by dbs or tables, dbs: [], tables: [348128, 348943, 1163129]

The table IDs mentioned in the error are not present in any database of the default catalog.
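
To compare the IDs in the error against whatever metadata is still reachable, it can help to pull them out programmatically. The following is a minimal sketch that only assumes the message format shown above; the function name `parse_referenced_ids` is made up for illustration and is not part of StarRocks.

```python
import re


def parse_referenced_ids(message: str) -> dict[str, list[int]]:
    """Extract the db and table IDs listed in the 'is referenced by' error."""
    result = {}
    for key in ("dbs", "tables"):
        # Match e.g. "tables: [348128, 348943]"; the list may be empty.
        match = re.search(rf"\b{key}: \[([^\]]*)\]", message)
        ids = match.group(1) if match else ""
        result[key] = [int(x) for x in ids.split(",") if x.strip()]
    return result


# The error text reported in this issue.
error = (
    "Unexpected exception: Storage volume 'REDACTED' is referenced by "
    "dbs or tables, dbs: [], tables: [348128, 348943, 1163129]"
)
print(parse_referenced_ids(error))
```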

StarRocks version (Required)

3.3.2-857dd73

kevincai commented 2 weeks ago

When you drop the database or table, be sure to add FORCE so that it is dropped immediately. Otherwise the objects are only moved to the recycle bin and stay there for a while before they are actually cleaned up. Until an object has been cleaned up permanently, the storage volume is considered in use and cannot be dropped.
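
As a rough illustration of the advice above, the sketch below builds the `DROP ... FORCE` statements for a database and its tables. The helper `force_drop_statements` and the names `demo_db` and `events` are hypothetical; the generated statements would still need to be run through whatever MySQL-protocol client you use against the FE.

```python
def force_drop_statements(database: str, tables: list[str]) -> list[str]:
    """Build DROP ... FORCE statements for the given tables, then the database."""

    def quote(name: str) -> str:
        # Simplification for this sketch: reject names containing a backtick
        # instead of trying to escape them.
        if "`" in name:
            raise ValueError(f"unsupported identifier: {name!r}")
        return f"`{name}`"

    statements = [
        f"DROP TABLE IF EXISTS {quote(database)}.{quote(table)} FORCE;"
        for table in tables
    ]
    # Drop the database last, also with FORCE so it skips the recycle bin.
    statements.append(f"DROP DATABASE IF EXISTS {quote(database)} FORCE;")
    return statements


# Hypothetical names, for illustration only.
for stmt in force_drop_statements("demo_db", ["events"]):
    print(stmt)
```

Note that FORCE bypasses the recycle bin, so objects dropped this way cannot be recovered afterwards.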

sergeyshaykhullin commented 2 weeks ago

Yes, but these tables have been around for about 8 months.

kevincai commented 2 weeks ago

Which tables? And can you still reproduce it with the minimal repro steps described above?

sergeyshaykhullin commented 2 weeks ago

I can't give you a real example, because it happened 8 months ago and the catalog doesn't even have the table names.

kevincai commented 2 weeks ago

Without details about the current state of the system, the issue may be too hard to troubleshoot.

Did you find any FE log entries related to StarMgrMetaSyncer?

sergeyshaykhullin commented 2 weeks ago

@kevincai Yes, there are some StarMgrMetaSyncer logs

sergeyshaykhullin commented 2 weeks ago

It seems there is nothing sensitive in them.

grep with StarMgrMetaSyncer

fe.log

I think "Unable to validate object" is the root cause.
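
For anyone wanting to quantify this in their own fe.log, here is a small sketch that counts the lines mentioning StarMgrMetaSyncer and how many of those contain the "Unable to validate object" text. It matches only on those two substrings taken from this thread; it makes no assumption about the rest of the log format, and the function names are invented for this example.

```python
from collections import Counter


def summarize_syncer_lines(lines, component="StarMgrMetaSyncer",
                           markers=("Unable to validate object",)):
    """Count lines mentioning the component, and how many contain each marker."""
    counts = Counter()
    for line in lines:
        if component not in line:
            continue  # Ignore log lines from other components.
        counts["total"] += 1
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return dict(counts)


def summarize_log_file(path):
    """Run the same summary over a log file on disk, e.g. a copy of fe.log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_syncer_lines(handle)
```

Additional markers (for example a timeout message copied from your own log) can be passed through the `markers` argument.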

kevincai commented 2 weeks ago

This should be a known issue: cleaning up remote storage takes too long to complete for data dating back to the old version, in which all the tablets of a table use the same S3 prefix. The deletion first lists all the files under the prefix, then filters them by tablet, and then deletes the objects one by one (or in batches).
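
The list, filter, delete sequence described above can be sketched as a small in-memory simulation. This is not StarRocks code: the key layout (`<table_prefix><tablet_id>_<suffix>`) and the function name are assumptions made purely for illustration. The default batch size of 1000 mirrors the per-request key limit of the S3 DeleteObjects API.

```python
def cleanup_tablets(all_keys, table_prefix, tablet_ids, batch_size=1000):
    """Simulate list -> filter -> batched delete.

    Returns (number_of_keys_listed, list_of_delete_batches).
    """
    # Step 1: list every object under the shared table prefix.
    listed = [key for key in all_keys if key.startswith(table_prefix)]

    # Step 2: keep only the objects that belong to the tablets being removed.
    # Hypothetical key layout: <table_prefix><tablet_id>_<suffix>
    doomed = [
        key for key in listed
        if key[len(table_prefix):].split("_", 1)[0] in tablet_ids
    ]

    # Step 3: group the deletions into batches.
    batches = [doomed[i:i + batch_size] for i in range(0, len(doomed), batch_size)]
    return len(listed), batches


# Made-up keys: three objects under table prefix "t1/", one under "t2/".
keys = ["t1/10_a.dat", "t1/10_b.dat", "t1/11_a.dat", "t2/10_a.dat"]
listed, batches = cleanup_tablets(keys, "t1/", {"10"}, batch_size=1)
print(listed, batches)
```

The point of the sketch is that step 1 touches every object under the table prefix even when only a few tablets are being removed, which is why a large shared prefix makes each cleanup pass expensive.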

sergeyshaykhullin commented 2 weeks ago

Can we just delete the S3 bucket used by the old storage volume? Will that break anything? Or can the deletion tasks recognize that the bucket is gone and that all the data has already been deleted?

kevincai commented 2 weeks ago

Actually, you can purge the objects in the S3 bucket; that will help the cleanup tasks in StarRocks.

kevincai commented 2 weeks ago

@sergeyshaykhullin How is it going? Did the storage volume get cleaned up after you manually deleted the objects in the bucket?

sergeyshaykhullin commented 2 weeks ago

@kevincai I deleted all objects in the bucket 2 days ago but kept the bucket itself. Nothing has changed: the phantom tables have not been deleted.

sergeyshaykhullin commented 1 week ago

@kevincai Maybe I can force-delete these tables somehow? The bucket is empty, but the storage volume still contains "tables".

How can I find table details by table ID?

kevincai commented 1 week ago

Currently there is no tool or SQL statement that shows a table from its ID. The only way is to extract it from the FE metadata, either from a JVM heap dump or from the frontend image.

kevincai commented 1 week ago

How do the StarMgrMetaSyncer logs look? Are you still seeing a lot of timeout errors?

sergeyshaykhullin commented 1 week ago

@kevincai No more StarMgrMetaSyncer logs