StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

[shared_data] deleting a file in staros never seems to take effect when vacuum runs periodically #27599

Closed: wxl24life closed this issue 10 months ago

wxl24life commented 1 year ago

Steps to reproduce the behavior (Required)

In the be.INFO log, the same data file is vacuumed periodically. It seems that the .dat file is never actually deleted at all.

Expected behavior (Required)

Real behavior (Required)

I0719 20:58:58.422240 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:01:58.303380 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:04:58.597882 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:07:58.360976 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:10:58.329208 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:13:58.544032 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:16:58.317865 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:19:58.299844 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:22:58.510511 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:25:58.308576 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:28:58.350298 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:31:58.482692 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:34:58.311506 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:37:58.628608 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:40:58.632773 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:43:58.357741 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:46:58.435446 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:49:58.452464 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:52:58.721911 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:55:58.578860 101833 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 21:58:58.485316 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 22:01:58.804502 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 22:04:58.655936 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat
I0719 22:07:58.582829 101832 vacuum.cpp:55] Deleted staros://10644/data/0000000000000003_ff9a7bb6-2d81-41de-a658-77ae509b6cb2.dat

StarRocks version (Required)

runtime version built from the main branch on 2023/07/19

sduzh commented 1 year ago

Hi @wxl24life, could you please follow these steps to help troubleshoot the issue:

  1. Run show tablet 10644 to identify which table and partition the tablet 10644 belongs to.
  2. Run show partitions from <table>, and check the output to find the corresponding partition. Look at the visibleVersion and compactVersion values for that partition.
sduzh commented 1 year ago

BTW, what object storage are you using?

wxl24life commented 1 year ago

> BTW, what object storage are you using?

Aliyun OSS

wxl24life commented 1 year ago

> Hi @wxl24life, could you please follow these steps to help troubleshoot the issue:
>
>   1. Run show tablet 10644 to identify which table and partition the tablet 10644 belongs to.
>   2. Run show partitions from <table>, and check the output to find the corresponding partition. Look at the visibleVersion and compactVersion values for that partition.

I have checked the logs and found that the vacuum log entries stopped at around 6 am. They started at around 18:00 yesterday, so they continued for about 12 hours. Not sure if this is related to any OSS object storage policy?

sduzh commented 1 year ago

> > Hi @wxl24life, could you please follow these steps to help troubleshoot the issue:
> >
> >   1. Run show tablet 10644 to identify which table and partition the tablet 10644 belongs to.
> >   2. Run show partitions from <table>, and check the output to find the corresponding partition. Look at the visibleVersion and compactVersion values for that partition.
>
> I have checked the logs and found that the vacuum log entries stopped at around 6 am. They started at around 18:00 yesterday, so they continued for about 12 hours. Not sure if this is related to any OSS object storage policy?

Vacuum will stop for a partition if there have been no new imports or compactions within 12 hours (lake_autovacuum_stale_partition_threshold).
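
If you want to check or tune that threshold, something like the following should work from a MySQL client connected to the FE (a minimal sketch; the config name comes from the comment above, but the exact syntax, the unit, which I assume is hours, and whether the setting can be changed at runtime may vary by version):

-- Inspect the current value of the FE config (unit assumed to be hours).
ADMIN SHOW FRONTEND CONFIG LIKE '%lake_autovacuum%';

-- Example: lower the threshold to 6 hours so vacuum gives up on idle partitions sooner.
-- This only changes the in-memory value on the connected FE; add it to fe.conf to persist it.
ADMIN SET FRONTEND CONFIG ("lake_autovacuum_stale_partition_threshold" = "6");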

wxl24life commented 1 year ago

> Hi @wxl24life, could you please follow these steps to help troubleshoot the issue:
>
>   1. Run show tablet 10644 to identify which table and partition the tablet 10644 belongs to.
>   2. Run show partitions from <table>, and check the output to find the corresponding partition. Look at the visibleVersion and compactVersion values for that partition.

visibleVersion and compactVersion have the same value.

MySQL [(none)]> show tablet 10644
    -> ;
+-----------+---------------------------+---------------------------+---------------------------+-------+---------+-------------+---------+--------+------------------------------------------------------------+
| DbName    | TableName                 | PartitionName             | IndexName                 | DbId  | TableId | PartitionId | IndexId | IsSync | DetailCmd                                                  |
+-----------+---------------------------+---------------------------+---------------------------+-------+---------+-------------+---------+--------+------------------------------------------------------------+
| load_test | lineitem_csv_dk_replica_3 | lineitem_csv_dk_replica_3 | lineitem_csv_dk_replica_3 | 10171 | 10641   | 10640       | 10642   | true   | SHOW PROC '/dbs/10171/10641/partitions/10640/10642/10644'; |
+-----------+---------------------------+---------------------------+---------------------------+-------+---------+-------------+---------+--------+------------------------------------------------------------+
1 row in set (0.002 sec)
MySQL [(none)]> use load_test;
Database changed

MySQL [load_test]> show partitions from lineitem_csv_dk_replica_3;
+-------------+---------------------------+----------------+----------------+-------------+--------+--------------+-------+-----------------+---------+----------+-----------+-----------------+------------+-------+-------+-------+
| PartitionId | PartitionName             | CompactVersion | VisibleVersion | NextVersion | State  | PartitionKey | Range | DistributionKey | Buckets | DataSize | RowCount  | EnableDataCache | AsyncWrite | AvgCS | P50CS | MaxCS |
+-------------+---------------------------+----------------+----------------+-------------+--------+--------------+-------+-----------------+---------+----------+-----------+-----------------+------------+-------+-------+-------+
| 10640       | lineitem_csv_dk_replica_3 | 3              | 3              | 4           | NORMAL |              |       | l_orderkey      | 96      | 22.8GB   | 600037902 | true            | false      | 0.00  | 0.00  | 0.00  |
+-------------+---------------------------+----------------+----------------+-------------+--------+--------------+-------+-----------------+---------+----------+-----------+-----------------+------------+-------+-------+-------+
1 row in set (0.005 sec)

sduzh commented 1 year ago

In simple terms, if the last version of a partition was produced by a compaction job, vacuum will repeatedly try to delete files that have already been removed. Aliyun OSS returns OK when deleting a non-existent file, so it looks as though the same files are being deleted repeatedly, when in fact those files no longer exist.

sduzh commented 1 year ago

> In simple terms, if the last version of a partition was produced by a compaction job, vacuum will repeatedly try to delete files that have already been removed. Aliyun OSS returns OK when deleting a non-existent file, so it looks as though the same files are being deleted repeatedly, when in fact those files no longer exist.

You can ignore these logs for now. Future versions will optimize this so that vacuum tasks that have already succeeded will not be repeatedly triggered.
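
For anyone who wants to confirm that a partition is in the state described above, comparing CompactVersion and VisibleVersion in the SHOW PARTITIONS output is enough (a small sketch reusing the table name from the output earlier in this thread):

-- Equal CompactVersion and VisibleVersion (both 3 in the output above) indicate that the
-- latest partition version was produced by a compaction, so the repeated "Deleted ..." log
-- lines are expected until new data is loaded or the 12-hour stale threshold kicks in.
SHOW PARTITIONS FROM load_test.lineitem_csv_dk_replica_3;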

github-actions[bot] commented 10 months ago

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!