apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.69k stars 3.28k forks

[BUG] Doris Stream Load Memory Overcommit Issue Causing Cancellation (Data Does Not Exceed GC Settings) #43550

Open Han-lai opened 2 days ago

Han-lai commented 2 days ago

We are encountering an issue in Doris where stream loads are canceled due to memory overcommit. The error occurs when SeaTunnel stream-loads the result of a specific query into Doris. Doris reports that memory usage exceeds the configured limit and cancels the load. However, we have verified that the data being loaded does not exceed the configured GC (garbage collection) settings, and we expected its memory usage to stay within the configured limits.

Version

Doris Version: 2.1.5

Stream Load Settings:
streaming_load_max_mb = 10240 (10 GB)
string_type_length_soft_limit_bytes = 2147483643 (limit for string type length)
doris.batch.size = 1024

Workload Group Settings:
ALTER WORKLOAD GROUP normal PROPERTIES("enable_memory_overcommit" = "false");
ALTER WORKLOAD GROUP normal PROPERTIES("memory_limit" = "70%");

What's Wrong?

Execute the following SQL query on Doris:

SELECT TimeStamp, UID, LineName, Messagee, source, Target, RID, Message FROM PK_MessageBody_data WHERE LineName='Line_02' AND MessageName='Production_A' AND toString(TimeStamp) > '2024-10-13 00:00:00.000 +0000' AND toString(TimeStamp) < '2024-10-14 00:00:00.000 +0000' AND length(Message) >= 5000000 LIMIT 400;

The Doris process throws the following error, indicating memory overcommit and cancellation:

ErrorCode:[Doris-01], ErrorDescription:[stream load error] - stream load error: [CANCELLED]GC wg for hard limit, wg id:1, name:normal, used:8.00 GB, limit:7.56 GB, backend:1x.xxx.xxx.xxx. cancel top memory used tracker <Load#Id=6c4423150a0475be-b705cd052d5d8bb2> consumption 8.00 GB. details:process memory used 4.12 GB exceed soft limit 9.72 GB or sys available memory 8.09 GB less than warning water mark 1.20 GB., Execute again after enough memory, details see be.INFO., see more in null.
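To make it easier to see which threshold in the error message is actually crossed, here is a minimal Python sketch that pulls the numbers out of the message text and compares them. The regular expressions and the helper names (`gb`, `diagnose`) are my own assumptions based only on the wording of this one message; this is not an official Doris log parser, and other Doris versions may format the message differently.

```python
import re

# Error text copied from the report above (backend address is already redacted).
ERROR = (
    "[CANCELLED]GC wg for hard limit, wg id:1, name:normal, used:8.00 GB, "
    "limit:7.56 GB, backend:1x.xxx.xxx.xxx. cancel top memory used tracker "
    "<Load#Id=6c4423150a0475be-b705cd052d5d8bb2> consumption 8.00 GB. "
    "details:process memory used 4.12 GB exceed soft limit 9.72 GB or sys "
    "available memory 8.09 GB less than warning water mark 1.20 GB."
)


def gb(prefix: str, text: str) -> float:
    """Return the first '<prefix> <number> GB' value found in text.

    Hypothetical helper: `prefix` is a regex fragment taken from the
    wording of the message above.
    """
    match = re.search(prefix + r"\s*([\d.]+) GB", text)
    if match is None:
        raise ValueError(f"value not found for prefix: {prefix!r}")
    return float(match.group(1))


def diagnose(text: str) -> dict:
    """Compare each reported usage figure with its reported threshold."""
    wg_used = gb(r"used:", text)              # workload group usage
    wg_limit = gb(r"limit:", text)            # workload group hard limit
    proc_used = gb(r"process memory used", text)
    soft_limit = gb(r"soft limit", text)
    sys_avail = gb(r"available memory", text)
    water_mark = gb(r"water mark", text)
    return {
        "workload_group_over_hard_limit": wg_used > wg_limit,
        "process_over_soft_limit": proc_used > soft_limit,
        "available_below_water_mark": sys_avail < water_mark,
    }


result = diagnose(ERROR)
for check, tripped in result.items():
    print(f"{check}: {tripped}")
```

For the message in this report, only the workload group comparison (8.00 GB used against a 7.56 GB limit) comes out true; the process-level and system-level figures quoted in the "details" part are within their thresholds.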

What You Expected?

The query should run successfully without causing a memory overcommit error or canceling the load process, even when processing large message body data.

How to Reproduce?

Actual Behavior: The load is canceled because the workload group exceeds its memory limit (8.00 GB used vs. 7.56 GB limit). In the same error message, process memory (4.12 GB) is below the soft limit (9.72 GB) and available system memory (8.09 GB) is above the warning water mark (1.20 GB), so only the workload group hard limit is exceeded.

Anything Else?

We have confirmed that the data being loaded does not exceed the configured GC settings, and the Doris memory management configuration appears to be applied correctly, yet the memory overcommit error still occurs. We suggest reviewing the memory management and load handling mechanism to prevent load cancellations due to memory overcommit.

dataroaring commented 2 days ago

How much data was loaded in one stream load?

Han-lai commented 2 days ago

How much data was loaded in one stream load?

In this stream load, a total of 100 rows were loaded, as indicated by the LoadedRows column. The LoadBytes field shows that the loaded data size is 618,362,793 bytes (approximately 618 MB).

However, one of my fields (Message) is in JSON format and its values are very large: the source query only selects rows with length(Message) >= 5000000.

Label: pkmsg_autobkt_LIMIT_905708683449597953_0_1730704481157
Db: test_variant
Table: pkmsg_autobkt_LIMIT
ClientIp: 1X.XXX.XXX.XXX
Status: Fail
Message: [CANCELLED]GC wg for hard limit, wg id:1, name:normal, used:2.99 GB, limit:2.70 GB, backend:1X.XXX.XXX.XX. cancel top memory used tracker <Load#Id=8948d6ed7e735a70-1081982c54e61cb2> consumption 2.66 GB. details:process memory used 2.63 GB exceed soft limit 9.72 GB or sys available memory 10.23 GB less than warning water mark 1.20 GB., Execute again after enough memory, details see be.INFO.
Url: N/A
TotalRows: 100
LoadedRows: 100
FilteredRows: 0
UnselectedRows: 0
LoadBytes: 618362793
StartTime: 2024-11-04 07:14:41.232
FinishTime: 2024-11-04 07:16:07.364
User: root
Comment:
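For reference, the figures in this load record can be put side by side with a few lines of Python. This is only a back-of-the-envelope sketch using the values shown above. It assumes that "GB" in the Doris message means 2**30 bytes, which the report does not confirm, and the variable names are mine.

```python
from datetime import datetime

GIB = 1024 ** 3  # assumption: "GB" in the Doris message means 2**30 bytes

# Values copied from the load record above.
load_bytes = 618_362_793
loaded_rows = 100
tracker_consumption_gb = 2.66   # "consumption 2.66 GB" for this load
start = datetime.strptime("2024-11-04 07:14:41.232", "%Y-%m-%d %H:%M:%S.%f")
finish = datetime.strptime("2024-11-04 07:16:07.364", "%Y-%m-%d %H:%M:%S.%f")

# Average size of one row in the load payload.
avg_row_bytes = load_bytes / loaded_rows

# How many times larger the tracked load memory is than the raw payload.
amplification = tracker_consumption_gb * GIB / load_bytes

# Wall-clock time between StartTime and FinishTime.
duration_s = (finish - start).total_seconds()

print(f"average row size: {avg_row_bytes / 1e6:.2f} MB")
print(f"tracked memory / loaded bytes: {amplification:.1f}x")
print(f"load duration: {duration_s:.3f} s")
```

Under that assumption, the load tracker holds several times the raw payload size before the workload group limit is hit, which is consistent with the observation that LoadBytes alone is well below the limit.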