Status: Open. Han-lai opened this issue 2 weeks ago.
How much data was loaded in the stream load?
In this stream load process, a total of 100 rows were loaded, as indicated by the LoadedRows column. The LoadBytes field shows that 618,362,793 bytes (approximately 618 MB) were loaded.
However, one of my fields is in JSON format and is very large: its length is at least 5,000,000 in every selected row, because the source query filters on it:
AND length(Message) >= 5000000
Label | Db | Table | ClientIp | Status | Message | Url | TotalRows | LoadedRows | FilteredRows | UnselectedRows | LoadBytes | StartTime | FinishTime | User | Comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pkmsg_autobkt_LIMIT_905708683449597953_0_1730704481157 | test_variant | pkmsg_autobkt_LIMIT | 1X.XXX.XXX.XXX | Fail | [CANCELLED]GC wg for hard limit, wg id:1, name:normal, used:2.99 GB, limit:2.70 GB, backend:1X.XXX.XXX.XX. cancel top memory used tracker <Load#Id=8948d6ed7e735a70-1081982c54e61cb2> consumption 2.66 GB. details:process memory used 2.63 GB exceed soft limit 9.72 GB or sys available memory 10.23 GB less than warning water mark 1.20 GB., Execute again after enough memory, details see be.INFO. | N/A | 100 | 100 | 0 | 0 | 618362793 | 2024-11-04 07:14:41.232 | 2024-11-04 07:16:07.364 | root |
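As a rough cross-check of the figures in this table, the Python sketch below computes the average row size and compares the memory consumption reported for the load (2.66 GB, from the Message column) with the raw payload size. It assumes that Doris prints GB in binary units (1024^3 bytes). That unit choice and all variable names are assumptions for illustration, not something stated in this issue.

```python
# Figures copied from the stream load record above.
LOAD_BYTES = 618_362_793      # LoadBytes column
LOADED_ROWS = 100             # LoadedRows column
TRACKER_GB = 2.66             # "consumption 2.66 GB" in the Message column

# Assumption: "GB" in the Doris message means 1024**3 bytes.
BYTES_PER_GB = 1024 ** 3

# Average payload size of one row, in bytes.
avg_row_bytes = LOAD_BYTES / LOADED_ROWS

# How many times larger the tracked memory is than the raw payload.
amplification = TRACKER_GB * BYTES_PER_GB / LOAD_BYTES

print(f"average row size: {avg_row_bytes / 1e6:.2f} MB")
print(f"memory / payload: {amplification:.1f}x")
```

Under that unit assumption, the load tracker holds several times the raw payload size, which is why a payload well under the limit can still reach the workload group limit.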
Search before asking
We are encountering an issue in Doris where stream loads are canceled due to memory overcommit. The error occurs when SeaTunnel runs a specific query and stream-loads the result into Doris. The Doris process consumes more memory than the configured limit, causing the load to be canceled. However, we have verified that the data being loaded does not exceed the configured GC (garbage collection) settings, and the memory usage aligns with the configured limits.
Version
Doris Version: 2.1.5
Stream Load Settings:
streaming_load_max_mb = 10240 (10 GB)
string_type_length_soft_limit_bytes = 2147483643 (limit for string type length)
doris.batch.size = 1024
Workload Group Settings:
ALTER WORKLOAD GROUP normal PROPERTIES("enable_memory_overcommit" = "false");
ALTER WORKLOAD GROUP normal PROPERTIES("memory_limit" = "70%");
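To put these settings next to the row sizes in this issue, the Python sketch below estimates a lower bound on the payload of a single batch. It assumes that doris.batch.size counts rows and that length(Message) counts bytes. Both are assumptions, and the helper name min_payload_bytes is hypothetical.

```python
# Values taken from this issue; names are illustrative only.
MIN_ROW_BYTES = 5_000_000   # from the filter: length(Message) >= 5000000
BATCH_ROWS = 1024           # doris.batch.size (assumed to be a row count)
QUERY_LIMIT = 400           # LIMIT 400 in the source query
GIB = 1024 ** 3


def min_payload_bytes(rows: int, min_row_bytes: int = MIN_ROW_BYTES) -> int:
    """Lower bound on payload size: every row is at least min_row_bytes."""
    return rows * min_row_bytes


# The query returns at most QUERY_LIMIT rows, so a batch never holds more.
rows_per_request = min(BATCH_ROWS, QUERY_LIMIT)
payload = min_payload_bytes(rows_per_request)

print(f"rows per request: {rows_per_request}")
print(f"minimum payload:  {payload / GIB:.2f} GiB")
```

Because the batch size is larger than the query's row limit, all 400 rows can end up in one request under these assumptions, so the Message column alone accounts for at least 2,000,000,000 bytes per load.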
What's Wrong?
Execute the following SQL query on Doris:
SELECT TimeStamp, UID, LineName, Messagee, source, Target, RID, Message
FROM PK_MessageBody_data
WHERE LineName='Line_02'
  AND MessageName='Production_A'
  AND toString(TimeStamp) > '2024-10-13 00:00:00.000 +0000'
  AND toString(TimeStamp) < '2024-10-14 00:00:00.000 +0000'
  AND length(Message) >= 5000000
LIMIT 400;
The Doris process throws the following error, indicating memory overcommit and cancellation:
ErrorCode:[Doris-01], ErrorDescription:[stream load error] - stream load error: [CANCELLED]GC wg for hard limit, wg id:1, name:normal, used:8.00 GB, limit:7.56 GB, backend:1x.xxx.xxx.xxx. cancel top memory used tracker <Load#Id=6c4423150a0475be-b705cd052d5d8bb2> consumption 8.00 GB. details:process memory used 4.12 GB exceed soft limit 9.72 GB or sys available memory 8.09 GB less than warning water mark 1.20 GB., Execute again after enough memory, details see be.INFO., see more in null.
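The numbers in this error message can be pulled apart to see which limit was actually hit. The Python sketch below is a minimal parser written against the exact message text quoted in this issue. The regular expression and the function name parse_cancel_message are hypothetical, not part of Doris. The last two lines assume that the workload group memory_limit of 70% is a percentage of the backend process memory limit.

```python
import re

# Error text copied from this issue (trailing hint text omitted).
ERROR = (
    "[CANCELLED]GC wg for hard limit, wg id:1, name:normal, used:8.00 GB, "
    "limit:7.56 GB, backend:1x.xxx.xxx.xxx. cancel top memory used tracker "
    "<Load#Id=6c4423150a0475be-b705cd052d5d8bb2> consumption 8.00 GB. "
    "details:process memory used 4.12 GB exceed soft limit 9.72 GB or sys "
    "available memory 8.09 GB less than warning water mark 1.20 GB."
)

# Hypothetical pattern, matched against the message format shown above.
PATTERN = re.compile(
    r"name:(?P<group>\w+), used:(?P<used>[\d.]+) GB, limit:(?P<limit>[\d.]+) GB"
    r".*?consumption (?P<load>[\d.]+) GB"
    r".*?process memory used (?P<process>[\d.]+) GB"
    r" exceed soft limit (?P<soft>[\d.]+) GB"
)


def parse_cancel_message(message: str) -> dict:
    """Extract the group name and the GB figures from a cancel message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    return {
        key: value if key == "group" else float(value)
        for key, value in match.groupdict().items()
    }


info = parse_cancel_message(ERROR)
over_group_limit = info["used"] > info["limit"]      # workload group check
over_process_soft = info["process"] > info["soft"]   # process-level check

# Assumption: memory_limit = 70% is relative to the process memory limit.
implied_process_limit = info["limit"] / 0.70
soft_ratio = info["soft"] / implied_process_limit

print(info)
print(over_group_limit, over_process_soft)
print(f"implied process limit: {implied_process_limit:.2f} GB")
```

Read this way, the load is the only large consumer in workload group normal, and the group limit rather than the process soft limit is the threshold that was exceeded.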
What You Expected?
The query should run successfully without causing a memory overcommit error or canceling the load process, even when processing large message body data.
How to Reproduce?
Expected Behavior: The query should run successfully without causing a memory overcommit error or canceling the load process, even when processing large message body data.
Actual Behavior: The load is canceled because workload group normal exceeds its hard memory limit (8.00 GB used vs. a 7.56 GB limit). In the same error message, process memory (4.12 GB) is below the 9.72 GB soft limit, and available system memory (8.09 GB) is above the 1.20 GB warning watermark.
Anything Else?
We have confirmed that the data being loaded does not exceed the configured GC settings, and the Doris memory management configuration appears to be working correctly, yet the load is still canceled with the memory overcommit error. We suggest reviewing the memory management and load handling mechanism to prevent load cancellations due to memory overcommit.