dudufan opened 1 year ago
I found the same issue in 2.0.0. It looks like tasks are only created but never released. How can I fix it? Only one routine load job is running.
be.info.log
```
I0926 10:36:31.550918 111145 routine_load_task_executor.cpp:267] submit a new routine load task: id=91bcba040a1f44f4-bee8e85ef6bfd278, job_id=44616, txn_id=77135, label=perf-44616-91bcba040a1f44f4-bee8e85ef6bfd278-77135, elapse(s)=0, current tasks num: 1
I0926 10:36:31.550987 109916 routine_load_task_executor.cpp:285] begin to execute routine load task: id=91bcba040a1f44f4-bee8e85ef6bfd278, job_id=44616, txn_id=77135, label=perf-44616-91bcba040a1f44f4-bee8e85ef6bfd278-77135, elapse(s)=0
I0926 10:36:31.551146 109916 routine_load_task_executor.cpp:296] recv single-stream-multi-table request, ctx=id=91bcba040a1f44f4-bee8e85ef6bfd278, job_id=44616, txn_id=77135, label=perf-44616-91bcba040a1f44f4-bee8e85ef6bfd278-77135, elapse(s)=0
I0926 10:36:46.349925 111145 routine_load_task_executor.cpp:267] submit a new routine load task: id=10c422ec73ff4e9a-922ac2b418f447b6, job_id=44616, txn_id=77136, label=perf-44616-10c422ec73ff4e9a-922ac2b418f447b6-77136, elapse(s)=0, current tasks num: 2
I0926 10:36:46.349989 109917 routine_load_task_executor.cpp:285] begin to execute routine load task: id=10c422ec73ff4e9a-922ac2b418f447b6, job_id=44616, txn_id=77136, label=perf-44616-10c422ec73ff4e9a-922ac2b418f447b6-77136, elapse(s)=0
I0926 10:36:46.350137 109917 routine_load_task_executor.cpp:296] recv single-stream-multi-table request, ctx=id=10c422ec73ff4e9a-922ac2b418f447b6, job_id=44616, txn_id=77136, label=perf-44616-10c422ec73ff4e9a-922ac2b418f447b6-77136, elapse(s)=0
I0926 10:36:56.353701 111145 routine_load_task_executor.cpp:267] submit a new routine load task: id=70e3cd758f114301-92ebc5916cc117e2, job_id=44616, txn_id=77137, label=perf-44616-70e3cd758f114301-92ebc5916cc117e2-77137, elapse(s)=0, current tasks num: 3
I0926 10:36:56.353776 109918 routine_load_task_executor.cpp:285] begin to execute routine load task: id=70e3cd758f114301-92ebc5916cc117e2, job_id=44616, txn_id=77137, label=perf-44616-70e3cd758f114301-92ebc5916cc117e2-77137, elapse(s)=0
I0926 10:36:56.354025 109918 routine_load_task_executor.cpp:296] recv single-stream-multi-table request, ctx=id=70e3cd758f114301-92ebc5916cc117e2, job_id=44616, txn_id=77137, label=perf-44616-70e3cd758f114301-92ebc5916cc117e2-77137, elapse(s)=0
I0926 10:37:06.357452 111145 routine_load_task_executor.cpp:267] submit a new routine load task: id=61020b11263e44f7-a977a4130fc4f0bd, job_id=44616, txn_id=77138, label=perf-44616-61020b11263e44f7-a977a4130fc4f0bd-77138, elapse(s)=0, current tasks num: 4
I0926 10:37:06.357524 109919 routine_load_task_executor.cpp:285] begin to execute routine load task: id=61020b11263e44f7-a977a4130fc4f0bd, job_id=44616, txn_id=77138, label=perf-44616-61020b11263e44f7-a977a4130fc4f0bd-77138, elapse(s)=0
I0926 10:37:06.357717 109919 routine_load_task_executor.cpp:296] recv single-stream-multi-table request, ctx=id=61020b11263e44f7-a977a4130fc4f0bd, job_id=44616, txn_id=77138, label=perf-44616-61020b11263e44f7-a977a4130fc4f0bd-77138, elapse(s)=0
I0926 10:37:16.361900 111145 routine_load_task_executor.cpp:267] submit a new routine load task: id=b25cc07bb21d4717-a1d403f71596d8e1, job_id=44616, txn_id=77139, label=perf-44616-b25cc07bb21d4717-a1d403f71596d8e1-77139, elapse(s)=0, current tasks num: 5
I0926 10:37:16.361989 109920 routine_load_task_executor.cpp:285] begin to execute routine load task: id=b25cc07bb21d4717-a1d403f71596d8e1, job_id=44616, txn_id=77139, label=perf-44616-b25cc07bb21d4717-a1d403f71596d8e1-77139, elapse(s)=0
I0926 10:37:16.362210 109920 routine_load_task_executor.cpp:296] recv single-stream-multi-table request, ctx=id=b25cc07bb21d4717-a1d403f71596d8e1, job_id=44616, txn_id=77139, label=perf-44616-b25cc07bb21d4717-a1d403f71596d8e1-77139, elapse(s)=0
I0926 10:37:26.365837 111145 routine_load_task_executor.cpp:267] submit a new routine load task: id=c11f6ee0ce7b4485-bb1a3b4b23ef7fa2, job_id=44616, txn_id=77140, label=perf-44616-c11f6ee0ce7b4485-bb1a3b4b23ef7fa2-77140, elapse(s)=0, current tasks num: 6
```
I don't know why, but it worked after increasing these two parameters: max_routine_load_task_num_per_be (fe) and routine_load_thread_pool_size (be).
My test environment is 1 FE + 1 BE. I found that when I increase max_batch_interval to 60 (the maximum), the BE looks normal, but consumption is very slow, about 400 rows per batch. I don't know why.
```
I0926 11:37:36.238286 172134 data_consumer.cpp:234] kafka consume timeout: 63480f09a42d6165-591140e9b9db3fae
I0926 11:37:37.984771 172134 data_consumer.cpp:234] kafka consume timeout: 63480f09a42d6165-591140e9b9db3fae
I0926 11:37:39.251384 172134 data_consumer.cpp:234] kafka consume timeout: 63480f09a42d6165-591140e9b9db3fae
I0926 11:37:40.251472 172134 data_consumer.cpp:234] kafka consume timeout: 63480f09a42d6165-591140e9b9db3fae
I0926 11:37:40.251492 172134 data_consumer.cpp:257] kafka consumer done: 63480f09a42d6165-591140e9b9db3fae, grp: a648ee7bb83177ec-d82f3befe86d3cb3. cancelled: 0, left time(ms): -776, total cost(ms): 60776, consume cost(ms): 60774, received rows: 390, put rows: 390
I0926 11:37:40.251507 172134 data_consumer_group.cpp:87] all consumers are finished. shutdown queue. group id: a648ee7bb83177ec-d82f3befe86d3cb3
I0926 11:37:40.251523 160128 data_consumer_group.cpp:131] consumer group done: a648ee7bb83177ec-d82f3befe86d3cb3. consume time(ms)=60776, received rows=390, received bytes=42380, eos: 1, left_time: -776, left_rows: 299610, left_bytes: 209672820, blocking get time(us): 60749899, blocking put time(us): 156, id=f46cc7a6026a448f-b398133ce6e0eed1, job_id=44616, txn_id=77200, label=perf-44616-f46cc7a6026a448f-b398133ce6e0eed1-77200, elapse(s)=60
```
Hi, I would like to confirm whether you are using routine load multi-table import. If you have a small number of jobs, this may be a different issue. If you have a large number of routine load jobs, you can:
- Reduce concurrency by lowering max_routine_load_task_concurrent_num (fe.conf)
- Increase slots by raising max_routine_load_task_num_per_be and routine_load_thread_pool_size (be.conf)

Note that max_routine_load_task_num_per_be must be less than routine_load_thread_pool_size.
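As a sketch, the knobs above might be set like this. These are example values only, not a recommendation for your cluster; parameter placement follows the comment above, and a process restart is assumed to be needed for conf-file changes to take effect:

```properties
# fe.conf -- hypothetical example: cap concurrent tasks per job
max_routine_load_task_concurrent_num = 1

# be.conf -- hypothetical example: keep the per-BE task cap
# strictly below the thread pool size, as noted above
routine_load_thread_pool_size = 10
max_routine_load_task_num_per_be = 5
```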
Yes, I'm using routine load multi-table import. I set max_routine_load_task_concurrent_num = 1, but in be.log the task num still increases up to routine_load_thread_pool_size. Why? (1 FE, 1 BE.) Should I restart the BE?
I put this config in fe.conf, but it did not work. When I resume the multi-table routine load, the current tasks num does not stay at one task per job per BE.
Yes, I also use multi-table import, and in this situation increasing the slots only works for a while. After setting multi_table_batch_plan_threshold=5 in be.conf, it is OK. But I still don't know why...
If you use multi-table routine load and the amount of data is particularly small, you can adjust the multi_table_batch_plan_threshold parameter; if there are very few tables, you can usually lower it to 5 or less.
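For reference, a minimal be.conf fragment with the threshold lowered. The value 5 simply mirrors the suggestion above; a BE restart is assumed to be needed for it to take effect:

```properties
# be.conf -- lower the batching threshold for
# single-stream-multi-table routine load (example value)
multi_table_batch_plan_threshold = 5
```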
If I have many tables (more than 200), what should I do? Maybe load the data another way? I applied all the configs above without success: the routine load task count still increases slowly even after I stop the load, and the pool is not released.
I found that normal routine load is OK, but multi-table routine load has the problem.
Normal log:
```
I0927 14:12:39.462114 351714 routine_load_task_executor.cpp:267] submit a new routine load task: id=5f31ac24344e4c47-abfd3f3efe9b2c58, job_id=160686, txn_id=82139, label=health-160686-5f31ac24344e4c47-abfd3f3efe9b2c58-82139, elapse(s)=0, current tasks num: 2
I0927 14:12:39.462155 159997 routine_load_task_executor.cpp:285] begin to execute routine load task: id=5f31ac24344e4c47-abfd3f3efe9b2c58, job_id=160686, txn_id=82139, label=health-160686-5f31ac24344e4c47-abfd3f3efe9b2c58-82139, elapse(s)=0
I0927 14:12:39.462463 159997 stream_load_executor.cpp:71] begin to execute job. label=health-160686-5f31ac24344e4c47-abfd3f3efe9b2c58-82139, txn_id=82139, query_id=5f31ac24344e4c47-abfd3f3efe9b2c58
I0927 14:12:39.462525 159997 fragment_mgr.cpp:689] query_id: 5f31ac24344e4c47-abfd3f3efe9b2c58 coord_addr TNetworkAddress(hostname=192.168.1.37, port=9020) total fragment num on current host: 0
I0927 14:12:39.462541 159997 fragment_mgr.cpp:758] Register query/load memory tracker, query/load id: 5f31ac24344e4c47-abfd3f3efe9b2c58 limit: 2.00 GB
I0927 14:12:39.463045 159997 data_consumer_group.cpp:111] start consumer group: 0e4573b9099d6821-ea027b26758a5bbd. max time(ms): 60000, batch rows: 200000, batch size: 209715200. id=5f31ac24344e4c47-abfd3f3efe9b2c58, job_id=160686, txn_id=82139, label=health-160686-5f31ac24344e4c47-abfd3f3efe9b2c58-82139, elapse(s)=0
I0927 14:12:39.463085 159961 fragment_mgr.cpp:528] PlanFragmentExecutor::_exec_actual|query_id=5f31ac24344e4c47-abfd3f3efe9b2c58|instance_id=5f31ac24344e4c47-abfd3f3efe9b2c59|pthread_id=140232152180480
I0927 14:12:39.463495 160763 tablets_channel.cpp:103] open tablets channel: (load_id=5f31ac24344e4c47-abfd3f3efe9b2c58, index_id=160425), tablets num: 128, timeout(s): 120
I0927 14:13:40.305856 159997 data_consumer_group.cpp:131] consumer group done: 0e4573b9099d6821-ea027b26758a5bbd. consume time(ms)=60842, received rows=30, received bytes=3211, eos: 1, left_time: -842, left_rows: 199970, left_bytes: 209711989, blocking get time(us): 60842733, blocking put time(us): 2, id=5f31ac24344e4c47-abfd3f3efe9b2c58, job_id=160686, txn_id=82139, label=health-160686-5f31ac24344e4c47-abfd3f3efe9b2c58-82139, elapse(s)=60
I0927 14:13:40.306563 159961 vtablet_sink.cpp:894] VNodeChannel[160425-130002], load_id=5f31ac24344e4c47-abfd3f3efe9b2c58, txn_id=82139, node=192.168.1.37:8060 mark closed, left pending batch size: 1
I0927 14:13:40.307304 160596 tablets_channel.cpp:145] close tablets channel: (load_id=5f31ac24344e4c47-abfd3f3efe9b2c58, index_id=160425), sender id: 0, backend id: 130002
I0927 14:13:40.307919 161034 vtablet_sink.cpp:1121] all node channels are stopped(maybe finished/offending/cancelled), sender thread exit. 5f31ac24344e4c47-abfd3f3efe9b2c58
I0927 14:13:40.324090 160596 load_channel.cpp:46] load channel removed. mem peak usage=0, info=label: LoadChannel#senderIp=192.168.1.37#loadID=5f31ac24344e4c47-abfd3f3efe9b2c58; consumption: 0; peak_consumption: 0; , load_id=5f31ac24344e4c47-abfd3f3efe9b2c58, is high priority=1, sender_ip=192.168.1.37
I0927 14:13:40.325825 159961 vtablet_sink.cpp:1528] total mem_exceeded_block_ns=0, total queue_push_lock_ns=0, total actual_consume_ns=115047, load id=5f31ac24344e4c47-abfd3f3efe9b2c58
I0927 14:13:40.325846 159961 vtablet_sink.cpp:1572] finished to close olap table sink. load_id=5f31ac24344e4c47-abfd3f3efe9b2c58, txn_id=82139, node add batch time(ms)/wait execution time(ms)/close time(ms)/num: {130002:(17)(0)(19)(1)}
I0927 14:13:40.326172 159961 query_context.h:69] Deregister query/load memory tracker, queryId=5f31ac24344e4c47-abfd3f3efe9b2c58, Limit=2.00 GB, CurrUsed=-24.48 KB, PeakUsed=25.31 MB
I0927 14:13:40.365257 159997 routine_load_task_executor.cpp:257] finished routine load task id=5f31ac24344e4c47-abfd3f3efe9b2c58, job_id=160686, txn_id=82139, label=health-160686-5f31ac24344e4c47-abfd3f3efe9b2c58-82139, elapse(s)=60, status: [OK], current tasks num: 2
```
Multi-table log:
```
I0927 14:21:22.827615 351714 routine_load_task_executor.cpp:267] submit a new routine load task: id=71ae9956abab417a-a9d80e07f0150f21, job_id=161003, txn_id=82149, label=perf-161003-71ae9956abab417a-a9d80e07f0150f21-82149, elapse(s)=0, current tasks num: 4
I0927 14:21:22.827625 159997 routine_load_task_executor.cpp:285] begin to execute routine load task: id=71ae9956abab417a-a9d80e07f0150f21, job_id=161003, txn_id=82149, label=perf-161003-71ae9956abab417a-a9d80e07f0150f21-82149, elapse(s)=0
I0927 14:21:22.828194 159997 routine_load_task_executor.cpp:296] recv single-stream-multi-table request, ctx=id=71ae9956abab417a-a9d80e07f0150f21, job_id=161003, txn_id=82149, label=perf-161003-71ae9956abab417a-a9d80e07f0150f21-82149, elapse(s)=0
I0927 14:21:22.828397 159997 data_consumer_group.cpp:111] start consumer group: c74840f8c925b3cf-de3a780dee8213a6. max time(ms): 30000, batch rows: 200000, batch size: 209715200. id=71ae9956abab417a-a9d80e07f0150f21, job_id=161003, txn_id=82149, label=perf-161003-71ae9956abab417a-a9d80e07f0150f21-82149, elapse(s)=0
I0927 14:21:52.831889 159997 data_consumer_group.cpp:131] consumer group done: c74840f8c925b3cf-de3a780dee8213a6. consume time(ms)=30003, received rows=0, received bytes=0, eos: 1, left_time: -3, left_rows: 200000, left_bytes: 209715200, blocking get time(us): 30003469, blocking put time(us): 0, id=71ae9956abab417a-a9d80e07f0150f21, job_id=161003, txn_id=82149, label=perf-161003-71ae9956abab417a-a9d80e07f0150f21-82149, elapse(s)=30
```
The last routine load task has not finished yet when a new task is submitted, so the pool count keeps growing. Why? I waited a long time and no "finished" log ever appeared.
Search before asking
Version
doris 2.0.1 no_avx2
What's Wrong?
After creating a ROUTINE LOAD, it fails with a TOO_MANY_TASKS error. I thought this error might have been resolved by others in 4713 6342, but I don't know how to resolve it.
Then, showing the load task reports: msg: 2023-09-26 08:48:15:errCode = 2, detailMessage = failed to send task: errCode = 2, detailMessage = failed to submit task. error code: TOO_MANY_TASKS, msg: (10.0.209.74)[TOO_MANY_TASKS]839b8a214383416e-9302587c0fc837b7_10.0.209.74
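For anyone reproducing this, the job state and error message above can be inspected with SHOW ROUTINE LOAD. The database and job names here are placeholders, not taken from the report:

```sql
-- Inspect the job's State, Statistic, and ReasonOfStateChanged/OtherMsg columns
SHOW ROUTINE LOAD FOR example_db.example_job\G
```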
Related logs below:
fe.log
be.INFO
be.warning.log...
What You Expected?
ROUTINE LOAD execute normally.
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct