Open elderbig opened 2 years ago
Describe the bug
```rust
#[async_std::main]
async fn main() -> Result<()> {
    let conf = reader::read_config::read("conf/config.toml");
    log4rs::init_file("conf/log4rs.yaml", Default::default()).unwrap();
    let delay_timer = DelayTimerBuilder::default().build();
    let mut i: u64 = 0;
    for s in conf.service.unwrap() {
        i = i + 1;
        // If a task_id is reused, the new task will be ignored by insert_task.
        let begin_time = Local::now().naive_local();
        let shell_task_instance_chain =
            delay_timer.insert_task(build_task_async_execute_process(i, &s)?)?;
        let _ = shell_task_instance_chain
            .next_with_async_wait()
            .await?
            .cancel_with_async_wait()
            .await?;
        let end_time = Local::now().naive_local();
        info!(
            "use [{0}] by task [{1}] init",
            (end_time - begin_time).num_seconds(),
            i
        );
    }
    info!("All job inited");
    for _ in 0..180 {
        thread::sleep(Duration::from_secs(60));
    }
    // No new tasks are accepted; running tasks are not affected.
    Ok(delay_timer.stop_delay_timer()?)
}
```
I configured 100 jobs in a configuration file, `conf/config.toml`, and initialize them in a loop. Sometimes some jobs can never be initialized. The output looks like:

```
2021-11-17T19:06:56.221273177+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [1] init
2021-11-17T19:06:58.221268818+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [2] init
2021-11-17T19:06:56.221273177+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [3] init
2021-11-17T19:06:58.221268818+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [4] init
....
....
2021-11-17T19:07:00.221238386+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [59] init
2021-11-17T19:07:02.221417702+08:00 - INFO - src/main.rs -line=>39 - testcron - use [2] by task [60] init
2021-11-17T19:07:04.221640690+08:00 - INFO - src/main.rs -line=>39 - testcron - use [2] by task [61] init
2021-11-17T19:07:06.221278580+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [62] init
2021-11-17T19:07:08.221419039+08:00 - INFO - src/main.rs -line=>39 - testcron - use [2] by task [63] init
2021-11-17T19:07:10.221318266+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [64] init
```

Then it does nothing for a long time, and the remaining jobs never appear. I tried tokio and hit the same problem.
@elderbig I'm here, my friend. Let me break it down for you.

You are using two special APIs here.

`insert_task` adds a task to the timer core via the event channel, and returns a handle from which you can continually take out new instances of the task.

`cancel_with_async_wait` cancels the currently running task instance and async-awaits the cancellation.

So in your code the task is cancelled as soon as it executes, which is why you don't see its output. If you don't need to manipulate task instances in real time, just use `add_task` and avoid the `cancel_with_*` methods.
Use `add_task`; it also runs much faster. Like this:

```rust
let s = "echo 1".into();
delay_timer.add_task(build_task_async_execute_process(i, &s)?)?;
```
Yes, it is nanosecond-level speed. Hope it helps you.
:)
But I meant another problem. When I run 100 tasks, each running once per second, memory rises step by step: after about 2 hours it rose from 8 MB up to 180 MB, and with 1000 tasks it soon uses 500 MB. The more tasks and the longer the run, the more memory is used. Something may be wrong.
I understand what you mean. I have two questions:

1. Does the memory usage refer to your Rust program, or is it an assessment of overall machine usage?
2. What does your task code template look like, roughly?

You could try calling `remove_task` to clear most of the tasks after they have been running for many hours, and see whether the program's memory shrinks significantly.
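Conceptually, `remove_task` drops a task's entry from the scheduler's internal table so its bookkeeping can be freed. A minimal stdlib analogy of that idea (this toy registry is illustrative only, not delay-timer's actual internals):

```rust
use std::collections::HashMap;

fn main() {
    // Toy registry keyed by task_id, standing in for the scheduler's task table.
    let mut tasks: HashMap<u64, &str> = HashMap::new();
    for id in 0..100u64 {
        tasks.insert(id, "* * * * * *"); // each task's cron expression
    }
    assert_eq!(tasks.len(), 100);

    // "remove_task(id)": dropping the entry lets its state be reclaimed.
    for id in 0..90u64 {
        tasks.remove(&id);
    }
    assert_eq!(tasks.len(), 10);
    println!("remaining tasks: {}", tasks.len());
}
```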
Thanks for your response. This is my code:
```rust
#[async_std::main]
async fn main() -> Result<()> {
    log4rs::init_file("conf/log4rs.yaml", Default::default()).unwrap();
    info!("begin");
    let delay_timer = DelayTimerBuilder::default().build();
    for i in 0..1000 {
        delay_timer.add_task(build_task_async_execute_process(i)?)?;
        info!("init task id = [{}]", i);
    }
    info!("==== All job is be init! ====");
    for _ in 0..120 {
        thread::sleep(Duration::from_secs(60));
    }
    Ok(delay_timer.stop_delay_timer()?)
}

fn build_task_async_execute_process(task_id: u64) -> Result<Task, TaskError> {
    let mut task_builder = TaskBuilder::default();
    let body = unblock_process_task_fn("echo hello".into());
    task_builder
        .set_frequency_repeated_by_cron_str("* * * * * *".into())
        .set_task_id(task_id)
        .set_maximum_running_time(10)
        .set_maximum_parallel_runnable_num(1)
        .spawn(body)
}
```
This time I ran 1000 jobs, so I can see more memory usage. It ran for 2 hours and memory rose to 2.7 GB. What can I do? And what does `remove_task` mean? I want to run the jobs again and again.
Thanks a lot!
Thanks to your demand, @elderbig, I made an optimization: optimized the memory use of the internal time-wheel (#28). Thanks @elderbig!
There is a time-wheel in delay-timer, which is the carrier of all tasks.

The time wheel uses slots (time scales) as units, and each slot corresponds to a hash table. When a slot is rotated to, the tasks already ready inside it are executed, and each executed task moves from one slot to another. To have enough capacity to store the tasks, memory may be allocated here, so by the time the whole time wheel has been traversed, every slot has accumulated ample spare capacity; when there are many tasks, the memory occupied by the whole wheel becomes very large. So the memory needs to be shrunk in time.

This change shrinks the memory after each round of traversing the slots and executing tasks, ensuring the slots keep a basic, compact capacity.
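The shrink described above corresponds to releasing a hash table's spare capacity once its entries have moved on to other slots. A minimal stdlib sketch of the idea (a real time-wheel slot holds task entries, not strings):

```rust
use std::collections::HashMap;

fn main() {
    // One slot of the time wheel, modeled as a hash table of task entries.
    let mut slot: HashMap<u64, String> = HashMap::new();

    // Many tasks pass through the slot, growing its allocated capacity.
    for id in 0..10_000u64 {
        slot.insert(id, format!("task-{id}"));
    }

    // The tasks execute and move to other slots; the entries leave,
    // but clear() retains the allocation.
    slot.clear();
    assert!(slot.capacity() >= 10_000);

    // Shrinking after each round keeps the slot at a compact capacity.
    slot.shrink_to_fit();
    assert!(slot.capacity() < 10_000);
    println!("capacity after shrink: {}", slot.capacity());
}
```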
You can now use the latest version of delay-timer, v0.10.0, and enjoy the memory-optimized version.
I tested this version (0.10.0) and the problem is reduced. In a 2-hour test with 1000 jobs, the memory used by my process rose from 8 MB to 319 MB. That seems better, but memory still rises slowly. Can it be better? This is my test project: https://github.com/elderbig/delay_timer-test
I tested once more. In 5.8 hours memory rose up to 650 MB; it seems memory always rises while the program is running, but the increase becomes slower and slower. CPU usage on my 12-core computer in top looks like this:
```
task: 325 total, 1 running, 324 sleeping, 0 stopped, 0 zombie
%Cpu(s): 18.3 us, 10.1 sy, 0.0 ni, 71.0 id, 0.0 wa, 0.6 hi, 0.1 si, 0.0 st
MiB Mem : 31828.4 total, 27787.6 free, 1532.7 used, 2508.1 buff/cache
MiB Swap: 35013.6 total, 35013.6 free, 0.0 used. 29168.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 322014 test      20   0 1227580  18172   4340 S  14.3   0.1   0:22.49 dt_test
    586 root      20   0  698356 120628  85836 S   2.3   0.4   0:25.92 Xorg
 374145 test      20   0   24.5g 168720 108788 S   2.0   0.5   0:07.79 chrome
   1827 test      20   0  691152  83580  57708 S   1.7   0.3   0:10.45 xfce4-terminal
    783 root     -51   0       0      0      0 D   0.7   0.0   0:13.70 irq/65-i2c-INT3
      9 root      20   0       0      0      0 S   0.3   0.0   0:00.87 ksoftirqd/0
     21 root      20   0       0      0      0 S   0.3   0.0   0:00.92 ksoftirqd/1
     28 root      20   0       0      0      0 S   0.3   0.0   0:00.89 ksoftirqd/2
     42 root      20   0       0      0      0 S   0.3   0.0   0:00.90 ksoftirqd/4
     63 root      20   0       0      0      0 S   0.3   0.0   0:00.88 ksoftirqd/7
     77 root      20   0       0      0      0 S   0.3   0.0   0:00.83 ksoftirqd/9
     84 root      20   0       0      0      0 S   0.3   0.0   0:00.91 ksoftirqd/10
     91 root      20   0       0      0      0 S   0.3   0.0   0:00.94 ksoftirqd/11
    245 root      20   0       0      0      0 I   0.3   0.0   0:00.22 kworker/9:2-events
   1085 root      20   0 1928896  33700  26484 S   0.3   0.1   0:00.22 containerd
   1350 test      20   0  161384   6688   5804 S   0.3   0.0   0:00.14 at-spi2-registr
   2125 test      20   0   16.3g  99936  81416 S   0.3   0.3   0:04.00 chrome
  14455 test      20   0   15564   5860   3608 R   0.3   0.0   0:02.05 top
```
dt_test is the test process, and 100 jobs cost about 1.7% CPU in total. Test program: https://github.com/elderbig/delay_timer-test
Hi friend, I noticed your feedback that the program accumulates more and more buffering during operation; it has now received a 75%-80% memory optimization.

As for the OS-level behavior, the remaining buffers should now be planned rationally by the OS (physical memory, plus the swap area); as shown in your terminal, only 0.1% of physical memory is in use.

Actually, I can optimize it further if you need, but it may cost about 10% of the internal scheduler's performance. It would still guarantee tens of thousands of task schedules per second.
Thanks for your reply. In my opinion this crate is in fact good enough, but might there possibly be a memory leak? I am a beginner in Rust and cannot judge this question yet.
My friend, I understand your concern; you are meticulous about engineering.

The crate has not crashed or leaked in our production environment, and actual consumption is very low: the mem metric has not exceeded 0.3% in almost 10 months of deployment.

Also, I personally have an AWS server with 1 core and 1 GB of memory, which has been deployed for 3 months (never restarted) without any abnormality.

As for "memory leaks", it may look that way in a console tool (in reality, the console metric may be the sum of historical memory requests rather than an accurate value), but actual physical memory has always stayed around 50 MB.

Here is my local run: 1000 tasks running for 4 hours, with actual memory consumption cycling between 50 MB and 80 MB.
I monitor the process with this script:

```shell
# please run this monit script in the tester dir
pid=`ps -ef | grep dt_test | grep -v grep | awk '{print $2}'`
# clean last test result
> mem.log
while true; do
    t=`date "+%Y-%m-%d %H:%M:%S"` && m=`ps o rss -p $pid | tail -1`
    echo "$t | $m"
    sleep 5
done >> mem.log
```
In a 5.8-hour run with 1000 jobs, memory usage looks like this: after 2 hours, memory usage tends toward stability but still grows a little. My computer has 32 GB of memory; the OS is:

```
Linux test 5.9.16-1-MANJARO #1 SMP PREEMPT Mon Dec 21 22:00:46 UTC 2020 x86_64 GNU/Linux
```

My test project is at https://github.com/elderbig/delay_timer-test
Thank you for the monitoring script, it looks very neat.

May I ask, did you compile the program in `debug` mode?

If it is convenient, please compile and test with `release` (if there are results, I would like to see the report too). I will keep an eye on your issue and keep thinking about how to optimize it.

PS: code generated by a `debug` build is not optimized enough and runs slowly, which can cause this.
Yes, the test results I provided were run in release mode:

```shell
cargo run --release
```
I have a suggestion for memory optimization.

Set the timeout reasonably, so that it is close to the actual run time.

For example, if you change the 10 in the code (`set_maximum_running_time(10)`) to 2, memory consumption will probably be reduced by about half.

This is because the timeout tracker maintains a task state for each of the 1000 tasks for every second they may still execute (it is also constantly freeing, but executing every second increases the space occupied).

You can try my suggestion first; it will pay off a lot.
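A rough model of why shortening the timeout helps (my own back-of-the-envelope arithmetic, not delay-timer's exact bookkeeping): with one new instance per task per second, the tracker can hold up to tasks × timeout outstanding instance states at once. The tracker is only part of total memory, which is why the observed saving is around half rather than 5x.

```rust
fn main() {
    // Hypothetical upper bound on concurrently tracked instance states,
    // assuming one new instance per task per second.
    let tracked = |tasks: u64, timeout_secs: u64| tasks * timeout_secs;

    let before = tracked(1000, 10); // set_maximum_running_time(10)
    let after = tracked(1000, 2);   // set_maximum_running_time(2)

    assert_eq!(before, 10_000);
    assert_eq!(after, 2_000);
    println!("tracked states: {before} -> {after}");
}
```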
In the meantime, I can change the architecture here, but it needs to be designed properly, so my friend, you can watch the project and you will be notified when I release it (it may take a while).
Thanks. I will try writing my application with this crate; at the same time, I will not use too many jobs and will cut down the frequency. I think that will work well, and I will test as soon as it is updated. 1000 jobs is an ideal target, but every second is a harsh condition; if this is done, it will be a great work.
Thank you, I'll keep an eye on your experience and plan changes and iterations of the project!