BinChengZhao / delay-timer

Time-manager of delayed tasks. Like crontab, but it can schedule both synchronous and asynchronous tasks, and supports dynamic add/cancel/remove.
Apache License 2.0
309 stars 23 forks

Many jobs sometimes #28

Open elderbig opened 2 years ago

elderbig commented 2 years ago

Describe the bug

#[async_std::main]
async fn main() -> Result<()> {
    let conf = reader::read_config::read("conf/config.toml");
    log4rs::init_file("conf/log4rs.yaml", Default::default()).unwrap();
    let delay_timer = DelayTimerBuilder::default().build();
    let mut i: u64 = 0;
    for s in conf.service.unwrap() {
        i += 1;
        // If a task_id is already in use, the new task will be ignored by insert_task.
        let begin_time = Local::now().naive_local();
        let shell_task_instance_chain =
            delay_timer.insert_task(build_task_async_execute_process(i, &s)?)?;
        let _ = shell_task_instance_chain
            .next_with_async_wait()
            .await?
            .cancel_with_async_wait()
            .await?;
        let end_time = Local::now().naive_local();
        info!("use [{0}] by task [{1}] init", (end_time - begin_time).num_seconds(), i);
    }

    info!("All job inited");
    for _ in 0..180 {
        thread::sleep(Duration::from_secs(60));
    }
    // No new tasks are accepted; running tasks are not affected.
    Ok(delay_timer.stop_delay_timer()?)
}

I configure 100 jobs in a configuration file, conf/config.toml, and initialize them in a loop. Sometimes some jobs are left over and cannot be initialized. The output looks like:

2021-11-17T19:06:56.221273177+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [1] init
2021-11-17T19:06:58.221268818+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [2] init
2021-11-17T19:06:56.221273177+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [3] init
2021-11-17T19:06:58.221268818+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [4] init
....
....
2021-11-17T19:07:00.221238386+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [59] init
2021-11-17T19:07:02.221417702+08:00 - INFO - src/main.rs -line=>39 - testcron - use [2] by task [60] init
2021-11-17T19:07:04.221640690+08:00 - INFO - src/main.rs -line=>39 - testcron - use [2] by task [61] init
2021-11-17T19:07:06.221278580+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [62] init
2021-11-17T19:07:08.221419039+08:00 - INFO - src/main.rs -line=>39 - testcron - use [2] by task [63] init
2021-11-17T19:07:10.221318266+08:00 - INFO - src/main.rs -line=>39 - testcron - use [1] by task [64] init

Then it does nothing for a long time, and the remaining jobs never appear. I tried tokio, and the same problem occurs.

BinChengZhao commented 2 years ago


@elderbig I'm here, my friend. Let me break it down for you.

You are using two special APIs: insert_task and cancel_with_async_wait.

insert_task adds a task to timer_core via an event channel and returns a handle from which new instances of the task can be taken continually.

cancel_with_async_wait cancels the currently running task instance and asynchronously awaits the cancellation.

So the situation here is that each task is cancelled as soon as it starts executing, which is why you don't see its output.

If you don't need to manipulate task instances in real time, just use add_task and avoid the cancel_with_* APIs.

elderbig commented 2 years ago

Using add_task, it runs much faster, like this:

let s = "echo 1".into();
delay_timer.add_task(build_task_async_execute_process(i, &s)?)?;
BinChengZhao commented 2 years ago


Yes, it is nanosecond-level speed; hope it helps you.

:)

elderbig commented 2 years ago

But I meant another problem: when I run 100 tasks, each running once per second, memory rises step by step; after about 2 hours it rose from 8M to 180M. If I use 1000 tasks, it soon uses 500M of memory. The more tasks and the longer the run, the more memory is used; something may be wrong.

BinChengZhao commented 2 years ago


I understand what you mean. I have two questions:

  1. Does the memory usage refer to your Rust program, or is it an assessment of overall machine usage?

  2. What does your task code template roughly look like?

You could try calling remove_task to clear most of the tasks after they have been running for many hours, to see if the program's memory shrinks significantly.

elderbig commented 2 years ago

Thanks for your response. This is my code:

#[async_std::main]
async fn main() -> Result<()> {
    log4rs::init_file("conf/log4rs.yaml", Default::default()).unwrap();
    info!("begin");
    let delay_timer = DelayTimerBuilder::default().build();
    for i in 0..1000 {
        delay_timer.add_task(build_task_async_execute_process(i)?)?;
        info!("init task id = [{}]", i);
    }
    info!("==== All job is be init! ====");
    for _ in 0..120 {
        thread::sleep(Duration::from_secs(60));
    }
    Ok(delay_timer.stop_delay_timer()?)
}

fn build_task_async_execute_process(task_id: u64) -> Result<Task, TaskError> {
    let mut task_builder = TaskBuilder::default();
    let body = unblock_process_task_fn("echo hello".into());
    task_builder
        .set_frequency_repeated_by_cron_str("* * * * * *".into())
        .set_task_id(task_id)
        .set_maximum_running_time(10)
        .set_maximum_parallel_runnable_num(1)
        .spawn(body)
}

This time I ran 1000 jobs so I could see more memory usage. It ran for 2 hours and memory rose to 2.7G. What can I do? And what does remove_task mean? I want the jobs to run again and again. Thanks a lot!

BinChengZhao commented 2 years ago


Thanks to your feedback, @elderbig, I made an optimization.

Version 0.10.0

Changed

Optimized the use of internal time-wheel memory. (#28), thanks @elderbig!

Details

There is a time-wheel in delay-timer, which is the carrier of all tasks.

The time wheel uses slots (time scales) as its units, and each slot corresponds to a hash table. When a slot is rotated to, the tasks that are ready inside it are executed, and each executed task then moves from one slot to another. To have enough capacity to store the tasks, memory may be allocated along the way, so that by the time the whole wheel has been traversed, every slot has built up a large capacity; when there are many tasks, the memory occupied by the whole time wheel becomes very large. So the memory needs to be shrunk in a timely manner.

This change shrinks the memory after each round of traversing the slots and executing tasks, ensuring that the slots keep a basic, compact capacity.
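The effect of this shrinking can be illustrated with a stdlib-only sketch (a toy model of one slot, not the actual delay-timer internals): clearing a slot's hash table alone keeps its high-water-mark allocation, while shrink_to_fit releases it.

```rust
use std::collections::HashMap;

// Toy model of one time-wheel slot: tasks keyed by task id.
// (Illustrative only; not the real delay-timer data structures.)
fn slot_capacity_after_tick() -> (usize, usize) {
    let mut slot: HashMap<u64, String> = HashMap::new();
    for i in 0..10_000 {
        slot.insert(i, format!("task-{i}"));
    }
    let before = slot.capacity();

    // A "tick": the ready tasks are drained (executed / moved to other slots)...
    slot.clear();
    // ...and without this call the slot would keep its peak allocation forever.
    slot.shrink_to_fit();

    (before, slot.capacity())
}

fn main() {
    let (before, after) = slot_capacity_after_tick();
    println!("capacity before tick: {before}, after shrink: {after}");
}
```

Doing this shrink after every traversal keeps each slot's backing storage proportional to its live tasks rather than its historical peak.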

You can now use the latest version of delay-timer v0.10.0 and enjoy the memory optimized version.

elderbig commented 2 years ago

I tested this version (0.10.0) and the problem is reduced: in a 2-hour test with 1000 jobs, the memory used by my process rose from 8M to 319M. That is better, but memory still rises slowly. Can it be improved further? This is my test project: https://github.com/elderbig/delay_timer-test

elderbig commented 2 years ago

I tested once more: in 5.8 hours, memory rose to 650M. Memory seems to rise continually while the program runs, though the increase slows over time. CPU usage on my 12-core computer, per top:

task: 325 total,   1 running, 324 sleeping,   0 stopped,   0 zombie
%Cpu(s): 18.3 us, 10.1 sy,  0.0 ni, 71.0 id,  0.0 wa,  0.6 hi,  0.1 si,  0.0 st
MiB Mem :  31828.4 total,  27787.6 free,   1532.7 used,   2508.1 buff/cache
MiB Swap:  35013.6 total,  35013.6 free,      0.0 used.  29168.7 avail Mem 

    PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 322014 test     20   0 1227580  18172   4340 S  14.3  0.1   0:22.49 dt_test
    586 root     20   0  698356 120628  85836 S   2.3  0.4   0:25.92 Xorg
 374145 test     20   0   24.5g 168720 108788 S   2.0  0.5   0:07.79 chrome
   1827 test     20   0  691152  83580  57708 S   1.7  0.3   0:10.45 xfce4-terminal
    783 root    -51   0       0      0      0 D   0.7  0.0   0:13.70 irq/65-i2c-INT3
      9 root     20   0       0      0      0 S   0.3  0.0   0:00.87 ksoftirqd/0
     21 root     20   0       0      0      0 S   0.3  0.0   0:00.92 ksoftirqd/1
     28 root     20   0       0      0      0 S   0.3  0.0   0:00.89 ksoftirqd/2
     42 root     20   0       0      0      0 S   0.3  0.0   0:00.90 ksoftirqd/4
     63 root     20   0       0      0      0 S   0.3  0.0   0:00.88 ksoftirqd/7
     77 root     20   0       0      0      0 S   0.3  0.0   0:00.83 ksoftirqd/9
     84 root     20   0       0      0      0 S   0.3  0.0   0:00.91 ksoftirqd/10
     91 root     20   0       0      0      0 S   0.3  0.0   0:00.94 ksoftirqd/11
    245 root     20   0       0      0      0 I   0.3  0.0   0:00.22 kworker/9:2-events
   1085 root     20   0 1928896  33700  26484 S   0.3  0.1   0:00.22 containerd
   1350 test     20   0  161384   6688   5804 S   0.3  0.0   0:00.14 at-spi2-registr
   2125 test     20   0   16.3g  99936  81416 S   0.3  0.3   0:04.00 chrome
  14455 test     20   0   15564   5860   3608 R   0.3  0.0   0:02.05 top

dt_test is the test process; 100 jobs cost about 1.7% CPU in total. Test program: https://github.com/elderbig/delay_timer-test

BinChengZhao commented 2 years ago


Hi friend, I noticed your feedback that the program accumulates more and more buffered memory during operation; the change has achieved a 75%-80% memory optimization.

After the optimization, the remaining buffers should be managed rationally by the OS (physical memory, plus the swap area); as shown on your terminal, the process has only 0.1% physical-memory overhead.

Actually, I can optimize further if you need, though it may cost about 10% of the internal scheduler's performance; it would still guarantee tens of thousands of task schedules per second.

elderbig commented 2 years ago

Thanks for your reply. In my estimation this crate is good enough, but might there be a memory leak? I am a beginner in Rust and cannot judge this question yet.

BinChengZhao commented 2 years ago


My friend, I understand your concern; you are meticulous about engineering.

The crate has not crashed or leaked in our production environment, and the actual consumption is very low, the mem metrics have not exceeded 0.3% in almost 10 months of deployment.

Also, I personally have an AWS server with 1 core and 1 G, which has been deployed for 3 months (never restarted) without any abnormalities.

As for 'memory leaks': it may look that way in a console tool (in reality the console metric may be the sum of historical memory requests, not an accurate value), but the actual physical memory has always stayed around 50M.
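On Linux, the difference between reserved address space (what console tools often report) and actual resident memory can be read from /proc; a minimal sketch:

```shell
# Compare virtual size vs. resident set for the current shell (Linux only).
# VmSize counts reserved address space; VmRSS is the physical memory actually used.
vm_size=$(awk '/^VmSize:/ {print $2}' "/proc/$$/status")
vm_rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$$/status")
echo "VmSize=${vm_size} kB, VmRSS=${vm_rss} kB"
```

VmRSS is the number that corresponds to real physical-memory consumption; VmSize can be much larger without indicating a leak.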

If you have any concerns and any exceptions you can always open issues and I will see that they will be addressed and dealt with in a timely manner.

Here is my local run, 1000 tasks running for 4 hours, the actual memory consumption has been cycling between 50M - 80M.

(screenshot of the local run's memory usage)

elderbig commented 2 years ago

I monitor the process with this script:

# please run this monitoring script in the tester dir
pid=$(ps -ef | grep dt_test | grep -v grep | awk '{print $2}')
# clean the last test result
> mem.log
while true; do
    t=$(date "+%Y-%m-%d %H:%M:%S")
    m=$(ps -o rss -p "$pid" | tail -1)
    echo "$t | $m"
    sleep 5
done >> mem.log
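A slightly hardened variant of the same monitor, with a bounded sample count and ps -o rss= to drop the header line (monitor_rss and its arguments are names invented here for illustration):

```shell
# Sample a process's RSS (in KB) at a fixed interval, a fixed number of times.
monitor_rss() {
    pid=$1
    samples=$2
    interval=$3
    n=0
    while [ "$n" -lt "$samples" ] && kill -0 "$pid" 2>/dev/null; do
        # rss= suppresses the header line, so no "tail -1" is needed
        printf '%s | %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ps -o rss= -p "$pid")"
        n=$((n + 1))
        sleep "$interval"
    done
}

# e.g.: monitor_rss "$(pgrep -x dt_test)" 720 5 >> mem.log   # one hour of samples
```

The kill -0 check also stops the loop cleanly if the monitored process exits.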

Over 5.8 hours of running with 1000 jobs, memory usage looked like the attached chart (memory_usage_delay_timer_0.10.0): after about 2 hours it tends towards stability, but still grows a little. My computer has 32G of memory; the OS is

Linux test 5.9.16-1-MANJARO #1 SMP PREEMPT Mon Dec 21 22:00:46 UTC 2020 x86_64 GNU/Linux

my test project is at https://github.com/elderbig/delay_timer-test

BinChengZhao commented 2 years ago

Thank you for the monitoring script, it looks very neat.

May I ask, did you compile the program using debug mode?

If it is convenient, please use release to compile and test it (if there are results I would like to see the report too), I will keep an eye on your issue and keep thinking how to optimize it.

PS: code generated in debug mode is poorly optimized, and the resulting slowness can cause this.

elderbig commented 2 years ago


Yes, the test results I provided were from a release build: cargo run --release

BinChengZhao commented 2 years ago

https://github.com/elderbig/delay_timer-test

I have a suggestion for memory optimization.

Set the timeout reasonably, so that it is close to the task's actual run time.

For example, if you change the 10 in set_maximum_running_time(10) to 2, memory consumption will probably be cut roughly in half.

This is because the timeout tracker maintains state for 1000 tasks x timeout seconds; although it is also constantly freeing, executing every second increases the space occupied.
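To see why the timeout bound matters, here is a stdlib-only toy model (not the real tracker): each started task instance keeps one record until its deadline passes, so steady-state tracked state is roughly tasks_per_second x timeout_secs.

```rust
use std::collections::VecDeque;

// Toy timeout tracker: one (start, deadline) record per started task instance.
// (Illustrative model only, not the delay-timer implementation.)
fn peak_tracked_records(tasks_per_second: usize, timeout_secs: usize, sim_secs: usize) -> usize {
    let mut live: VecDeque<(usize, usize)> = VecDeque::new();
    let mut peak = 0;
    for now in 0..sim_secs {
        // Records whose deadline has passed are freed...
        while matches!(live.front(), Some(&(_, deadline)) if deadline <= now) {
            live.pop_front();
        }
        // ...but every second a new batch of instances starts.
        for _ in 0..tasks_per_second {
            live.push_back((now, now + timeout_secs));
        }
        peak = peak.max(live.len());
    }
    peak
}

fn main() {
    // With a 10s timeout the tracker holds ~10x the per-second task count;
    // with 2s it holds ~2x, i.e. a large reduction from tightening the bound.
    println!("timeout 10s -> {}", peak_tracked_records(1000, 10, 60));
    println!("timeout  2s -> {}", peak_tracked_records(1000, 2, 60));
}
```

In this model, shrinking the timeout from 10s to 2s cuts the tracked state from about 10,000 records to about 2,000, which matches the intuition behind the suggestion.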

You can try my suggestion first; it will pay off a lot. In the meantime, I can change the architecture here, but it needs to be designed properly, so my friend, watch the project and you will be notified when I release it (it may take a while).

elderbig commented 2 years ago

Thanks. I will try writing my application with this crate; at the same time I will not use too many jobs and will cut down the frequency, which I think will work well, and I will test as soon as it is updated. 1000 jobs is an ideal target, but every second is a harsh condition; if that is achieved, it will be a great work.

BinChengZhao commented 2 years ago


Thank you, I'll keep an eye on your experience and plan changes and iterations of the project!