crossbeam-rs / crossbeam

Tools for concurrent programming in Rust
Apache License 2.0
7.33k stars 460 forks source link

Alternative dedicated GC thread scheme #287

Open Pslydhh opened 5 years ago

Pslydhh commented 5 years ago

As statement in Dedicated GC threads. I offer an alternative dedicated GC thread scheme based on 0.5.0‘s codes. Its' advantage:

So the way we work now is:

The GC Thread's work could be express in source code:

loop {
    epoch = epoch + 1;
    let block_id = receiver.recv().unwrap();

    // add one to the global epoch
    collector.global.epoch.store_epoch(epoch, Ordering::Release);
    let guard = pin_for_dedicate(Some(&handle));

    // if block_id > max_block_id: new slice could be reclaimed at future.
    if block_id > max_block_id {

        // means if all threads' epoch reach/will reach epoch, the slice(.., block_id) of queue
        // blocks could be reclaimed
        array.push( (Cell::new(block_id - max_block_id), epoch) );
        max_block_id = block_id;
    }

    // for reclaim
    if array.len() > 0 && (epoch - array[0].1) >= EPOCH_EVOLUTION{

        // try to wait all threads reach epoch. 
        // this must be fast, because the epoch(array[0].1) has been a long time.....
        collector.global.wait_until_epoch(array[0].1, &guard);
        let nums = array[0].0.get();
        for _ in 0..nums {
             collector.global.drop_bags_per_block(&guard);
        }
        let _ = array.remove(0);
    }
}
  1. EPOCH_EVOLUTION: how many epochs have passed for recycling? just a suitable value waits for the application threads to arrive array[0].1.

  2. wait_until_epoch: just wait application threads to arrive array[0].1, this is a very small time-consumption due to EPOCH_EVOLUTION's value, actually EPOCH_EVOLUTION's doesn't have to be big.

  3. drop_bags_per_block: reclaim all objects in the following nums blocks and the block itself, one by one.

  4. Finally remove this array in local vec through array.remove(0), And add epoch, then recv at mpsc again.

snippets for benchmark


As statement in Different behaviors of allocator stable/beta/nightly, this runs in stable version due to beta/nightly version not return memory to OS immediatly.

linux, 2cores.

I use one snippet to execute, compare outputs of this GC thread scheme and original epoch scheme.

fn main() {
    const STEPS: usize  = 10000000;
    const NTHREAD: usize = 4;
    let q = crossbeam::queue::MsQueue::new();
    thread::sleep(std::time::Duration::from_millis(10));
    for j in 0..10 {
        println!("time: {:?}", j);
        let now = std::time::Instant::now();
        let _ = crossbeam::scope(|s| {
            for _ in 0..NTHREAD {
                s.spawn(|_| {
                    for _ in 0..STEPS / NTHREAD {
                        let _ = q.push(1);
                        assert_eq!(q.try_pop().unwrap(), 1);
                    }
                });
            }
        });

        let elapsed = now.elapsed();
        println!("send duration: {} secs {} nanosecs",elapsed.as_secs(), elapsed.subsec_nanos());
        thread::sleep(std::time::Duration::from_millis(2000));
    }
    thread::sleep(std::time::Duration::from_millis(10000));
}

const NTHREAD: usize = 4;

outputs(Prefix Supplement 0): GC thread scheme ||||||||||||||||||||||||||||||||||||||||||| original epoch scheme 1 secs 143471821 nanosecs |||||||||||||||||||||||||| 1 secs 210276799 nanosecs 1 secs 124593797 nanosecs |||||||||||||||||||||||||| 1 secs 232127935 nanosecs 1 secs 138863934 nanosecs |||||||||||||||||||||||||| 1 secs 219543803 nanosecs 1 secs 110814380 nanosecs |||||||||||||||||||||||||| 1 secs 220393384 nanosecs 1 secs 119773895 nanosecs |||||||||||||||||||||||||| 1 secs 214081081 nanosecs 1 secs 161403600 nanosecs |||||||||||||||||||||||||| 1 secs 234918766 nanosecs 1 secs 109857831 nanosecs |||||||||||||||||||||||||| 1 secs 215446037 nanosecs 1 secs 094160905 nanosecs |||||||||||||||||||||||||| 1 secs 213825132 nanosecs 1 secs 106003148 nanosecs |||||||||||||||||||||||||| 1 secs 226929392 nanosecs 1 secs 099695684 nanosecs |||||||||||||||||||||||||| 1 secs 209825189 nanosecs ########

const NTHREAD: usize = 10;

outputs(Prefix Supplement 0): GC thread scheme ||||||||||||||||||||||||||||||||||||||||||| original epoch scheme 1 secs 140306864 nanosecs |||||||||||||||||||||||||| 1 secs 279806190 nanosecs 1 secs 056207640 nanosecs |||||||||||||||||||||||||| 1 secs 246947911 nanosecs 1 secs 070162686 nanosecs |||||||||||||||||||||||||| 1 secs 259408883 nanosecs 1 secs 088991397 nanosecs |||||||||||||||||||||||||| 1 secs 272075887 nanosecs 1 secs 055713392 nanosecs |||||||||||||||||||||||||| 1 secs 246470842 nanosecs 1 secs 068479571 nanosecs |||||||||||||||||||||||||| 1 secs 230906565 nanosecs 1 secs 073818380 nanosecs |||||||||||||||||||||||||| 1 secs 269090146 nanosecs 1 secs 079053992 nanosecs |||||||||||||||||||||||||| 1 secs 254569325 nanosecs 1 secs 073107757 nanosecs |||||||||||||||||||||||||| 1 secs 267058174 nanosecs 1 secs 065985661 nanosecs |||||||||||||||||||||||||| 1 secs 268059947 nanosecs ########

const NTHREAD: usize = 100;

outputs(Prefix Supplement 0): GC thread scheme ||||||||||||||||||||||||||||||||||||||||||| original epoch scheme 1 secs 193185376 nanosecs |||||||||||||||||||||||||| 1 secs 418499978 nanosecs 1 secs 041048913 nanosecs |||||||||||||||||||||||||| 1 secs 321932546 nanosecs 1 secs 044606536 nanosecs |||||||||||||||||||||||||| 1 secs 297366212 nanosecs 1 secs 050911009 nanosecs |||||||||||||||||||||||||| 1 secs 323293859 nanosecs 1 secs 034139942 nanosecs |||||||||||||||||||||||||| 1 secs 333284334 nanosecs 1 secs 049643055 nanosecs |||||||||||||||||||||||||| 1 secs 496548199 nanosecs 1 secs 044088381 nanosecs |||||||||||||||||||||||||| 1 secs 374482777 nanosecs 1 secs 065162764 nanosecs |||||||||||||||||||||||||| 1 secs 432493839 nanosecs 1 secs 057817585 nanosecs |||||||||||||||||||||||||| 1 secs 373340920 nanosecs 1 secs 051760231 nanosecs |||||||||||||||||||||||||| 1 secs 307466692 nanosecs ########

const NTHREAD: usize = 1000;

outputs(Prefix Supplement 0): GC thread scheme ||||||||||||||||||||||||||||||||||||||||||| original epoch scheme 1 secs 517525021 nanosecs |||||||||||||||||||||||||| 1 secs 685353423 nanosecs 1 secs 345287195 nanosecs |||||||||||||||||||||||||| 1 secs 567536351 nanosecs 1 secs 377172681 nanosecs |||||||||||||||||||||||||| 1 secs 437055733 nanosecs 1 secs 371699550 nanosecs |||||||||||||||||||||||||| 1 secs 393955945 nanosecs 1 secs 353762133 nanosecs |||||||||||||||||||||||||| 1 secs 482581825 nanosecs 1 secs 366737519 nanosecs |||||||||||||||||||||||||| 1 secs 393055953 nanosecs 1 secs 397711993 nanosecs |||||||||||||||||||||||||| 1 secs 482935481 nanosecs 1 secs 384174677 nanosecs |||||||||||||||||||||||||| 1 secs 458244472 nanosecs 1 secs 358561118 nanosecs |||||||||||||||||||||||||| 1 secs 456169802 nanosecs 1 secs 410261864 nanosecs |||||||||||||||||||||||||| 1 secs 456969812 nanosecs ########

ghost commented 5 years ago

Sorry for the late reply and thanks for doing interesting research! :)

I'm pleasantly surprised by the performance improvements of this GC scheme. My intuition was that a dedicated GC thread would necessarily be slower because:

I'm still somewhat skeptical of this approach and believe the benchmark results indicate that the current GC has a lot of room for improvement rather than a dedicated GC thread is the better approach. :)

Besides total time spent by the benchmark, it would also be interesting to see peak memory usage (max RSS). Do you perhaps have any results relating to memory usage?