Closed ghost closed 6 years ago
@joshlf Could you please take a look at the Accessing the default handle section, since it is related to elfmalloc? You've been dealing with thread-locals and tuning performance a lot, so your opinion would be very helpful here.
@jeehoonkang is advocating for the 1st option, while I want the 3rd. But in the end, I don't feel particularly strongly about it.
However, there is one thing I do strongly care about: let's avoid unsafe APIs whenever possible. In that spirit, the 2nd option would be better than the 1st one.
@jeehoonkang
I appreciate your concerns about the performance of the 3rd option. However, I'd like put some things into perspective first. Take a look at the following benchmarks:
#[bench]
fn pin_empty(b: &mut Bencher) {
b.iter(|| epoch::pin());
}
#[bench]
fn default_handle_pin(b: &mut Bencher) {
b.iter(|| epoch::default_handle().pin());
}
#[bench]
fn local_handle_pin(b: &mut Bencher) {
let handle = epoch::default_handle();
b.iter(|| handle.pin());
}
#[bench]
fn thread_local_handle_pin(b: &mut Bencher) {
thread_local! {
static MY: Handle = epoch::default_handle();
}
b.iter(|| MY.with(|handle| handle.pin()));
}
Results:
test default_handle_pin ... bench: 13 ns/iter (+/- 1)
test local_handle_pin ... bench: 9 ns/iter (+/- 1)
test pin_empty ... bench: 12 ns/iter (+/- 1)
test thread_local_handle_pin ... bench: 9 ns/iter (+/- 0)
Looks like pinning itself costs 9 nanoseconds. Accessing the thread-local (that's LOCAL.with(...)
inside epoch::pin()
) adds 3 more nanoseconds. Finally, incrementing/decrementing the reference count to access the default handle costs 1 more nanosecond.
However, if you store the handle in an another thread-local (in this example, MY
) and access it with MY.with(...)
, you don't pay the cost of reference counting. This is also faster than epoch::pin()
, perhaps due to better inlining opportunities (?).
The point is, that 1 nanosecond paid on reference counting really doesn't matter at all. If you need maximum performance, it's easy to have it by creating a custom thread-local anyway.
@stjepang thanks for the kind explanation. Now I prefer the 3rd option. Let's go for it :)
For safety, I incorrectly assumed that both 1st and 3rd options invoke undefined behavior. However, the 3rd option seems to raise a specific panic: HANDLE
is already destroyed. It's arguably less unsafe, I think.
Rendered
Implementation
Closes #19.