Closed heshanpadmasiri closed 1 year ago
@jclark it seems LLVM still does not support trampolines on AArch64: https://github.com/llvm/llvm-project/issues/56625. I was able to reproduce the same issue with our implementation as well.
As of now, LLVM seems to support trampolines only on x86, PowerPC and XCore (ref).
Based on my benchmarking, trampolines are actually slower than checking the last bit of the tagged pointer at runtime, so I decided to remove them.
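For readers unfamiliar with the alternative to trampolines, here is a minimal sketch of dispatching on the low bit of a tagged pointer. It is not the code from this PR: the `Closure` layout, the `FnValue` type, the choice of bit 0 as the closure tag, and all function names are assumptions made for illustration. It also relies on the GCC/Clang `aligned` attribute so that bit 0 of a plain function address is guaranteed to be free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a closure pairs a code pointer with its captured state. */
typedef struct {
    int64_t (*code)(void *env, int64_t arg);
    void *env;
} Closure;

/* A plain (non-capturing) function takes no environment. */
typedef int64_t (*PlainFn)(int64_t arg);

/* A function value is one machine word. Assumption: bit 0 set means
   "pointer to a Closure", bit 0 clear means "plain function pointer". */
typedef uintptr_t FnValue;
#define CLOSURE_TAG ((uintptr_t)1)

static FnValue from_plain(PlainFn f) {
    uintptr_t bits = (uintptr_t)f;
    assert((bits & CLOSURE_TAG) == 0); /* needs an aligned function address */
    return bits;
}

static FnValue from_closure(Closure *c) {
    return (uintptr_t)c | CLOSURE_TAG;
}

/* Every indirect call pays for one bit test and one branch. */
static int64_t call_fn(FnValue v, int64_t arg) {
    if (v & CLOSURE_TAG) {
        Closure *c = (Closure *)(v & ~CLOSURE_TAG);
        return c->code(c->env, arg); /* pass the captured state explicitly */
    }
    return ((PlainFn)v)(arg);
}

/* Example callees (hypothetical). */
__attribute__((aligned(8)))
static int64_t add_one(int64_t x) { return x + 1; }

static int64_t add_env(void *env, int64_t x) { return x + *(int64_t *)env; }
```

With `int64_t captured = 10; Closure c = { add_env, &captured };`, both `call_fn(from_plain(add_one), 41)` and `call_fn(from_closure(&c), 32)` return 42. A trampoline would instead write a small code stub into memory for each closure so that the call site needs no check at all.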
captureBench-v.bal
Time (mean ± σ): 735.4 ms ± 22.5 ms [User: 541.8 ms, System: 193.4 ms]
Range (min … max): 711.8 ms … 793.3 ms 10 runs
Time (mean ± σ): 902.3 ms ± 5.6 ms [User: 602.7 ms, System: 299.5 ms]
Range (min … max): 887.8 ms … 908.8 ms 10 runs
I believe this is because the version with the trampoline ends up with more memory accesses and cache misses:
I refs: 4,947,763,824
I1 misses: 1,252
LLi misses: 1,243
I1 miss rate: 0.00%
LLi miss rate: 0.00%
D refs: 1,731,370,229 (1,053,215,632 rd + 678,154,597 wr)
D1 misses: 20,988,913 ( 7,995,956 rd + 12,992,957 wr)
LLd misses: 20,664,350 ( 7,673,448 rd + 12,990,902 wr)
D1 miss rate: 1.2% ( 0.8% + 1.9% )
LLd miss rate: 1.2% ( 0.7% + 1.9% )
LL refs: 20,990,165 ( 7,997,208 rd + 12,992,957 wr)
LL misses: 20,665,593 ( 7,674,691 rd + 12,990,902 wr)
LL miss rate: 0.3% ( 0.1% + 1.9% )
I refs: 5,124,578,269
I1 misses: 3,145,894
LLi misses: 3,138,589
I1 miss rate: 0.06%
LLi miss rate: 0.06%
D refs: 1,817,007,881 (1,083,559,032 rd + 733,448,849 wr)
D1 misses: 24,126,487 ( 7,996,260 rd + 16,130,227 wr)
LLd misses: 23,893,644 ( 7,765,534 rd + 16,128,110 wr)
D1 miss rate: 1.3% ( 0.7% + 2.2% )
LLd miss rate: 1.3% ( 0.7% + 2.2% )
LL refs: 27,272,381 ( 11,142,154 rd + 16,130,227 wr)
LL misses: 27,032,233 ( 10,904,123 rd + 16,128,110 wr)
LL miss rate: 0.4% ( 0.2% + 2.2% )
CaptureBench2-v.bal
Time (mean ± σ): 428.4 ms ± 7.9 ms [User: 302.4 ms, System: 126.0 ms]
Range (min … max): 418.8 ms … 443.5 ms 10 runs
Time (mean ± σ): 1.093 s ± 0.041 s [User: 0.884 s, System: 0.209 s]
Range (min … max): 1.025 s … 1.160 s 10 runs
I refs: 3,340,382,936
I1 misses: 1,185
LLi misses: 1,176
I1 miss rate: 0.00%
LLi miss rate: 0.00%
D refs: 1,185,121,825 (665,084,908 rd + 520,036,917 wr)
D1 misses: 8,759,477 ( 6,694 rd + 8,752,783 wr)
LLd misses: 8,753,431 ( 2,472 rd + 8,750,959 wr)
D1 miss rate: 0.7% ( 0.0% + 1.7% )
LLd miss rate: 0.7% ( 0.0% + 1.7% )
LL refs: 8,760,662 ( 7,879 rd + 8,752,783 wr)
LL misses: 8,754,607 ( 3,648 rd + 8,750,959 wr)
LL miss rate: 0.2% ( 0.0% + 1.7% )
I refs: 3,487,197,383
I1 misses: 3,151,065
LLi misses: 1,238
I1 miss rate: 0.09%
LLi miss rate: 0.00%
D refs: 1,270,759,477 (695,428,308 rd + 575,331,169 wr)
D1 misses: 11,896,171 ( 6,131 rd + 11,890,040 wr)
LLd misses: 11,890,638 ( 2,475 rd + 11,888,163 wr)
D1 miss rate: 0.9% ( 0.0% + 2.1% )
LLd miss rate: 0.9% ( 0.0% + 2.1% )
LL refs: 15,047,236 ( 3,157,196 rd + 11,890,040 wr)
LL misses: 11,891,876 ( 3,713 rd + 11,888,163 wr)
LL miss rate: 0.2% ( 0.0% + 2.1% )
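To make the differences in the results above easier to compare, the following sketch computes the slowdown ratio and the change in I1 misses and D1 write misses for each benchmark. The numbers are copied from the hyperfine and cachegrind output above. The output does not label which result belongs to which implementation, so the code refers to them only as the first and second result of each pair. The struct and function names are made up for this example.

```c
#include <stdio.h>

/* One benchmark with its two results, in the order they appear above. */
typedef struct {
    const char *bench;
    double mean_ms[2];            /* hyperfine mean time */
    long long i1_misses[2];       /* cachegrind "I1 misses" */
    long long d1_write_misses[2]; /* cachegrind "D1 misses", wr column */
} BenchPair;

static const BenchPair PAIRS[] = {
    {"captureBench-v.bal",  {735.4, 902.3},  {1252LL, 3145894LL}, {12992957LL, 16130227LL}},
    {"CaptureBench2-v.bal", {428.4, 1093.0}, {1185LL, 3151065LL}, {8752783LL, 11890040LL}},
};

/* Second mean time divided by the first. */
static double time_ratio(const BenchPair *p) {
    return p->mean_ms[1] / p->mean_ms[0];
}

/* Extra I1 misses in the second result. */
static long long i1_delta(const BenchPair *p) {
    return p->i1_misses[1] - p->i1_misses[0];
}

/* Extra D1 write misses in the second result. */
static long long d1_write_delta(const BenchPair *p) {
    return p->d1_write_misses[1] - p->d1_write_misses[0];
}

static void print_summary(void) {
    for (size_t i = 0; i < sizeof PAIRS / sizeof PAIRS[0]; i++) {
        const BenchPair *p = &PAIRS[i];
        printf("%s: time x%.2f, I1 misses +%lld, D1 write misses +%lld\n",
               p->bench, time_ratio(p), i1_delta(p), d1_write_delta(p));
    }
}
```

In both benchmarks the second result shows roughly 3.14 million additional I1 misses and roughly 3.14 million additional D1 write misses, while the slowdown is about 1.23x for captureBench-v.bal and about 2.55x for CaptureBench2-v.bal.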
Need to test === operator
Need to test casting (including intersections)
Consistent terminology: capture vs lambda vs closure
Fixed
Purpose
Implement closures by extending #1223.
Resolves #1225