ballerina-platform / nballerina

Ballerina compiler that generates native executables.
https://ballerina.io/
Apache License 2.0
138 stars, 46 forks

Implement closures for final values #1228

heshanpadmasiri closed this 1 year ago

heshanpadmasiri commented 1 year ago

Purpose

Implement closures by extending #1223.

Resolves #1225


heshanpadmasiri commented 1 year ago

@jclark it seems LLVM still does not support trampolines on AArch64 (see https://github.com/llvm/llvm-project/issues/56625). I was able to reproduce the same issue with our implementation.

As of now, LLVM appears to support trampolines only on x86, PowerPC and XCore (ref).

heshanpadmasiri commented 1 year ago

Based on my benchmarks, trampolines are actually slower than checking the last bit of a tagged pointer at runtime, so I decided to remove them.

captureBench-v.bal

Without trampoline

  Time (mean ± σ):     735.4 ms ±  22.5 ms    [User: 541.8 ms, System: 193.4 ms]
  Range (min … max):   711.8 ms … 793.3 ms    10 runs

With trampoline

  Time (mean ± σ):     902.3 ms ±   5.6 ms    [User: 602.7 ms, System: 299.5 ms]
  Range (min … max):   887.8 ms … 908.8 ms    10 runs
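
Worked from the mean times above, the trampoline version of captureBench-v.bal is about 23% slower, and most of the extra CPU time is system time:

$$
\frac{902.3\ \text{ms}}{735.4\ \text{ms}} \approx 1.23, \qquad
\Delta_{\text{wall}} = 166.9\ \text{ms}, \quad
\Delta_{\text{user}} = 602.7 - 541.8 = 60.9\ \text{ms}, \quad
\Delta_{\text{system}} = 299.5 - 193.4 = 106.1\ \text{ms}
$$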

I believe this is because the trampoline version ends up performing more memory accesses and incurring more cache misses, as the Cachegrind results below show.

Cachegrind without trampoline

I   refs:      4,947,763,824
I1  misses:            1,252
LLi misses:            1,243
I1  miss rate:          0.00%
LLi miss rate:          0.00%

D   refs:      1,731,370,229  (1,053,215,632 rd   + 678,154,597 wr)
D1  misses:       20,988,913  (    7,995,956 rd   +  12,992,957 wr)
LLd misses:       20,664,350  (    7,673,448 rd   +  12,990,902 wr)
D1  miss rate:           1.2% (          0.8%     +         1.9%  )
LLd miss rate:           1.2% (          0.7%     +         1.9%  )

LL refs:          20,990,165  (    7,997,208 rd   +  12,992,957 wr)
LL misses:        20,665,593  (    7,674,691 rd   +  12,990,902 wr)
LL miss rate:            0.3% (          0.1%     +         1.9%  )

Cachegrind with trampoline

I   refs:      5,124,578,269
I1  misses:        3,145,894
LLi misses:        3,138,589
I1  miss rate:          0.06%
LLi miss rate:          0.06%

D   refs:      1,817,007,881  (1,083,559,032 rd   + 733,448,849 wr)
D1  misses:       24,126,487  (    7,996,260 rd   +  16,130,227 wr)
LLd misses:       23,893,644  (    7,765,534 rd   +  16,128,110 wr)
D1  miss rate:           1.3% (          0.7%     +         2.2%  )
LLd miss rate:           1.3% (          0.7%     +         2.2%  )

LL refs:          27,272,381  (   11,142,154 rd   +  16,130,227 wr)
LL misses:        27,032,233  (   10,904,123 rd   +  16,128,110 wr)
LL miss rate:            0.4% (          0.2%     +         2.2%  )
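
Differencing the two Cachegrind runs for captureBench-v.bal: the trampoline version executes about 3.6% more instructions and makes about 4.9% more data references, and both I1 misses and D1 write misses grow by roughly 3.14 million:

$$
\begin{aligned}
\Delta\,\text{I refs} &= 5{,}124{,}578{,}269 - 4{,}947{,}763{,}824 = 176{,}814{,}445 \quad (\approx +3.6\%)\\
\Delta\,\text{D refs} &= 1{,}817{,}007{,}881 - 1{,}731{,}370{,}229 = 85{,}637{,}652 \quad (\approx +4.9\%)\\
\Delta\,\text{I1 misses} &= 3{,}145{,}894 - 1{,}252 = 3{,}144{,}642\\
\Delta\,\text{D1 write misses} &= 16{,}130{,}227 - 12{,}992{,}957 = 3{,}137{,}270
\end{aligned}
$$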

CaptureBench2-v.bal

Without trampoline

  Time (mean ± σ):     428.4 ms ±   7.9 ms    [User: 302.4 ms, System: 126.0 ms]
  Range (min … max):   418.8 ms … 443.5 ms    10 runs

With trampoline

  Time (mean ± σ):      1.093 s ±  0.041 s    [User: 0.884 s, System: 0.209 s]
  Range (min … max):    1.025 s …  1.160 s    10 runs
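
For CaptureBench2-v.bal the gap is much larger: the trampoline version takes about 2.55 times as long, and here most of the extra CPU time is user time:

$$
\frac{1093\ \text{ms}}{428.4\ \text{ms}} \approx 2.55, \qquad
\Delta_{\text{wall}} = 664.6\ \text{ms}, \quad
\Delta_{\text{user}} = 884 - 302.4 = 581.6\ \text{ms}, \quad
\Delta_{\text{system}} = 209 - 126.0 = 83.0\ \text{ms}
$$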

Cachegrind without trampoline

I   refs:      3,340,382,936
I1  misses:            1,185
LLi misses:            1,176
I1  miss rate:          0.00%
LLi miss rate:          0.00%

D   refs:      1,185,121,825  (665,084,908 rd   + 520,036,917 wr)
D1  misses:        8,759,477  (      6,694 rd   +   8,752,783 wr)
LLd misses:        8,753,431  (      2,472 rd   +   8,750,959 wr)
D1  miss rate:           0.7% (        0.0%     +         1.7%  )
LLd miss rate:           0.7% (        0.0%     +         1.7%  )

LL refs:           8,760,662  (      7,879 rd   +   8,752,783 wr)
LL misses:         8,754,607  (      3,648 rd   +   8,750,959 wr)
LL miss rate:            0.2% (        0.0%     +         1.7%  )

Cachegrind with trampoline

I   refs:      3,487,197,383
I1  misses:        3,151,065
LLi misses:            1,238
I1  miss rate:          0.09%
LLi miss rate:          0.00%

D   refs:      1,270,759,477  (695,428,308 rd   + 575,331,169 wr)
D1  misses:       11,896,171  (      6,131 rd   +  11,890,040 wr)
LLd misses:       11,890,638  (      2,475 rd   +  11,888,163 wr)
D1  miss rate:           0.9% (        0.0%     +         2.1%  )
LLd miss rate:           0.9% (        0.0%     +         2.1%  )

LL refs:          15,047,236  (  3,157,196 rd   +  11,890,040 wr)
LL misses:        11,891,876  (      3,713 rd   +  11,888,163 wr)
LL miss rate:            0.2% (        0.0%     +         2.1%  )
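
The same differences for CaptureBench2-v.bal are shown below. The extra data references (85,637,652) are identical to those for captureBench-v.bal, and the extra I1 misses and D1 write misses are again about 3.14 million each. This suggests, but does not establish, that the trampoline overhead scales with some quantity both benchmarks share, such as the number of closures created. Unlike the first benchmark, LLi misses barely change (1,176 to 1,238), so here the extra instruction fetches miss in I1 but hit in the last-level cache.

$$
\begin{aligned}
\Delta\,\text{I refs} &= 3{,}487{,}197{,}383 - 3{,}340{,}382{,}936 = 146{,}814{,}447 \quad (\approx +4.4\%)\\
\Delta\,\text{D refs} &= 1{,}270{,}759{,}477 - 1{,}185{,}121{,}825 = 85{,}637{,}652 \quad (\approx +7.2\%)\\
\Delta\,\text{I1 misses} &= 3{,}151{,}065 - 1{,}185 = 3{,}149{,}880\\
\Delta\,\text{D1 write misses} &= 11{,}890{,}040 - 8{,}752{,}783 = 3{,}137{,}257
\end{aligned}
$$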

jclark commented 1 year ago

- Need to test === operator
- Need to test casting (including intersections)
- Consistent terminology: capture vs lambda vs closure

heshanpadmasiri commented 1 year ago

> - Need to test === operator
> - Need to test casting (including intersections)
> - Consistent terminology: capture vs lambda vs closure

Fixed