eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Portable SCC: Compressed Refs #7965

Open dsouzai opened 4 years ago

dsouzai commented 4 years ago

There are two approaches that were brought up in the Portable SCC discussion regarding how to deal with the potential for the compressed refs shift changing with the heap size.

1. Have the JIT assume that the compressed shift might be 4. The generated code then loads the shift value into a register. This load can then be relocated.
2. Fix the shift value to 3 if the JVM is going to use AOT code.
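
To make the trade-off concrete, here is a minimal Python model of the two approaches. It is only an illustration, not OpenJ9 or JIT code: every name in it is hypothetical, and it assumes 32-bit compressed references that are decoded by a plain left shift with no heap base added.

```python
# Illustrative model only; not OpenJ9 code. All names are hypothetical.
# Assumption: a compressed reference is 32 bits wide and is decoded by a
# plain left shift (no heap base is added).

GIB = 1 << 30


def max_heap_bytes(shift: int) -> int:
    """Largest address range a 32-bit compressed reference covers at this shift."""
    return (1 << 32) << shift


def compress(address: int, shift: int) -> int:
    """Compress an address; it must be aligned to 2**shift bytes."""
    assert address % (1 << shift) == 0, "address not aligned for this shift"
    return address >> shift


def decode_with_loaded_shift(ref: int, runtime: dict) -> int:
    """Approach 1: the shift is read at run time (the relocatable load)."""
    return ref << runtime["shift"]


FIXED_AOT_SHIFT = 3


def decode_with_fixed_shift(ref: int) -> int:
    """Approach 2: the shift is a constant baked into the AOT code."""
    return ref << FIXED_AOT_SHIFT


if __name__ == "__main__":
    for shift in (0, 3, 4):
        print(f"shift {shift}: up to {max_heap_bytes(shift) // GIB} GiB")
    address = 0x7_0000_0008
    ref = compress(address, 3)
    # Both decoders agree only while the runtime shift equals the fixed one.
    print(decode_with_loaded_shift(ref, {"shift": 3}) == decode_with_fixed_shift(ref))
```

In this model, approach 1 pays for an extra load on every decode but stays correct whatever shift the running JVM picked, while approach 2 keeps the shift as an immediate but forces a shift of 3 even on heaps small enough for a shift of 0, which is the cost the measurements in this thread try to quantify.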

harryyu1994 commented 4 years ago

SPECjbb2015 on zLinux

-Xmx3200m -Xms3200m -Xmn1200m


Shift0
RUN RESULT: hbIR (max attempted) = 13837, hbIR (settled) = 11548, max-jOPS = 10516, critical-jOPS = 2144
RUN RESULT: hbIR (max attempted) = 12302, hbIR (settled) = 11859, max-jOPS = 10088, critical-jOPS = 2224
RUN RESULT: hbIR (max attempted) = 13837, hbIR (settled) = 11548, max-jOPS = 10793, critical-jOPS = 2135
RUN RESULT: hbIR (max attempted) = 13837, hbIR (settled) = 11548, max-jOPS = 10931, critical-jOPS = 2154

Average: max-jOPS = 10582, critical-jOPS = 2164.25

Shift3
RUN RESULT: hbIR (max attempted) = 12302, hbIR (settled) = 11859, max-jOPS = 10703, critical-jOPS = 2161
RUN RESULT: hbIR (max attempted) = 11859, hbIR (settled) = 11489, max-jOPS = 10673, critical-jOPS = 2080
RUN RESULT: hbIR (max attempted) = 11548, hbIR (settled) = 11430, max-jOPS = 10278, critical-jOPS = 2129
RUN RESULT: hbIR (max attempted) = 11548, hbIR (settled) = 10959, max-jOPS = 10278, critical-jOPS = 2071

Average: max-jOPS = 10483, critical-jOPS = 2110.25



- 1% drop in max-jOPS
- 2.5% drop in critical-jOPS
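
The two percentages follow from the per-run averages. As a rough check, the sketch below recomputes them in Python from the four Shift0 and four Shift3 runs listed above; the helper name `percent_drop` is made up for this example.

```python
from statistics import mean

# max-jOPS and critical-jOPS copied from the RUN RESULT lines above.
shift0_max = [10516, 10088, 10793, 10931]
shift3_max = [10703, 10673, 10278, 10278]
shift0_crit = [2144, 2224, 2135, 2154]
shift3_crit = [2161, 2080, 2129, 2071]


def percent_drop(baseline, candidate):
    """Drop of the candidate mean relative to the baseline mean, in percent."""
    base = mean(baseline)
    return (base - mean(candidate)) / base * 100


if __name__ == "__main__":
    print(f"max-jOPS drop:      {percent_drop(shift0_max, shift3_max):.2f}%")
    print(f"critical-jOPS drop: {percent_drop(shift0_crit, shift3_crit):.2f}%")
```

With only four runs per configuration, and run-to-run spread of a few hundred jOPS, differences of this size are close to the noise level, so they are best read as approximate.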

harryyu1994 commented 4 years ago

SPECjbb2005 on Power

-Xmx3200m -Xms3200m -Xmn2600m -Xjit:scratchSpaceLimit=2048000,acceptHugeMethods -Xgcpolicy:gencon -Xcompressedrefs -XXgc:forcedShiftingCompressionAmount=0

Shift 0

SPECjbb2005 bops = 72214
SPECjbb2005 bops = 72204
SPECjbb2005 bops = 72304
SPECjbb2005 bops = 72690
SPECjbb2005 bops = 71765
Average bops = 72235.4

Shift 3

SPECjbb2005 bops = 69579
SPECjbb2005 bops = 69860
SPECjbb2005 bops = 70412
SPECjbb2005 bops = 70350
SPECjbb2005 bops = 69376
Average bops = 69915.4

3.3% throughput drop

Summary

1% throughput drop on x86
1-2% throughput drop on Z
3% throughput drop on Power

FYI @vijaysun-omr @zl-wang @mpirvu

mpirvu commented 4 years ago

@harryyu1994 could you please make a summary of all the experiments that were tried? It seems that only Power sees more than a 2% regression from the move to shift 3.

zl-wang commented 4 years ago

It is within expectation. I remember the overhead was about 2.5% when we did the experiments previously.

harryyu1994 commented 4 years ago

Shift 0 vs. Shift 3 Summary

X86

Daytrader7

1% throughput drop
Shift 0: Throughput = 2850.45
Shift 3: Throughput = 2819.82

Marius' Daytrader7 Experiment

2% throughput drop
Shift 0: Throughput = 3455.40
Shift 3: Throughput = 3530.25

AcmeAir in Docker

no throughput drop observed
Shift 0: Throughput = 5231.12
Shift 3: Throughput = 5225.64

Quarkus+CRUD

0.9% throughput drop
Shift 0: Throughput = 12140.48
Shift 3: Throughput = 12040.20

SPECjbb2015

no throughput drop observed
Shift 0: max_jOPS = 20163.5, critical_jOPS = 11766.5
Shift 3: max_jOPS = 20496.75, critical_jOPS = 11930.5
The numbers make it look like there is a throughput gain for shift 3, but it is just fluctuation.

Z

SPECjbb2015

1% max-jOPS drop
2.5% critical-jOPS drop
Shift 0: max-jOPS = 10582, critical-jOPS = 2164.25
Shift 3: max-jOPS = 10483, critical-jOPS = 2110.25

Power

ILOG

3.5% throughput drop
Shift 0: Global Throughput = 8604.6
Shift 3: Global Throughput = 8302.6

SPECjbb2005

3.3% throughput drop
Shift 0: Average bops = 72235.4
Shift 3: Average bops = 69915.4

@vijaysun-omr I have listed all the results here and will wait for your final call on this.

vijaysun-omr commented 4 years ago

@zl-wang I am still worried by the high throughput loss on Power (in excess of 3%). I don't know if you can afford to slow everything down by 3+% inside OpenShift on Power. While I agree we previously had an overhead of approximately 2-3% on all platforms due to the shift, we now find that the overhead on other platforms (X86 has more data shown than Z) is lower. Can you please try the same on your Open Liberty setup?

zl-wang commented 4 years ago

@vijaysun-omr I will give DT7/OpenLiberty a spin next, as I discussed with @harryyu1994.

zl-wang commented 4 years ago

shift0 average throughput: 2798/s
shift3 average throughput: 2757/s
The gap is about 1.5%.

However, the variation within a single run could be as large as 3-4%. I haven't investigated why it is not as stable as with my older driver; this one is the July 29 build from the Adopt site, since Harry suggested using a recent build.

vijaysun-omr commented 4 years ago

Can we try to get a Daytrader7 run done on Z as well, so that we have more than just that one data point?

@zl-wang A fluctuation of 3-4% is high enough that we can't tell whether the overhead on Power is in the 3% range in this case as well. Ideally we should understand what is different on Power before going ahead. However, I am okay with first delivering the change that makes things portable with respect to compressed refs, using the general approach taken in this design, and then working out how to improve the situation on Power as a continuing effort after that initial delivery.