eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

9-10% lower throughput with OpenJ9 compared to OpenJDK-Hotspot on container-based AcmeAir monolithic benchmark #6602

Open kusumachalasani opened 5 years ago

kusumachalasani commented 5 years ago

Lower throughput is observed with AcmeAir when running in containers with OpenJ9 compared against OpenJDK. This behaviour is observed on both JDK8 and JDK11.

I did several sets of runs with different options, but I still see the regression with OpenJ9.

Below are the % scores of OpenJDK8-OpenJ9 relative to OpenJDK8 (HotSpot):

| Instances | % Throughput | % jMEM_max | % jMEM_last5min |
|---|---|---|---|
| 1 | 98.79% | 78.61% | 120.81% |
| 2 | 90.79% | 84.61% | 135.13% |
| 3 | 91.23% | 90.32% | 125.90% |
| 4 | 89.05% | 86.23% | 128.35% |
| 5 | 91.28% | 85.45% | 130.53% |
| 6 | 91.79% | 83.49% | 133.71% |

Below is the individual data:

OpenJDK8 (HotSpot)

| Instances | Throughput | CPU_avg | jMEM_max | jMEM_avg | jMEM_last5min |
|---|---|---|---|---|---|
| 1 | 4867 | 69.87 | 316 | 302.51 | 309.32 |
| 2 | 4993 | 79.83 | 632 | 605.91 | 618.47 |
| 3 | 4687 | 81.69 | 933 | 898.8 | 894 |
| 4 | 4584 | 81.86 | 1209 | 1181.67 | 1194.71 |
| 5 | 4451 | 81.69 | 1539 | 1493.8 | 1488.03 |
| 6 | 4336 | 81.52 | 1836 | 1779.43 | 1833.91 |

OpenJDK8-OpenJ9

| Instances | Throughput | CPU_avg | jMEM_max | jMEM_avg | jMEM_last5min |
|---|---|---|---|---|---|
| 1 | 4808 | 75.59 | 402 | 226.26 | 256.04 |
| 2 | 4533 | 81.09 | 747 | 452.46 | 457.7 |
| 3 | 4276 | 82.72 | 1033 | 687.25 | 710.11 |
| 4 | 4082 | 81.92 | 1402 | 915.64 | 930.85 |
| 5 | 4063 | 82.07 | 1801 | 1137.13 | 1139.97 |
| 6 | 3980 | 82.57 | 2199 | 1370.65 | 1371.56 |
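
For reference, the relative-throughput percentages in the summary table are simply the OpenJ9 value divided by the HotSpot value for each instance count. A small sketch reproducing them from the raw throughput numbers above:

```python
# Throughput per instance count, taken from the two tables above.
hotspot = [4867, 4993, 4687, 4584, 4451, 4336]
openj9 = [4808, 4533, 4276, 4082, 4063, 3980]

for n, (hs, j9) in enumerate(zip(hotspot, openj9), start=1):
    ratio = 100.0 * j9 / hs
    print(f"{n} instance(s): OpenJ9 at {ratio:.2f}% of HotSpot")
# First line prints 98.79%, matching the summary table.
```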

Steps to run the benchmark:

JMeter is used to drive the load. I have uploaded the jmx file which we use: AcmeAir.zip

dsouzai commented 5 years ago

> Create AcmeAir-liberty docker image.

I'm guessing you mean https://github.com/blueperf/acmeair-monolithic-java? Or do you mean https://github.com/sabkrish/acmeair.git?

> AcmeAir benchmark used : -b microservice_changes https://github.com/sabkrish/acmeair.git

How does one run this benchmark?

Perhaps to make things easier, could you generate the two images and put them up somewhere, say Docker Hub, so that whoever takes a look at this issue doesn't have to get bogged down in the build step (which looks non-trivial)?

kusumachalasani commented 5 years ago

I pushed the two images I used, both OpenJ9 and HotSpot for JDK8:

OpenJ9: `docker pull kusumach/acmeairopenj9-18002-11-nolt-liberty1`
OpenJDK-Hotspot: `docker pull kusumach/acmeairopenjdk-18002-latestsdk-nolt-liberty1`

These images can be used to run the AcmeAir application.

mpirvu commented 5 years ago

If the experiments were done on Ubuntu, changing the transparent huge pages setting from [madvise] to [always] may provide a throughput boost to OpenJ9. I have seen a 10% improvement from such a change.
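
On Linux the current THP policy can be inspected through sysfs; a minimal sketch (the sysfs path is the standard one, but the file may be absent on kernels built without THP support):

```shell
# Print the current THP policy; the active value is shown in brackets,
# e.g. "always [madvise] never".
THP=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$THP" ]; then
    STATE=$(cat "$THP")
else
    STATE="THP interface not available"
fi
echo "$STATE"
# Switching to "always" requires root:
#   echo always > /sys/kernel/mm/transparent_hugepage/enabled
```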

kusumachalasani commented 5 years ago

@mpirvu
Since my experiments are on Ubuntu, I enabled THP using `echo always > /sys/kernel/mm/transparent_hugepage/enabled`. Even after enabling THP, I still see a drop in throughput with OpenJ9. With 1 instance alone there is a ~9% drop; for multiple instances it is a ~1.7-5.7% drop.

OpenJ9 vs OpenJDK (THP enabled):

| Instances | % Throughput | % jMEM_max | % jMEM_avg | % jMEM_last5min |
|---|---|---|---|---|
| 1 | 91.90% | 105.75% | 135.63% | 133.98% |
| 2 | 98.32% | 117.86% | 138.75% | 137.83% |
| 3 | 97.23% | 112.92% | 134.54% | 134.56% |
| 4 | 94.32% | 112.03% | 132.79% | 131.10% |
mpirvu commented 5 years ago

The initial data showed a gap of 98.79% with 1 instance. After enabling THP the gap widened to 91.90%. Was the absolute performance lower for OpenJ9 when THP was enabled?

mpirvu commented 5 years ago

I had some runs outside docker with a single instance of AcmeAir using "-Xms1G -Xmx1G" and adding -Xshareclasses:none for OpenJ9. The throughput gap is about 5% in this configuration. THP was enabled on the machine and the JVM process was pinned to 4 HW threads (2 cores with their hyperthreading counterparts).
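
Pinning of that kind is typically done with `taskset`; a hedged sketch (the logical CPU ids and the jar name are assumptions, and the real hyperthread sibling mapping is machine-specific):

```shell
# Sketch of CPU pinning with taskset (util-linux). The CPU list "0,1,4,5"
# assumes 2 cores plus their hyperthread siblings; verify the mapping in
# /sys/devices/system/cpu/cpu0/topology/thread_siblings_list.
if command -v taskset >/dev/null 2>&1; then
    # Demonstrate the invocation with a trivial command pinned to CPU 0.
    RESULT=$(taskset -c 0 echo pinned 2>/dev/null || echo "pinning failed")
else
    RESULT="taskset unavailable"
fi
echo "$RESULT"
# A full benchmark run would look something like (hypothetical jar name):
#   taskset -c 0,1,4,5 java -Xms1G -Xmx1G -Xshareclasses:none -jar acmeair.jar
```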

HotSpot
Results for JDK=/home/mpirvu/sdks/OpenJDK8U-jdk_x64_linux_hotspot_2019-06-27-20-02 jvmOpts=-Xms1G -Xmx1G
Throughput      avg=18527.08    min=18070.50    max=19041.40    stdDev=396.2    maxVar=5.37%    confInt=2.04%   samples= 5
Footprint=671392 KB
Footprint       avg=675976.00   min=663260.00   max=688180.00   stdDev=9771.9   maxVar=3.76%    confInt=1.38%   samples= 5

OpenJ9
Results for JDK=/home/mpirvu/sdks/OpenJDK8U-jdk_x64_linux_openj9_8u222b10_openj9-0.15.1 jvmOpts=-Xshareclasses:none -Xms1G -Xmx1G
Throughput      avg=17540.60    min=17305.20    max=17732.90    stdDev=180.9    maxVar=2.47%    confInt=0.98%   samples= 5
CompTime        avg=29494.20    min=28191.00    max=31388.00    stdDev=1472.2   maxVar=11.34%   confInt=4.76%   samples= 5
Footprint       avg=433427.20   min=429112.00   max=439352.00   stdDev=3735.6   maxVar=2.39%    confInt=0.82%   samples= 5
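
From these averages the "about 5%" gap can be checked directly (throughput numbers taken from the two result blocks above):

```python
# Average throughput from the HotSpot and OpenJ9 result blocks above.
hotspot_avg = 18527.08
openj9_avg = 17540.60

ratio = 100.0 * openj9_avg / hotspot_avg
print(f"OpenJ9 at {ratio:.2f}% of HotSpot (gap ~{100 - ratio:.1f}%)")
```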

Attn: @andrewcraik

andrewcraik commented 5 years ago

Has anyone tried this with head, e.g. the nightly build? Further, is this a regression compared to a previous OpenJ9 build, or just a gap we should be addressing? perf records for both VMs would be helpful in trying to find the cause of the gap or the opportunity to improve.

mpirvu commented 5 years ago

> Further is this a regression compared to a previous OpenJ9 build or just a gap we should be addressing

As far as I know, a throughput gap for AcmeAir has existed for a long time. I don't think this is a new regression.

andrewcraik commented 5 years ago

OK - this is an enhancement more than a defect then, i.e. we have an opportunity to improve. It just affects the relative priority of the issue compared to regressions. Thanks!

kusumachalasani commented 5 years ago

> The initial data showed a gap of 98.79% with 1 instance. After enabling THP the gap widened to 91.90%. Was the absolute performance lower for OpenJ9 when THP was enabled?

Apologies for the late response - all the machines went down, which delayed collecting the results. I did some re-runs with 1 instance, and with THP enabled the gap did not increase. Both with and without THP, OpenJ9 is ~4-5% lower than OpenJDK.

The results in the main description used a config where all 4 processors of a 4C-4GB machine were available to every instance.

Sharing a new set of results with the following config (xls attached):
- Machine config: 4C-4GB
- Any 2 CPUs used for every instance
- 100 users, 900 secs, -memlimit=300MB, no heap settings

issue-6602.xlsx

kusumachalasani commented 5 years ago

@mpirvu Collected the profiles using the 'perf' tool, as tprof was not supported on that machine. Below are the text-format output files (produced with perf report) from the perf record data. txtfiles.zip

The raw data files are also available, but they are large (~1GB). If you need them, please let me know where I can place them.

previousdeveloper commented 4 years ago

+1