dragonwell-project / dragonwell11

Alibaba Dragonwell11 JDK
https://www.aliyun.com/product/dragonwell
GNU General Public License v2.0

[Bug][performance] Performance degradation with -XX:+UseCompactObjectHeaders on streams #809

Closed duanyangjing closed 3 months ago

duanyangjing commented 3 months ago

Description: The following code shows a 30% performance degradation with -XX:+UseCompactObjectHeaders.

import java.util.Objects;
import java.util.function.*;
import java.util.concurrent.atomic.*;
import java.util.stream.*;

public class AllMatcher {
    private int size = 100000;

    private LongPredicate op;

    public void setup() {
        op  = new LongPredicate() {
            public boolean test(long v) {
                return true;
            }
        };
    }

    public boolean seq_filter_findAny() {
        return !(LongStream.range(0, size).filter(op.negate()).findAny().isPresent());
    }

    public static void main(String[] args) {
        AllMatcher fe = new AllMatcher();
        fe.setup();

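        // Untimed iterations first, presumably as JIT warm-up.
        for (int i = 0; i < 100; i++) {
            fe.seq_filter_findAny();
        }

        // Timed run: 10,000 iterations over the 100,000-element range.
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            fe.seq_filter_findAny();
        }

        long end = System.currentTimeMillis();
        System.out.println("ms: " + (end - start));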
    }
}

Steps to Reproduce:

~/tools/dragonwell-11.0.22.19+7-ga/bin/javac AllMatcher.java
~/tools/dragonwell-11.0.22.19+7-ga/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders AllMatcher
ms: 2093 
~/tools/dragonwell-11.0.22.19+7-ga/bin/java -XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders AllMatcher
ms: 1485
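
The two timings above can be read in two ways, which is why the gap may be quoted as either roughly 30% or roughly 40%. A minimal sketch of the arithmetic follows; only the two millisecond values come from the runs above, and the class and method names are hypothetical, introduced purely for illustration.

```java
// Hypothetical helper; only the two timings come from the report above.
final class SlowdownCalc {
    // Extra wall-clock time of the slow run relative to the fast run, in percent.
    static double extraTimePercent(double slowMs, double fastMs) {
        return (slowMs - fastMs) / fastMs * 100.0;
    }

    // Throughput lost by the slow run relative to the fast run, in percent.
    static double throughputLossPercent(double slowMs, double fastMs) {
        return (1.0 - fastMs / slowMs) * 100.0;
    }

    public static void main(String[] args) {
        double on = 2093;   // ms with -XX:+UseCompactObjectHeaders
        double off = 1485;  // ms with -XX:-UseCompactObjectHeaders
        System.out.printf("extra time: %.1f%%%n", extraTimePercent(on, off));
        System.out.printf("throughput loss: %.1f%%%n", throughputLossPercent(on, off));
    }
}
```

With these inputs the run with compact headers takes about 41% more time, which corresponds to about 29% lower throughput; the "30%" in the description matches the throughput view.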

Expected behavior: With UseCompactObjectHeaders enabled, performance should be at least on par with it disabled.

JDK version

openjdk 11.0.22.19 2024-01-16
OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition)-11.0.22.19+7-ga (build 11.0.22.19+7)
OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition)-11.0.22.19+7-ga (build 11.0.22.19+7, mixed mode)

Execution environment

mmyxym commented 3 months ago

AllMatcher.java:4: error: cannot find symbol
    private LongPredicate op;
            ^
  symbol:   class LongPredicate
  location: class AllMatcher
AllMatcher.java:7: error: cannot find symbol

Could you please provide a complete test case?

duanyangjing commented 3 months ago

> Could you please provide a complete test case?

Sorry, the import part is updated.

sendaoYan commented 3 months ago

Reproduces reliably on an x64 ECS instance:

[image]

Reproduces reliably on a Yitian ECS instance:

[image]

mmyxym commented 3 months ago

> > Could you please provide a complete test case?

> Sorry, the import part is updated.

I have reproduced the issue and got the same result with lilliput-21u in the Lilliput project. The root cause is the heavy use of Klass* loading in the test case. I will discuss this issue with the Lilliput project lead, Roman. Thanks for reporting!

mmyxym commented 3 months ago

Besides the test case, do you see any significant slowdown on real workloads or applications? We haven't seen a noticeable performance regression on real workloads.

duanyangjing commented 3 months ago

> > > Could you please provide a complete test case?

> > Sorry, the import part is updated.

> I have reproduced the issue and got the same result with lilliput-21u in the Lilliput project. The root cause is the heavy use of Klass* loading in the test case. I will discuss this issue with the Lilliput project lead, Roman. Thanks for reporting!

https://github.com/dragonwell-project/dragonwell11/blob/6c9ea3ea1504f5a5deccc38a89afdbb0faa560fa/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5650

Is this the "heavy use of Klass* loading" overhead you mentioned? I believe that with Lilliput, several additional instructions are generated for klass loading.

duanyangjing commented 3 months ago

> Besides the test case, do you see any significant slowdown on real workloads or applications? We haven't seen a noticeable performance regression on real workloads.

Yes, on some internal workloads we have seen up to 10% degradation.

mmyxym commented 3 months ago

> > > > Could you please provide a complete test case?

> > > Sorry, the import part is updated.

> > I have reproduced the issue and got the same result with lilliput-21u in the Lilliput project. The root cause is the heavy use of Klass* loading in the test case. I will discuss this issue with the Lilliput project lead, Roman. Thanks for reporting!

> https://github.com/dragonwell-project/dragonwell11/blob/6c9ea3ea1504f5a5deccc38a89afdbb0faa560fa/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5650

> Is this the "heavy use of Klass* loading" overhead you mentioned? I believe that with Lilliput, several additional instructions are generated for klass loading.

Yes, mainly from the monitor check:

testb(dst, markOopDesc::monitor_value);
jcc(Assembler::notZero, stub->entry());
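
To illustrate why the monitor check quoted above adds work to every klass load, here is a simplified Java model. It is a sketch, not HotSpot code. The assumption is that with compact headers the class identifier lives inside the mark word, so when the object's monitor is inflated the mark word holds a monitor reference instead and the original header must be fetched on a slow path. The bit layout (`KLASS_SHIFT = 32`), the map standing in for the monitor, and all names are assumptions made for illustration; only the "test the monitor bit, then branch to a stub" shape comes from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, NOT HotSpot code. The layout below is assumed for illustration.
final class CompactHeaderModel {
    static final long MONITOR_VALUE = 0b10; // lock bits 10: header replaced by a monitor reference
    static final long UNLOCKED_VALUE = 0b01; // lock bits 01: plain, unlocked header
    static final int KLASS_SHIFT = 32;       // assumption: class id sits in the upper 32 bits

    // Stand-in for the monitor: maps a monitor id to the displaced (original) header.
    static final Map<Long, Long> displacedHeaders = new HashMap<>();

    // Builds an unlocked header carrying the given class id.
    static long makeHeader(int klassId) {
        return ((long) klassId << KLASS_SHIFT) | UNLOCKED_VALUE;
    }

    // Models monitor inflation: the original header moves into the monitor.
    static long inflate(long header, long monitorId) {
        displacedHeaders.put(monitorId, header);
        return (monitorId << 2) | MONITOR_VALUE;
    }

    // Models the klass load: one extra test-and-branch on every call.
    static int loadKlass(long header) {
        if ((header & MONITOR_VALUE) != 0) {
            // Slow path (the "stub"): fetch the displaced header from the monitor.
            header = displacedHeaders.get(header >>> 2);
        }
        return (int) (header >>> KLASS_SHIFT);
    }

    public static void main(String[] args) {
        long plain = makeHeader(1234);
        long locked = inflate(plain, 7L);
        System.out.println(loadKlass(plain));  // fast path
        System.out.println(loadKlass(locked)); // slow path, same class id
    }
}
```

In this model both calls return the same class id, but every load pays for the bit test and branch even on the fast path. That is the kind of per-load overhead the thread attributes the slowdown to in klass-heavy code such as the stream pipeline in the test case.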

mmyxym commented 3 months ago

The issue is now in progress with the official Lilliput project. Closing this issue.