adoptium / aqa-systemtest

Java load testing and other full system application tests
Apache License 2.0

TestJlmRemoteMemoryNoAuth intermittently fails with 'Peak Usage used memory smaller than Current Usage used memory' #274

Open Mesbah-Alam opened 5 years ago

Mesbah-Alam commented 5 years ago

TestJlmRemoteMemoryNoAuth test intermittently fails with Peak Usage used memory smaller than Current Usage used memory.

We should revisit the unstable test to check its validity.

Example:

https://ci.adoptopenjdk.net/view/Test_system/job/openjdk12_hs_systemtest_x86-64_windows/71/tapResults/

CL1 stderr Exception in thread "main" java.lang.AssertionError: Peak Usage used memory smaller than Current Usage used memory
CL1 stderr  at org.junit.Assert.fail(Assert.java:88)
CL1 stderr  at net.adoptopenjdk.test.jlm.resources.MemoryData.checkPeakAndCurrentMemoryUsage(MemoryData.java:521)
CL1 stderr  at net.adoptopenjdk.test.jlm.resources.MemoryData.writeData(MemoryData.java:182)
CL1 stderr  at net.adoptopenjdk.test.jlm.remote.MemoryProfiler.getStatsViaProxy(MemoryProfiler.java:183)
CL1 stderr  at net.adoptopenjdk.test.jlm.remote.MemoryProfiler.main(MemoryProfiler.java:110)
STF 11:07:08.791 - **FAILED** Process CL1 ended with exit code (1) and not the expected exit code/s (0)
Mesbah-Alam commented 5 years ago

5x Grinder passed 100%: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1660/console

Mesbah-Alam commented 5 years ago

Summary of what the failing test is doing:

1) Given a list of instantiated MemoryPoolMXBean objects, the test invokes the java.lang.management.MemoryPoolMXBean.getUsage() API to get the "current usage" value, and then invokes java.lang.management.MemoryPoolMXBean.getPeakUsage() to get the "peak usage" value.

2) It then performs the following validation:

        long peakUsageUsed = peak_usage.getUsed();
        long currentUsageUsed = current_usage.getUsed();

        if (peakUsageUsed < currentUsageUsed) {
            Message.logErr("  Peak usage smaller than current usage here:");
            Message.logErr("  Memory Pool:              " + poolName);
            Message.logErr("  Memory Type:              " + poolType);
            Message.logErr("  Peak Usage:               " + peak_usage.toString());
            Message.logErr("  Current Usage:            " + current_usage.toString());
            Assert.fail("Peak Usage used memory smaller than Current Usage used memory");
        }

So, intermittently, on various platforms, this validation is failing.

The question is whether this failure indicates something wrong with what we are testing, i.e. the JLM API of the SDK under test.

Javadoc for java.lang.management.MemoryPoolMXBean.getUsage()

Returns an estimate of the memory usage of this memory pool. This method returns null if this memory pool is not valid (i.e. no longer exists).

This method requests the Java virtual machine to make a best-effort estimate of the current memory usage of this memory pool. For some memory pools, this method may be an expensive operation that requires some computation to determine the estimate. An implementation should document when this is the case.

This method is designed for use in monitoring system memory usage and detecting low memory condition.

Javadoc for java.lang.management.MemoryPoolMXBean.getPeakUsage()

Returns the peak memory usage of this memory pool since the Java virtual machine was started or since the peak was reset. This method returns null if this memory pool is not valid (i.e. no longer exists).
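
For reference, the same invariant can be checked locally with a minimal sketch like the one below (an illustration only - the real test performs these calls over a remote JMX proxy):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PeakVsCurrentSketch {
    public static void main(String[] args) {
        // Walk every memory pool in this JVM and apply the same invariant the test asserts.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid()) {
                continue; // getUsage()/getPeakUsage() return null for pools that no longer exist
            }
            MemoryUsage current = pool.getUsage();
            MemoryUsage peak = pool.getPeakUsage();
            if (peak.getUsed() < current.getUsed()) {
                System.err.println(pool.getName() + ": peak used " + peak.getUsed()
                        + " smaller than current used " + current.getUsed());
            }
        }
    }
}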

FYI.. @lumpfish @JasonFengJ9

pshipton commented 5 years ago

@LinHu2016 can you please take a look. Any relation to https://github.com/eclipse/openj9/pull/5324?

LinHu2016 commented 5 years ago

@pshipton I will take a look at the issue, thanks.

LinHu2016 commented 5 years ago

Based on the example (https://ci.adoptopenjdk.net/view/Test_system/job/openjdk12_hs_systemtest_x86-64_windows/71/tapResults/), the error is from OpenJDK HotSpot G1. Please provide an error case on OpenJ9, thanks.

CL1 stderr j> 2019/06/10 11:07:07.499 Peak usage smaller than current usage here:
CL1 stderr j> 2019/06/10 11:07:07.500 Memory Pool: G1 Old Gen
CL1 stderr j> 2019/06/10 11:07:07.501 Memory Type: Heap memory
CL1 stderr j> 2019/06/10 11:07:07.501 Peak Usage: init = 118489088(115712K) used = 12219608(11933K) committed = 118489088(115712K) max = 268435456(262144K)
CL1 stderr j> 2019/06/10 11:07:07.502 Current Usage: init = 118489088(115712K) used = 12241080(11954K) committed = 50331648(49152K) max = 268435456(262144K)
CL1 stderr Exception in thread "main" java.lang.AssertionError: Peak Usage used memory smaller than Current Usage used memory
CL1 stderr  at org.junit.Assert.fail(Assert.java:88)
CL1 stderr  at net.adoptopenjdk.test.jlm.resources.MemoryData.checkPeakAndCurrentMemoryUsage(MemoryData.java:521)
CL1 stderr  at net.adoptopenjdk.test.jlm.resources.MemoryData.writeData(MemoryData.java:182)
CL1 stderr  at net.adoptopenjdk.test.jlm.remote.MemoryProfiler.getStatsViaProxy(MemoryProfiler.java:183)
CL1 stderr  at net.adoptopenjdk.test.jlm.remote.MemoryProfiler.main(MemoryProfiler.java:110)

Mesbah-Alam commented 5 years ago

@LinHu2016 : The error is intermittent. Started a Grinder for OpenJ9 here: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1664/console

LinHu2016 commented 5 years ago

Hi @Mesbah-Alam, could you please share the test code or point to the test source code location? If it has the issue on both HotSpot and OpenJ9, it sounds like the issue is in the JCL or the testing code.

Mesbah-Alam commented 5 years ago

@LinHu2016 ,

The problem arises here: https://github.com/AdoptOpenJDK/openjdk-systemtest/blob/5d351a3f33c6e792c646b7a08539778c8a56caac/openjdk.test.jlm/src/test.jlm/net/adoptopenjdk/test/jlm/resources/MemoryData.java#L521

The actual test class is https://github.com/AdoptOpenJDK/openjdk-systemtest/blob/master/openjdk.test.jlm/src/test.jlm/net/adoptopenjdk/test/jlm/TestJlmRemoteMemoryNoAuth.java

LinHu2016 commented 5 years ago

Another thought: it might relate to a non-heap memory pool (such as the code cache?). I will wait for the Grinder to catch the issue on OpenJ9 so the details can be investigated.

LinHu2016 commented 5 years ago

@LinHu2016 : The error is intermittent. Started a Grinder for OpenJ9 here: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1664/console

@Mesbah-Alam the Grinder failed to run the tests.

Mesbah-Alam commented 5 years ago

Likely something is wrong with the Windows build at Adopt. The failure above happened due to the AUTO_DETECT mechanism being turned on but the java -version output not containing the "Compressed references" string. I launched another Grinder with AUTO_DETECT off, using the published build at Adopt, and that failed too: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1675/tapResults/

lumpfish commented 3 years ago

Still occurring from time to time: https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_sanity.system_aarch64_linux/400

lumpfish commented 3 years ago

This is still occurring in the test builds. @LinHu2016 - are you able to provide any insight into when the values returned by the API calls

java.lang.management.MemoryPoolMXBean.getUsage()
java.lang.management.MemoryPoolMXBean.getPeakUsage()

are updated by the JVM?

I'm looking for a possible explanation for the intermittent test case behavior - for instance, is the data returned by getUsage() always 'live' data but the getPeakUsage() data only updated periodically?

LinHu2016 commented 3 years ago

Hi @lumpfish, could you please provide a failure case? I would like to investigate it. In general, for heap memory pools the peak usage is updated at the beginning and end of GC, or when getUsage(), getPeakUsage() or resetPeakUsage() is called by the user, so usage should never be larger than peak usage. There was previously a bug in retrieving usage for the non-heap memory pool; I would like to confirm whether the failure is related to that fix. Thanks.

LinHu2016 commented 3 years ago

Also, getUsage() and getPeakUsage() are two independent APIs, and heap usage can change between the calls, so you should call getUsage() before calling getPeakUsage() if you want to compare the two sizes.

LinHu2016 commented 3 years ago

So the validation code above should be like this:

        long currentUsageUsed = current_usage.getUsed();
        long peakUsageUsed = peak_usage.getUsed();

        if (peakUsageUsed < currentUsageUsed) {
            Message.logErr("  Peak usage smaller than current usage here:");
            Message.logErr("  Memory Pool:              " + poolName);
            Message.logErr("  Memory Type:              " + poolType);
            Message.logErr("  Peak Usage:               " + peak_usage.toString());
            Message.logErr("  Current Usage:            " + current_usage.toString());
            Assert.fail("Peak Usage used memory smaller than Current Usage used memory");
        }

lumpfish commented 3 years ago

@LinHu2016 - thank you for the insight about why current usage might be reported as higher than peak usage, I'll try reversing the API calls as you suggest and see if it makes a difference.

I haven't managed to recreate this failure in a smaller, simpler test case, nor reproduce it locally - I have only seen the failure on Jenkins slave machines.

If you want to run the test locally:

  1. Build the openjdk-systemtest and stf repositories. Either build via testKitGen (https://github.com/AdoptOpenJDK/openjdk-tests/blob/master/doc/userGuide.md) or follow https://github.com/adoptopenjdk/openjdk-systemtest

  2. Run the test.

    For Windows:

    set JAVA_HOME=java_to_test
    set PATH=java_to_test\bin;%PATH%
    set SYSTEMTEST_HOME=git_checkout_location - e.g. c:\Users\user\git
    set RESULTS_HOME=somewhere_to_put_results - e.g. c:\Users\user\stf_results
    perl %SYSTEMTEST_HOME%/stf/stf.core/scripts/stf.pl -test-root="%SYSTEMTEST_HOME%/stf;%SYSTEMTEST_HOME%/openjdk-systemtest" -systemtest-prereqs="%SYSTEMTEST_HOME%/../systemtest_prereqs" -results-root="%RESULTS_HOME%/TestJlmRemoteMemoryNoAuth" -test=TestJlmRemoteMemoryNoAuth

    For Unix:

    export JAVA_HOME=java_to_test
    export PATH=java_to_test/bin:$PATH
    export SYSTEMTEST_HOME=git_checkout_location - e.g. /home/user/git
    export RESULTS_HOME=somewhere_to_put_results - e.g. /home/user/stf_results
    perl $SYSTEMTEST_HOME/stf/stf.core/scripts/stf.pl -test-root="$SYSTEMTEST_HOME/stf;$SYSTEMTEST_HOME/openjdk-systemtest" -systemtest-prereqs="$SYSTEMTEST_HOME/../systemtest_prereqs" -results-root="$RESULTS_HOME/TestJlmRemoteMemoryNoAuth" -test=TestJlmRemoteMemoryNoAuth

The test creates a new directory under RESULTS_HOME. In there you will see an execute.pl file which is run during the test but can be rerun manually. So the commands issued in the test can be tweaked (e.g. to add extra arguments to java command lines) and the test rerun.

Each time the test is run via stf.pl, a new directory is created with a newly generated execute.pl and other files.

The java test case code is in the openjdk.test.jlm directory in the openjdk-systemtest repository (under SYSTEMTEST_HOME). If you want to try changes to those files, edit, rebuild and rerun.

LinHu2016 commented 3 years ago

So the validation code above should be like this:

        long currentUsageUsed = current_usage.getUsed();
        long peakUsageUsed = peak_usage.getUsed();

        if (peakUsageUsed < currentUsageUsed) {
            Message.logErr("  Peak usage smaller than current usage here:");
            Message.logErr("  Memory Pool:              " + poolName);
            Message.logErr("  Memory Type:              " + poolType);
            Message.logErr("  Peak Usage:               " + peak_usage.toString());
            Message.logErr("  Current Usage:            " + current_usage.toString());
            Assert.fail("Peak Usage used memory smaller than Current Usage used memory");
        }

I meant call getUsage() before calling getPeakUsage(), not current_usage.getUsed() before peak_usage.getUsed(). Sorry for the confusion.
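
In other words, the suggested ordering is roughly the following (a sketch only; pool stands for the MemoryPoolMXBean being checked):

        // Snapshot current usage first, then peak usage; the point is to avoid the
        // case where usage grows after the peak snapshot has already been taken.
        MemoryUsage current_usage = pool.getUsage();
        MemoryUsage peak_usage = pool.getPeakUsage();

        if (peak_usage.getUsed() < current_usage.getUsed()) {
            Assert.fail("Peak Usage used memory smaller than Current Usage used memory");
        }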

lumpfish commented 3 years ago

After more investigation (using a hotspot jdk8 on mac, which shows this failure more often than other platforms - about 1 in 10 test runs), it seems that peak_usage is not updated often enough to guarantee that it will always be equal to or greater than the maximum value that current usage has ever had.

Adding some additional code to the test case demonstrates this:

  Memory Pool:              Code Cache
  Memory Type:              Non-heap memory
  Part of Memory System:    Yes
174: Getting current usage:
  Current Memory Usage:     init = 2555904(2496K) used = 26777216(26149K) committed = 27262976(26624K) max = 251658240(245760K)
178: Getting peak usage:
  Peak Memory Usage:        init = 2555904(2496K) used = 26774912(26147K) committed = 27262976(26624K) max = 251658240(245760K)
185: Checking peak usage:
Trace:  peakUsageUsed = 26774912, currentUsageUsed = 26777216
Trace:  peakUsageUsed = 26774912, currentUsageUsed = 26777216
Trace:  peakUsageUsed less than currentUsageUsed, getting values again
Trace:  new values peakUsageUsed_2 = 26774912, currentUsageUsed_2 = 26772096

The test case was modified such that if the returned peak usage value was less than the returned current usage value, the values were retrieved once more. Upon retrieving the values a second time, the peak usage value was unchanged, but the current usage value had fallen back below the peak usage value.
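
The re-check added to the test looks roughly like this (an illustrative sketch, not the exact test code; pool stands for the MemoryPoolMXBean being checked):

        MemoryUsage current_usage = pool.getUsage();
        MemoryUsage peak_usage = pool.getPeakUsage();
        long currentUsageUsed = current_usage.getUsed();
        long peakUsageUsed = peak_usage.getUsed();

        if (peakUsageUsed < currentUsageUsed) {
            System.out.println("Trace:  peakUsageUsed less than currentUsageUsed, getting values again");
            // Re-read both values; if the first comparison failed only because of a
            // transient timing window, the invariant should hold on the second read.
            long peakUsageUsed_2 = pool.getPeakUsage().getUsed();
            long currentUsageUsed_2 = pool.getUsage().getUsed();
            System.out.println("Trace:  new values peakUsageUsed_2 = " + peakUsageUsed_2
                    + ", currentUsageUsed_2 = " + currentUsageUsed_2);
        }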

The openjdk test case https://github.com/openjdk/jdk/blob/05a764f4ffb8030d6b768f2d362c388e5aabd92d/test/hotspot/jtreg/compiler/codecache/jmx/PeakUsageTest.java#L77 performs a similar test, suggesting that the behavior observed here is not expected.

caozg2021 commented 3 years ago

Should current_usage.getUsed() be called before peak_usage.getUsed()? When will this issue be fixed?

lumpfish commented 3 years ago

Should current_usage.getUsed() be called before peak_usage.getUsed()? When will this issue be fixed?

Please see the additional comment in https://github.com/AdoptOpenJDK/openjdk-systemtest/issues/274#issuecomment-760313329

I meant call getUsage() before calling getPeakUsage(), not current_usage.getUsed() before peak_usage.getUsed(). Sorry for the confusion.

I am unable to see any error in the test case code and have not been able to identify the presumed timing window in the openjdk code - see comment https://github.com/AdoptOpenJDK/openjdk-systemtest/issues/274#issuecomment-772473352.

I have not seen this test fail in an openj9 jdk for a long time, only hotspot jdks.

@caozg2021 - I assume from your interest you are also seeing this issue. Is that from running this test case or are you detecting it in a production system?

caozg2021 commented 3 years ago

@lumpfish I tested my compiled JDK (hotspot) by running this test case using openjdk-systemtest, not a production system.

lumpfish commented 3 years ago

What jdk version / platform did you build? I haven't been able to reproduce this on my local laptop - I think it might be easier to debug if I could.

caozg2021 commented 3 years ago

@lumpfish The JDK is Corretto 8.275.01.1 Linux x64 (https://github.com/corretto/corretto-8/releases). I tested in Docker (CentOS Linux release 7.8.2003 (Core)). My test script is:

# Run the test repeatedly until it returns a non-zero exit code, counting the iterations
a=0
int=0
while (( $int <= 0 ))
do
  perl $SYSTEMTEST_HOME/stf/stf.core/scripts/stf.pl -test-root="$SYSTEMTEST_HOME/stf;$SYSTEMTEST_HOME/openjdk-systemtest" -systemtest-prereqs="$SYSTEMTEST_HOME/../systemtest_prereqs"  -test=TestJlmRemoteMemoryNoAuth
  # Stop looping on the first test failure
  if [ $? -ne 0 ]
  then
    let "int=1"
  fi
  let "a=a+1"
  echo "loop count " $a
done

Running it for a few hours is usually enough to reproduce this issue.

caozg2021 commented 3 years ago

I added some debug info in CodeHeapPool::used_in_bytes to print _next_segment and _freelist_segments:

https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/9dbeec9bf0ab6ab16ce60e51bbf978774d929e74/hotspot/src/share/vm/services/memoryPool.hpp#L222

The debug info looks like this:

tty->print("Code Cache printInfo %d, %d\n", _next_segment,  _freelist_segments);

and printed the CodeBlob's name in CodeCache::free:

https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/9dbeec9bf0ab6ab16ce60e51bbf978774d929e74/hotspot/src/share/vm/code/codeCache.cpp#L214

and enabled logging in NMethodSweeper::sweep_code_cache:

https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/9dbeec9bf0ab6ab16ce60e51bbf978774d929e74/hotspot/src/share/vm/runtime/sweeper.cpp#L347

then reproduced the issue; the debug output is below:

Code Cache printInfo 358516, 1984
Code Cache printInfo 358516, 1984
Code Cache printInfo 358516, 1984
### Sweep at 0 out of 4252. Invocations left: 16
Code Cache printInfo 358557, 1984
UsageMemory Code Cache init 2555904, used 22820672,  commit 23003136, max 251658240
### Sweep at 265 out of 4253. Invocations left: 15
CodeCache::free nmethod
### Sweep at 530 out of 4253. Invocations left: 14
CodeCache::free nmethod
### Sweep at 795 out of 4253. Invocations left: 13
CodeCache::free nmethod
Code Cache printInfo 358598, 2060
PeakMemory Code Cache init 2555904, used 22818432,  commit 23003136, max 251658240
Code Cache printInfo 358598, 2060
Code Cache printInfo 358598, 2060
Code Cache printInfo 358598, 2060

I think this issue is caused by the CodeCache sweep. First, when getting the Code Cache pool's memory usage, _next_segment is 358557 and _freelist_segments is 1984, but at this point MemoryPool::_peak_usage is not updated. Then the sweeper sweeps the CodeCache and _freelist_segments increases. After that, when getting the Code Cache pool's peak memory, _next_segment is 358598 and _freelist_segments is 2060. So the CodeCache's peak memory usage is less than the current memory usage.

lumpfish commented 3 years ago

I had been investigating along these lines:

  1. Peak usage is updated in record_peak_memory_usage(), which gets the current memory usage and updates the recorded peak if the peak is now smaller than the current usage
  2. record_peak_memory_usage() looks to be called at the start of each gc cycle, and whenever getPeakUsage is called via java.lang.management
  3. getUsage() (i.e. get current usage) just returns the value from get_memory_usage().
  4. So I guessed there is a timing window where in between two calls to record_peak_memory_usage a call to get_memory_usage might return a value greater than the last recorded peak_usage, but which was never recorded as a 'peak'.
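
A purely illustrative Java model of that suspected window (this is not HotSpot code; the numbers are borrowed from the trace above):

public class PeakWindowModel {
    // Model only: 'peak' is brought up to date from 'current' when it is explicitly
    // recorded (GC start or a getPeakUsage() call), never from the highest value
    // 'current' has ever reached in between.
    private long current;
    private long peak;

    long getUsage() {
        return current;
    }

    long getPeakUsage() {
        peak = Math.max(peak, current); // record_peak_memory_usage(): max of peak and current *now*
        return peak;
    }

    public static void main(String[] args) {
        PeakWindowModel pool = new PeakWindowModel();
        pool.current = 26_777_216;              // test thread samples current usage...
        long sampledCurrent = pool.getUsage();
        pool.current = 26_772_096;              // ...sweeper frees code cache space...
        long sampledPeak = pool.getPeakUsage(); // ...peak is recorded from the lower value
        System.out.println("peak = " + sampledPeak + ", current = " + sampledCurrent
                + ", peak < current is " + (sampledPeak < sampledCurrent));
    }
}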

Running locally

  1. I've never seen this issue running with a 1 CPU virtual machine
  2. I've run a test where peak usage is updated in get_memory_usage() whenever the current memory is greater than the previous peak usage, and have not seen the failure then. Such a 'fix' would have a performance hit.

I never thought to look at Code Cache sweep though.

caozg2021 commented 3 years ago

The CodeCache usage is computed from _next_segment - _freelist_segments:

https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/9dbeec9bf0ab6ab16ce60e51bbf978774d929e74/hotspot/src/share/vm/memory/heap.cpp#L317

In my debug log, _next_segment always increases, but _freelist_segments increases or decreases dynamically. @lumpfish, did you modify all the memory pools' get_memory_usage, and how did you test whether this modification would have a performance hit?

lumpfish commented 3 years ago

In my experiment I only modified MemoryUsage CodeHeapPool::get_memory_usage() - that is the only memory pool I've seen the error in - but I can't see a logical reason why that would be the only memory pool affected. Perhaps it is simply more dynamic in its behavior?

I added a call:

update_peak_memory_usage(initial_size(), used, committed, maxSize);

before MemoryUsage CodeHeapPool::get_memory_usage() returns its MemoryUsage object, and created

MemoryPool::update_peak_memory_usage(size_t initial_size, size_t used, size_t committed, size_t max_size)

with the same logic as record_peak_memory_usage() but without the call to get_memory_usage(), using the passed in parameters instead.

It appears to stop the problem occurring - but now peak_usage will be updated more frequently. As a rough performance assessment I was going to add counters for the number of times peak_usage is updated with and without the fix. I haven't got round to it yet - busy on other things at the moment.

I tried reproducing the issue using the openjdk head stream (openjdk/jdk repository), but could not. Have you seen it on a release other than jdk8?

caozg2021 commented 3 years ago

My working JDK is jdk8,I only run the systemtest on jdk8.

lumpfish commented 3 years ago

I've created a branch with my attempt at a fix: https://github.com/AdoptOpenJDK/openjdk-jdk8u/compare/master...lumpfish:fix_peak_memory_usage

The change currently includes code to turn the behaviour on and off and print some diagnostics using environment variables:

  1. If the environment variable PRINT_PEAK_USAGE_UPDATE_COUNT is set, the number of times _peak_usage has been updated (precision of 100 updates) is printed.
  2. If the environment variable UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE is set _peak_usage is updated in get_memory_usage rather than in record_peak_memory_usage
    • This results in _peak_usage being updated more frequently, but closes the window for the result of calls to getUsage() exceeding the result of calls to getPeakUsage().

Running the test with and without UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE set shows that with the change _peak_usage is updated approximately 1.5 to 2.5 times more frequently, depending on the memory pool. The test runs for approximately 5 minutes.

Output at end of the test without the change (i.e. without environment variable UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE set):

_peak_usage_update_count for memoryPool Metaspace now 13700
_peak_usage_update_count for memoryPool Code Cache now 20600
_peak_usage_update_count for memoryPool PS Eden Space now 12800
_peak_usage_update_count for memoryPool PS Survivor Space now 12800
_peak_usage_update_count for memoryPool PS Old Gen now 12800
_peak_usage_update_count for memoryPool Compressed Class Space now 12900

Output at end of the test with the change (i.e. with environment variable UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE set):

_peak_usage_update_count for memoryPool PS Eden Space now 32800
_peak_usage_update_count for memoryPool PS Survivor Space now 32800
_peak_usage_update_count for memoryPool PS Old Gen now 26400
_peak_usage_update_count for memoryPool Compressed Class Space now 26200
_peak_usage_update_count for memoryPool Code Cache now 34100
_peak_usage_update_count for memoryPool Metaspace now 27100
lumpfish commented 3 years ago

I have created a standalone client which can attach to any java process (started with the required command line options) and monitor the peak usage in that process.

Using this client the issue can also be seen when running the java db-shootout benchmark from the renaissance benchmarks project https://github.com/renaissance-benchmarks/renaissance. (Presumably other benchmarks would also demonstrate the issue.)

When running the benchmark for 5 minutes, peak usage being lower than current usage is reported about one in eight runs.

This is the client test case: PeakUsageRemoteTest.java.txt. (The .txt extension is required for github to accept the file. Remove the .txt extension (leaving the .java extension) after downloading.)

To allow the client to be attached to a running process, that process must have the following options added to its command line:

-Dcom.sun.management.jmxremote.port=1234 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
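
A minimal sketch of what such a client does is shown below (the attached PeakUsageRemoteTest is the real test case; the class name, polling loop and output format here are illustrative assumptions):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PeakUsageRemoteSketch {
    public static void main(String[] args) throws Exception {
        // Attach to the JMX agent exposed on port 1234 by the command line options above.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            for (int i = 0; i < 240; i++) { // poll once a second, 240 times
                for (MemoryPoolMXBean pool : ManagementFactory.getPlatformMXBeans(mbsc, MemoryPoolMXBean.class)) {
                    MemoryUsage current = pool.getUsage();
                    MemoryUsage peak = pool.getPeakUsage();
                    if (current != null && peak != null && peak.getUsed() < current.getUsed()) {
                        System.out.println(pool.getName() + " : *** Peak usage (" + peak.getUsed()
                                + ") smaller than current usage (" + current.getUsed() + ") ***");
                    }
                }
                Thread.sleep(1000);
            }
        } finally {
            connector.close();
        }
    }
}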

To reproduce using the db-shootout benchmark:

  1. The test needs a machine with multiple CPUs. I have reproduced the failure on a VirtualBox Ubuntu guest with 4 CPUs.
  2. Run using hotspot jdk8 (this issue has not (yet?) been seen on jdk11+).
  3. Compile the test case e.g. to /home/user/PeakUsageRemoteTest.class.
  4. Download the benchmark from https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.11.0/renaissance-gpl-0.11.0.jar e.g. to /home/user/Downloads
  5. Start the benchmark with the command:
    java -Dcom.sun.management.jmxremote.port=1234 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -jar /home/user/Downloads/renaissance-gpl-0.11.0.jar db-shootout -t 300
  6. From another terminal, run the test case
    cd /home/user && java -cp . PeakUsageRemoteTest

    The test case monitors the peak usage every second, 240 times (so it should complete before the benchmark). If the benchmark ends while the client test case is still running, the client ends abnormally with various exceptions (ConnectException, SocketException or others).

If at any point peak usage is lower than current usage the test case outputs a message:

Code Cache : *** Peak usage (8827328) smaller than current usage (8834368) ***

showing the name of the memory pool with the error (Code Cache), and the usage values returned. I've only ever seen the error reported for 'Code Cache'.

Since the error occurs infrequently this shell script will run both the benchmark and the client repeatedly and should ensure that the benchmark never ends before the client test case.

a=0
rc=0
client_secs=240
benchmark_secs=300
while [ $rc -eq 0 ]
do
  a=$((a + 1))
  echo "Iteration" $a
  # Run benchmark in background
  java -Dcom.sun.management.jmxremote.port=1234 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -jar /home/user/Downloads/renaissance-gpl-0.11.0.jar db-shootout -t $benchmark_secs >/dev/null 2>&1 &
  bpid=$!
  # Run client in foreground for less time than the benchmark
  java -cp . PeakUsageRemoteTest
  # If the test failed stop the loop
  if [ $? -ne 0 ]
  then
    rc=1
  fi
  # wait for the benchmark to finish
  echo "PeakUsageRemoteTest finished, waiting for background process $bpid to finish"
  wait $bpid
  echo "Background process $bpid finished"
  echo "Iteration" $a "finished"
done
lumpfish commented 3 years ago

@andrew-m-leonard - FYI, https://github.com/adoptium/aqa-systemtest/issues/274#issuecomment-806646077

lumpfish commented 3 years ago

I've now seen this on a jdk11 hotspot aarch64_linux nightly build for a different memory pool. So not unique to jdk8 or to the 'Code Cache' memory pool. Failing job link: https://ci.adoptopenjdk.net/job/Test_openjdk11_hs_sanity.system_aarch64_linux/477/consoleFull

20:42:40  CL2 j> 2021/03/30 19:42:39.276 Starting to write data
20:43:04  CL2 stderr j> 2021/03/30 19:43:02.041   Peak usage smaller than current usage here:
20:43:04  CL2 stderr j> 2021/03/30 19:43:02.042   Memory Pool:              G1 Old Gen
20:43:04  CL2 stderr j> 2021/03/30 19:43:02.045   Memory Type:              HEAP
20:43:04  CL2 stderr j> 2021/03/30 19:43:02.045   Peak Usage:               init = 106954752(104448K) used = 82918016(80974K) committed = 119537664(116736K) max = 268435456(262144K)
20:43:04  CL2 stderr j> 2021/03/30 19:43:02.046   Current Usage:            init = 106954752(104448K) used = 83638704(81678K) committed = 119537664(116736K) max = 268435456(262144K)
20:43:04  CL2 stderr Exception in thread "main" java.lang.AssertionError: Peak Usage used memory smaller than Current Usage used memory
sophia-guo commented 2 years ago

The test fails randomly with jdk8 at a high rate, and according to the test history reports it is neither platform nor machine related. While the test's validity is still under investigation I will exclude it for jdk8 Temurin.

sophia-guo commented 2 years ago

This also happened with TestJlmRemoteMemoryAuth on alpine-linux.

jiekang commented 11 months ago

A similar JDK test was failing and was excluded temporarily, but it has been fixed here: https://github.com/openjdk/jdk/commit/c0cfad4f3d40852f8b9d1944c07727ddd3278265

In that test the peak usage is acquired before the current usage, as is done in TestJlmRemoteMemoryAuth; however, they also call bean.resetPeakUsage() before acquiring the peak usage, and TestJlmRemoteMemoryAuth does not seem to do that. Maybe doing so would bypass the issue where the peak usage isn't up-to-date, because the reset forces it to be up-to-date.
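
A sketch of that adjustment (illustrative only; pool stands for the MemoryPoolMXBean proxy the test already has):

        // Reset the peak so the JVM recomputes it from the current usage, then
        // acquire the peak before the current usage, as the fixed JDK test does.
        pool.resetPeakUsage();
        MemoryUsage peak_usage = pool.getPeakUsage();
        MemoryUsage current_usage = pool.getUsage();
        // peak_usage now reflects at least the usage at reset time, which may avoid
        // the stale-peak window seen in the failures above.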