Karm / mandrel-integration-tests

Integration tests for GraalVM and its Mandrel distribution. Runs Quarkus, Helidon and Micronaut applications and small targeted reproducers. The focus is solely on native-image utility and compilation of Java applications into native executables.
Apache License 2.0
5 stars 3 forks source link

aarch64 VMs: Slower containers, GDB timeouts, Postgress 10 not for v8 aarch64 #177

Closed Karm closed 10 months ago

Karm commented 11 months ago

e.g.

FAIL_ON_PERF_REGRESSION=false mvn clean verify -Ptestsuite-builder-image -Dtest=DebugSymbolsTest#debugSymbolsQuarkusContainer -Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel-builder-image:jdk-17 -Dquarkus.version=3.2.3.Final

Also a follow up to #175

Karm commented 11 months ago

@jerboaa

1) gdb numbers across our various tests systems: WIP

2) Amending the timeouts: The latest commit takes the GOTO URL for hitting web app url during gdb session for a different approach using timeout, not trials. I also removed the FAIL_ON_PERF_REGRESSION check. I.e. less magic I hope.

Karm commented 11 months ago

Unexpected GOTO URL http://localhost:8080/data/config/lookup failure in 101ms, see: java.net.ConnectException: Connection refused (Connection refused),

O.K., I was too harsh.

Karm commented 11 months ago

The #171 has an incorrect exclusion matrix, e.g.

Error: [ERROR]  [error]: Build step io.quarkus.deployment.pkg.steps.NativeImageBuildStep#build threw an exception: java.lang.IllegalStateException: Out of date version of GraalVM detected: native-image 17.0.8 2023-07-18. Quarkus currently supports 22.3.0. Please upgrade GraalVM to this version.

trying to run Q 2.16 with M 23 on Windows.

jerboaa commented 11 months ago

trying to run Q 2.16 with M 23 on Windows.

OK. Let's fix this first and then we have a better idea about the state of this PR. Sound good?

Karm commented 11 months ago

/me examining the failures

Karm commented 11 months ago

Hmm, looks like we are also hit by something quite unrelated here;

 Error:    RuntimesSmokeTest.quarkusEncodingIssues:212->testRuntime:166 build-and-run.log log should not contain error or warning lines that are not whitelisted. See /home/runner/work/mandrel-integration-tests/mandrel-integration-tests/ts/testsuite/target/archived-logs/org.graalvm.tests.integration.RuntimesSmokeTest/quarkusEncodingIssues/build-and-run.log and check these offending lines: 
Warning:  Checksum validation failed, expected 2811ba27a71a8bda0602161ffe2f6e1429da8068 but is 36257165a0945753efb3f9d473d86c6f4c6c6f6e from google-maven-central for https://maven-central.storage-download.googleapis.com/maven2/org/jboss/arquillian/arquillian-bom/1.7.0.Final/arquillian-bom-1.7.0.Final.pom
Warning:  Could not validate integrity of download from https://maven-central.storage-download.googleapis.com/maven2/org/jboss/arquillian/arquillian-bom/1.7.0.Final/arquillian-bom-1.7.0.Final.pom

I'll bump Q version...

Karm commented 11 months ago

This:

    strategy:
      fail-fast: false
      matrix:
        os: [ ubuntu-20.04 ]
        include:
          - quarkus-version: '2.13.8.Final'
            mandrel-builder-image: '22.3-java17'
          - quarkus-version: '2.16.8.Final'
            mandrel-builder-image: '22.3-java17'
          - quarkus-version: '3.2.2.Final'
            mandrel-builder-image: 'jdk-17'

Makes only the last line run. I will revert to the previous older matrix format.

Karm commented 11 months ago

I am apparently a very poor yaml programmer.
Redoing the matrix one more time.

Karm commented 11 months ago

Fighting GHA aside, I can confirm that the 23.0 passes GDB tests also on slow aarch64 VMs now with this branch.

jerboaa commented 11 months ago

Some CI failures don't seem to cope well with extra output. Example from here:

There were errors in the GDB session. Note that commands in the session might depend on each other. Errors: 
Command 'b ConfigTestController.java:33' did not match the expected pattern '.*Breakpoint 1 at .*: file com/example/quarkus/config/ConfigTestController.java, line 33.*'.
Output was:
"b ConfigTestController.java:33
", 
Command 'run&' did not match the expected pattern '.*Installed features:.*'.
Output was:
Breakpoint 1 at 0x40e766: file com/example/quarkus/config/ConfigTestController.java, line 33.
breakpoint-created,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040e766",func="com.example.quarkus.config.ConfigTestController::getLookupConfigValue()",file="com/example/quarkus/config/ConfigTestController.java",fullname="/home/runner/work/mandrel-integration-tests/mandrel-integration-tests/ts/apps/quarkus-full-microprofile/target/quarkus-native-image-source-jar/sources/com/example/quarkus/config/ConfigTestController.java",line="33",thread-groups=["i1"],times="0",original-location="ConfigTestController.java:33"}^done(gdb) "run&
"Starting program: /home/runner/work/mandrel-integration-tests/mandrel-integration-tests/ts/apps/quarkus-full-microprofile/target/quarkus-runner 
thread-group-started,id="i1",pid="16284"thread-created,id="1",group-id="i1", 
Unexpected GOTO URL http://localhost:8080/data/config/lookup failure in 60034ms, see: java.net.ConnectException: Connection refused ==> expected: <true> but was: <false>
    at org.graalvm.tests.integration.DebugSymbolsTest.carryOutGDBSession(DebugSymbolsTest.java:452)
    at org.graalvm.tests.integration.DebugSymbolsTest.debugSymbolsQuarkus(DebugSymbolsTest.java:263)

Line is there:

Breakpoint 1 at 0x40e766: file com/example/quarkus/config/ConfigTestController.java, line 33.

but has extra output...

[Edit]: Or maybe not. Seems a timing issue...

Karm commented 11 months ago

@jerboaa It is timing on GHA.

$ export JAVA_HOME=/home/karm/workspaceRH/mandrel-release/mandrel-java20-23.0.1.2-Final;export GRAALVM_HOME=${JAVA_HOME};export PATH=${JAVA_HOME}/bin:${PATH}
$ mvn clean verify -Ptestsuite -DincludeTags=reproducers,perfcheck,runtimes -Dquarkus.version=3.2.3.Final
<SNIP>
[INFO] Native image integration TS ........................ SUCCESS [  0.105 s]
[INFO] testsuite .......................................... SUCCESS [27:34 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  27:35 min
[INFO] Finished at: 2023-08-11T22:07:56+02:00
[INFO] ------------------------------------------------------------------------

The question is how to make it more resilient to that. Apparently the crude timeout per gdb command ain't enough. The text appears in the StringBuffer but is never checked again...?

Karm commented 11 months ago

Re-running those two failed...

Karm commented 10 months ago

I am not giving up : )

When I use my local Ubuntu VM and set it to 2 vCPU and 8G RAM, I can reproduce the issue:

INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   DebugSymbolsTest.debugSymbolsQuarkus:263->carryOutGDBSession:452 There were errors in the GDB session. Note that commands in the session might depend on each other. Errors: 
Command 'run&' did not match the expected pattern '.*Installed features:.*'.
Output was:
"run&
"Starting program: /home/karm/mandrel-integration-tests/apps/quarkus-full-microprofile/target/quarkus-runner 
thread-group-started,id="i1",pid="6539"thread-created,id="1",group-id="i1"library-loaded,id="/lib64/ld-linux-x86-64.so.2",target-name="/lib64/ld-linux-x86-64.so.2",host-name="/lib64/ld-linux-x86-64.so.2",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7fc5090",to="0x00007ffff7fee335"}]^running*running,thread-id="all"(gdb) library-loaded,id="/lib/x86_64-linux-gnu/libz.so.1",target-name="/lib/x86_64-linux-gnu/libz.so.1",host-name="/lib/x86_64-linux-gnu/libz.so.1",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7f92280",to="0x00007ffff7fa2c14"}]library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7c28700",to="0x00007ffff7dbaabd"}][Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
 ==> expected: <true> but was: <false>
[INFO] 
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

When I run it locally on my CentOS 8 Stream workstation (12 cores, 32G RAM), I cannot reproduce it any more.

So I will keep poking at it until it's reliable.

Karm commented 10 months ago

I think the problem is that run& command immediately returns and that different gdb versions decide differently when to spit output, see the key difference beginning line 84:

https://editor.mergely.com/u0uC749D

On my fast CentOS system on the right, the output fed immediately after the run& command contains the expected string about Installed features, marking Quarkus is started. On the left panel, the somewhat feeble 2 vCPU Ubuntu VM, with a much different gdb version, does not spit any such output until a next command, bt is called. Hence the parsing fails.

I will try to jam some no-op command in the session to see if we could make it spit all the output sooner.

Karm commented 10 months ago

As I play with it, I'd put out there an idea that intermittently, clearly due to some race condition, debugged Quarkus app does not get into a correct state where it would be able to serve webpages at all, i.e. it keeps turning Connection Refused even minutes after 'run&' command executed. Sometimes.

Karm commented 10 months ago

@zakkak I don't know...

Unexpected GOTO URL http://localhost:8080/data/config/lookup failure in 300014ms, see: java.net.SocketTimeoutException: Connect timed out
zakkak commented 10 months ago

Looking at the logs I see that gdb starts fine in 18505ms. Then spents 500ms to show the version, Then 54012ms to set the breakpoint and another 500ms to run the application and interestingly it stops at the breakpoint, which is an indication that the application and gdb behave as expected !?

Karm commented 10 months ago

@zakkak Is it a Firefox glitch or are all check finally green? :-)

Let me know if you are O.K. with the solution. I have a strong feeling it could be drastically simplified. On the other hand, I am happy I finally know what's going on...