ballerina-platform / ballerina-lang

The Ballerina Programming Language
https://ballerina.io/
Apache License 2.0

[Bug]: `bal test --native` fails in windows without test report #38882

Open TharmiganK opened 1 year ago

TharmiganK commented 1 year ago

Description

In the native test workflows, the Windows run fails with `error: there are test failures`, but no test report is generated.

Steps to Reproduce

See the following workflow runs:

Affected Version(s)

Ballerina SwanLake Update 4

OS, DB, other environment details and versions

Windows

Related area

-> Test Framework

Related issue(s) (optional)

No response

Suggested label(s) (optional)

No response

Suggested assignee(s) (optional)

No response

gayaldassanayake commented 1 year ago

The issue is caused by a Windows exception with the exit code EXCEPTION_ACCESS_VIOLATION (0xc0000005) at runtime.

The most likely cause is high memory usage that exceeds the available memory.

I am currently looking into,

The exact same issue occurs in the grpc module as well.

TharmiganK commented 1 year ago

@gayaldassanayake is this happening only in GitHub workflow runner or are you able to reproduce this issue locally?

gayaldassanayake commented 1 year ago

> @gayaldassanayake is this happening only in GitHub workflow runner or are you able to reproduce this issue locally?

Yes, the issue is reproducible locally. However, the root cause of the Windows error code is still unclear.

TharmiganK commented 1 year ago

>> @gayaldassanayake is this happening only in GitHub workflow runner or are you able to reproduce this issue locally?

> Yes the issue is locally reproducible. However the root cause for the windows error code is yet unclear.

The issue might be caused by a failing test case in the std lib module, which triggers some unexpected error while running the tests. @gayaldassanayake were you able to figure out which test case causes this issue?

I reported this issue to lang since there is no test report. It might be a std lib issue as well.

TharmiganK commented 1 year ago

After fixing the OOM errors in the http and java.jdbc workflow runs, the above issue was observed in those modules as well. Related workflow runs:

Thevakumar-Luheerathan commented 1 year ago

>>> @gayaldassanayake is this happening only in GitHub workflow runner or are you able to reproduce this issue locally?

>> Yes the issue is locally reproducible. However the root cause for the windows error code is yet unclear.

> The issue might be because of a failing test case in the std lib module, which causes some unexpected error while running the tests. @gayaldassanayake did you able to figure out which test case causes this issue?

> I reported this issue to lang since there is no test report. It might be a std lib issue as well.

@TharmiganK The error might be in the generated bytecode, in the standard library, or even in Testerina (possibly in how native arguments are passed to native-image; we checked those too, but nothing we tried worked). In the Testerina runtime, we execute methods in the testable jar (which is generated from the test resources) through a Java ProcessBuilder. While executing one of those methods ($moduleStart from the $_init class), the process is terminated by Windows. That is why we could not get any test report: the main process only receives the termination code (EXCEPTION_ACCESS_VIOLATION (0xc0000005)).
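To illustrate why only a termination code is available in this situation, here is a minimal Java sketch of a parent process that launches a child through `ProcessBuilder` and interprets its exit code. This is not Testerina's actual code: the class name `ExitCodeSketch`, the `describe` and `run` helpers, and the command passed on the command line are all hypothetical. It also assumes that the JVM on Windows hands back the raw 32-bit process exit code, so 0xc0000005 shows up as the negative `int` -1073741819.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Testerina implementation.
class ExitCodeSketch {
    // NTSTATUS value for EXCEPTION_ACCESS_VIOLATION. As a signed 32-bit
    // Java int, this hex literal equals -1073741819.
    static final int ACCESS_VIOLATION = 0xC0000005;

    // Turns a raw exit code into a readable message.
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "ok";
        }
        if (exitCode == ACCESS_VIOLATION) {
            return "EXCEPTION_ACCESS_VIOLATION (0xc0000005)";
        }
        return "exit code " + exitCode + " (0x" + Integer.toHexString(exitCode) + ")";
    }

    // Starts the child process, forwards its output to the parent's console,
    // and blocks until it exits. If the OS kills the child, the exit code is
    // the only information the parent receives.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No command given: just show how the access-violation code decodes.
            System.out.println(describe(ACCESS_VIOLATION));
            return;
        }
        int exitCode = run(Arrays.asList(args));
        System.out.println("child finished: " + describe(exitCode));
    }
}
```

If the child is killed before it writes a report, a parent like this can only log the decoded exit code, which matches the behaviour described above.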

Thevakumar-Luheerathan commented 1 year ago

All of the above repos are failing due to EXCEPTION_ACCESS_VIOLATION (0xc0000005)

gayaldassanayake commented 1 year ago

As per the call in the GraalVM group today,

TharmiganK commented 1 year ago

After some recent changes in GraphQL, the Windows native tests are also failing in a similar way in GraphQL: https://github.com/ballerina-platform/module-ballerina-graphql/actions/runs/4097934000/jobs/7066673197#step:11:433

Thevakumar-Luheerathan commented 1 year ago

GraphQL is also failing with the same exit code (EXCEPTION_ACCESS_VIOLATION (0xc0000005)): https://github.com/Thevakumar-Luheerathan/module-ballerina-graphql/actions/runs/4101524373/jobs/7073399976#step:11:434

The issue seems to be caused by the recent changes on the GraphQL side. I checked a previous commit (def4851d) with the latest master, and it passed without any issue.

MohamedSabthar commented 1 year ago

Hi @Thevakumar-Luheerathan, I was wondering if there is any progress on this issue? Have we been able to identify the root cause of the problem yet?

Thevakumar-Luheerathan commented 1 year ago

> Hi @Thevakumar-Luheerathan, I was wondering if there is any progress on this issue? Have we been able to identify the root cause of the problem yet?

Hi @MohamedSabthar. We still could not find the exact root cause, as the process is terminated within the $moduleInit() method (Testerina relies on this generated bytecode method to execute the test functions). The runtime team is working on a few improvements regarding non-reducible loops. As per the discussions, we expect this issue to be resolved by those changes.

ThisaruGuruge commented 1 year ago

Any update on this @Thevakumar-Luheerathan ?

Thevakumar-Luheerathan commented 1 year ago

> Any update on this @Thevakumar-Luheerathan ?

The fix for this issue for GraphQL is available on the https://github.com/ballerina-platform/ballerina-lang/tree/graalvm-non-reducible-loops branch. Further testing is in progress. It will be merged to master once the fix is verified.

TharmiganK commented 1 year ago

I tested the http test failure related to this issue with the non-reducible-loop fix and observed the following:

  1. When we remove some test cases at random, the error goes away, and we could not find a particular test case that causes it.
  2. When I change all the service declarations in the test module to service class/object definitions and manually attach, start, and register the listener, `bal test --native` works without any issue.
  3. After changing the service declarations to service definitions, I wrote a main function that calls all the test functions (296 in total). I then built and ran the native image. The native image failed at runtime after executing some of the tests.
  4. Then I removed test cases from the main function and was able to reproduce the issue with about 150 test functions. This time, however, the error occurred intermittently, and it failed after executing a different set of tests each time.
  5. Then I ran a single test function multiple times in the main function, but in that case the error was not reproducible.

Link to the code used for the above testing: https://drive.google.com/drive/folders/1_cD6q1eyWR-mRYM5jdSYfZEG5zZweYYc?usp=sharing

I had a call about these observations with @warunalakshitha and @shafreenAnfar, and we decided to check further on this:

Since this is reproducible even in a main function, the issue is most probably not related to Testerina.

sahanHe commented 1 year ago

I'm facing the same issue on Ubuntu for the ballerinax/persist.googlesheets library in PR #14: the same error occurs and no test report is generated. The PR was meant to enable test cases with a test:BeforeSuite method that resets the test Google spreadsheet. A std lib issue was also created to track this: Issue #4537

ThisaruGuruge commented 1 year ago

Is this merged now? Can we enable the Windows tests in the GraalVM tests?

SasinduDilshara commented 3 months ago

I faced the same issue in https://github.com/ballerina-platform/module-ballerina-data.csv/pull/3

Build logs - https://github.com/ballerina-platform/module-ballerina-data.csv/actions/runs/9870309268/job/27255766522?pr=3