Test 772 fails on Windows

OCamlPro / gnucobol

A clone of the sourceforge GnuCOBOL compiler from COBOL to C.

https://get-superbol.com

GNU Lesser General Public License v3.0


Test 772 fails on Windows #125

Open lefessan opened 9 months ago

lefessan commented 9 months ago

From https://github.com/OCamlPro/gnucobol/actions/runs/6675975179?pr=109

772. run_misc.at:4507: testing direct CALL in from C w/wo error; no exit ...
../../tests/run_misc.at:4618: $COMPILE caller.c
../../tests/run_misc.at:4619: $COMPILE_MODULE callee.cob callee2.cob buggy.cob
../../tests/run_misc.at:4620: $COBCRUN_DIRECT ./caller callee 00
../../tests/run_misc.at:4623: $COBCRUN_DIRECT ./caller callee 42
../../tests/run_misc.at:4626: $COBCRUN_DIRECT ./caller callee2
--- -   2023-10-28 09:13:49.613224400 +0000
+++ /d/a/gnucobol/gnucobol/_build/tests/testsuite.dir/at-groups/772/stderr      2023-10-28 09:13:49.548724300 +0000
@@ -1,2 +1 @@
-note: STOP RUN with return code 2

--- -   2023-10-28 09:13:49.690763800 +0000
+++ /d/a/gnucobol/gnucobol/_build/tests/testsuite.dir/at-groups/772/stdout      2023-10-28 09:13:49.642475500 +0000
@@ -1 +1 @@
-STOP WITH 2
+
../../tests/run_misc.at:4626: exit code was 127, expected 0
772. run_misc.at:4507: 772. direct CALL in from C w/wo error; no exit (run_misc.at:4507): FAILED (run_misc.at:4626)

ddeclerck commented 9 months ago

I stumbled on that one in #116 . Copy-pasting my analysis.

The failure is intermittent: sometimes the test passes, sometimes it fails. The problem occurs specifically with STOP RUN (replacing it with EXIT PROGRAM gives no error). It also does NOT occur if we define COB_WITHOUT_JMP. Investigating a bit, it seems this is because the COBOL module executing the STOP RUN statement is unloaded (with lt_dlclose) before cob_stop_run has finished executing. This might be okay when cob_stop_run ends by calling exit, but when it uses longjmp, this probably messes up the stack frame.
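The ordering described above can be sketched as a toy model in plain C. This is not libcob code: every name in it (toy_stop_run, toy_call_with_exception_check, and so on) is hypothetical, and the "module" is only a flag, so nothing is really unloaded and nothing crashes. It only shows the sequence of unloading first and calling longjmp afterwards.

```c
#include <setjmp.h>

/* Toy model only, NOT libcob code: every name below is hypothetical. */
static jmp_buf return_point;   /* where the C caller resumes after STOP RUN */
static int last_status;        /* status passed to the toy STOP RUN */
static int module_loaded;      /* stands in for a dynamically loaded module */

/* Stands in for the runtime's STOP RUN handling. */
static void toy_stop_run(int status)
{
    last_status = status;
    /* Cleanup unloads the module first (lt_dlclose in the real runtime)... */
    module_loaded = 0;
    /* ...and only afterwards jumps back to the C caller. In the real case a
       frame belonging to the just-unloaded module is still on the stack. */
    longjmp(return_point, 1);
}

/* Stands in for the COBOL module's entry point executing STOP RUN. */
static void toy_module_entry(int status)
{
    toy_stop_run(status);
}

/* Returns the STOP RUN status, or -1 if the module returned normally. */
int toy_call_with_exception_check(int status)
{
    module_loaded = 1;
    if (setjmp(return_point) != 0) {
        return last_status;    /* reached via longjmp */
    }
    toy_module_entry(status);
    return -1;
}

/* Reports whether the toy module is still marked as loaded. */
int toy_module_is_loaded(void)
{
    return module_loaded;
}
```

In this model, toy_call_with_exception_check(2) returns 2 and the module flag is already cleared by the time control is back in the caller, which mirrors the order of events suspected in the analysis.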

GitMensch commented 9 months ago

Instead of compiling with COB_WITHOUT_JMP, it would likely also work to just disable the dlclose() by using COB_PHYSICAL_CANCEL=never $COBCRUN_DIRECT ./caller callee2 (inline, only for that run), right? What may also work is COB_PRE_LOAD=callee2 $COBCRUN_DIRECT ./caller callee2

If this works we could do that in the testsuite and document that this may be necessary on some environments (known: Windows) or change the function cob_call_with_exception_check() (but then likely cob_call(), too) to set a flag "called by API" and always skip the complete module unloading part.

@ddeclerck Could you have a look at this, please?

ddeclerck commented 9 months ago

> @ddeclerck Could you have a look at this, please?

Sure (as soon as I find a moment).

ddeclerck commented 7 months ago

Finally had time to have a look at this one. Using COB_PHYSICAL_CANCEL=never does indeed prevent the bug from occurring. However, COB_PRE_LOAD=callee2 does not help.

Now, what should we do? Just add the workaround in the testsuite and document cob_call_with_exception_check as unsuitable for Windows? Or, as you suggest, implement a flag to prevent unloading when calling from cob_call_with_exception_check?
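For a C caller that cannot rely on the environment it is started from, the COB_PHYSICAL_CANCEL=never workaround could in principle be applied from inside the program. The following is a minimal sketch under one unverified assumption: that the runtime reads this variable when it is initialized, so the helper has to run before that point. The helper names are hypothetical and not part of libcob.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets COB_PHYSICAL_CANCEL=never for this process.
   Assumption: it must be called before the COBOL runtime is initialized.
   Returns 0 on success, non-zero on failure. */
int disable_physical_cancel(void)
{
#ifdef _WIN32
    return _putenv_s("COB_PHYSICAL_CANCEL", "never");
#else
    return setenv("COB_PHYSICAL_CANCEL", "never", 1);
#endif
}

/* Hypothetical helper: returns 1 if the variable is set to "never", else 0. */
int physical_cancel_is_disabled(void)
{
    const char *value = getenv("COB_PHYSICAL_CANCEL");
    return value != NULL && strcmp(value, "never") == 0;
}
```

Setting the variable inline on the command line, as done in the test run, remains the simpler option when the invocation is under your control.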

GitMensch commented 7 months ago

I'd prefer the second, and document that with this function modules are only unloaded if cob_tidy() is called manually after the call.
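The preferred option can be sketched as a toy model in C. This is not the libcob implementation: the flag, the module counter, and all function names are hypothetical stand-ins, chosen only to show the intended behaviour (no unloading during STOP RUN when the call came in through the API, unloading deferred to an explicit tidy step).

```c
#include <stdbool.h>

/* Toy model only, NOT libcob code: every name below is hypothetical. */
static bool called_by_api = false;  /* the proposed "called by API" flag */
static int  loaded_modules = 0;     /* number of modules currently loaded */

static void toy_unload_modules(void)
{
    loaded_modules = 0;
}

/* Stands in for an API entry such as cob_call_with_exception_check(). */
void toy_api_call(void)
{
    called_by_api = true;
    loaded_modules += 1;            /* the called module gets loaded */
}

/* Stands in for the cleanup performed when STOP RUN is executed. */
void toy_stop_run_cleanup(void)
{
    if (!called_by_api) {
        toy_unload_modules();       /* normal case: unload right away */
    }
    /* API case: leave modules loaded so the jump back stays safe. */
}

/* Stands in for a manual cob_tidy() done by the C caller afterwards. */
void toy_tidy(void)
{
    called_by_api = false;
    toy_unload_modules();
}

/* Reports how many toy modules are still loaded. */
int toy_loaded_modules(void)
{
    return loaded_modules;
}
```

With this shape, a caller that never runs the tidy step keeps its modules loaded for the lifetime of the process, which is the trade-off the documentation would need to spell out.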