NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilities for working with the WMO BUFR format.

replace test_OUT_[1-7].F with outtest[1-7].F90 #326

Closed jbathegit closed 1 year ago

jbathegit commented 1 year ago

Part of #308

Part of #33

jbathegit commented 1 year ago

Note that I used the library function copybf to replace all of the preAPX script logic that was a special one-off associated with OUT_2. So as a bonus we now have testing of an additional library function that we hadn't been testing yet ;-)

jbathegit commented 1 year ago

Since this PR was still waiting approval, I went ahead and finished converting all of the rest of the OUT tests ;-)

jbathegit commented 1 year ago

Hmm, the MacOS workflow is failing on the outtest5_8 (8-byte build of outtest5) test. The STOP 1 means the generated output file out5.txt didn't exactly match (via cmp -s) the baseline OUT_5 result. This is the only code which generates an ASCII file instead of a BUFR file, though I can't imagine why that would be an issue, or why it would only affect the _8 test (the _4 and _d tests for outtest5 are fine).
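For reference, cmp -s prints nothing and reports its result only through its exit status: 0 when the files are identical, 1 when they differ, and a value greater than 1 on error. A minimal shell sketch of that behavior follows; the temporary file names are made up for illustration and are not the test suite's files.

```shell
#!/bin/sh
# Illustrative only: these file names are hypothetical.
tmpdir=$(mktemp -d)
printf 'line one\nline two\n' > "$tmpdir/baseline.txt"
printf 'line one\nline two\n' > "$tmpdir/same.txt"
printf 'line one\nline 2\n'   > "$tmpdir/different.txt"

# Capture the exit status without aborting if the shell runs with "set -e".
rc_same=0
cmp -s "$tmpdir/same.txt" "$tmpdir/baseline.txt" || rc_same=$?

rc_diff=0
cmp -s "$tmpdir/different.txt" "$tmpdir/baseline.txt" || rc_diff=$?

echo "identical files: rc=$rc_same, differing files: rc=$rc_diff"
rm -rf "$tmpdir"
```

A test that stops with STOP 1 whenever this status is non-zero therefore depends entirely on the status value being passed back intact.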

At any rate, I don't have access to a MacOS to try to troubleshoot this myself offline. @edwardhartnett could you please help me out here, or do you know someone else who has a Mac that might be able to help? Or could you suggest another way that I might be able to troubleshoot this myself? Thanks for any assistance you can provide.

edwardhartnett commented 1 year ago

@jbathegit I also don't have a Mac to test on. ;-) GitHub Actions is your best bet when debugging MacOS problems, and that can be a challenge.

I think the problem you are having is one of line-endings in ASCII files, carriage-return line-feed vs. just line-feed, or something like that. (Delightful historical references to electric typewriters!)

Try using diff instead of cmp to compare ASCII files. IIRC it ignores whitespace differences by default, or else has an option to do that.
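One way to test the line-ending hypothesis without depending on any particular diff option is to strip carriage returns from both files and compare the results. The sketch below uses hypothetical file names and only standard tr and cmp:

```shell
#!/bin/sh
# Illustrative only: these file names are hypothetical.
tmpdir=$(mktemp -d)
printf 'a\nb\n'     > "$tmpdir/lf.txt"     # LF line endings
printf 'a\r\nb\r\n' > "$tmpdir/crlf.txt"   # CRLF line endings

# Byte-for-byte comparison: the files differ because of the extra CR bytes.
raw_rc=0
cmp -s "$tmpdir/lf.txt" "$tmpdir/crlf.txt" || raw_rc=$?

# Remove every carriage return, then compare again.
tr -d '\r' < "$tmpdir/lf.txt"   > "$tmpdir/lf.norm"
tr -d '\r' < "$tmpdir/crlf.txt" > "$tmpdir/crlf.norm"
norm_rc=0
cmp -s "$tmpdir/lf.norm" "$tmpdir/crlf.norm" || norm_rc=$?

echo "raw comparison: rc=$raw_rc, normalized comparison: rc=$norm_rc"
rm -rf "$tmpdir"
```

If the raw comparison fails but the normalized one succeeds, the only differences are carriage returns.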

jbathegit commented 1 year ago

> Try using diff instead of cmp to compare ASCII files. IIRC it ignores whitespace differences by default, or else has an option to do that.

Thanks for the suggestion, but unfortunately switching from cmp -s to diff -w doesn't resolve this issue for the outtest5_8 test. And again, it's only for that one particular test and one particular build of that test, which is really odd.

I've been playing with this in a separate sub-branch, and what's interesting is that I'm occasionally getting similar failures with other codes. For example, I've seen outtest6_8 fail in the same way in the developer flow, and I've also seen outtest1_8 fail in the same way in the Intel flow. In those cases, I've been able to re-run the jobs a few times and get them to eventually pass. But for whatever reason I can never achieve a similar result with the outtest5_8 on MacOS, no matter how many times I try to re-run it. And I also can't save the test output aside as an archive to try to look at it offline, because if the test step of the workflow fails, then it never even executes the subsequent upload-artifact step.

So bottom line is I have no idea what's going on here, but I'm starting to think we may have to remove the execute_command_line() stuff and revert to a wrapper script to do the cmp -s for the outtest tests. It would be neat if it worked everywhere, but we seem to have run into a roadblock, and I'm just about out of ideas, and I haven't found any suggestions to resolve this in any online forums.

jbathegit commented 1 year ago

FWIW, in the other sub-branch I was talking about (see #330), I did just try temporarily removing the xrc check from the outtest5_8 code, just so that test would "pass", and that way I could at least see the test output in the upload-artifact step that I'd previously added to the MacOS flow in an earlier commit on that same sub-branch. That worked (i.e. the test "passed"), and I was able to download the test output from the runner, but then when I manually compared it to the baseline OUT_5 output, there were no differences whatsoever, no matter whether I used cmp -s or diff -w to do the comparison!?

So clearly the issue is that, for whatever crazy reason, the execute_command_line() simply generates the wrong value for xrc for this one particular outtest5 code in this one particular _8 build.

jbathegit commented 1 year ago

As a last gasp, I tried printing the value of xrc from within the outtest5_8 test, and I was a bit shocked to see it return a value of 137438953472 as (supposedly) the value returned by the cmp -s out5.txt testfiles/OUT_5 command when run by execute_command_line for that test.

FWIW, 137438953472 also happens to be 2^37, and also the exact number of bytes in 128 GiB. But why that's also coming back as the exitstat argument from execute_command_line is beyond me. And it's not like it's an integer*4 vs. integer*8 issue, because the size of exitstat isn't configurable in the application code and instead seems to be automatic based on whatever flags were used to compile the code. In other words, if I try to explicitly declare xrc as an integer*4 for the outtest5_8 test, I get a compiler error for a type mismatch between exitstat and xrc.
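One observation that may be relevant, offered as a hypothesis rather than a diagnosis: 137438953472 splits into an upper 32-bit word of 32 and a lower 32-bit word of 0. That is the pattern one would expect if only 4 bytes of an 8-byte integer had been set to a zero exit status while the other 4 bytes kept leftover data, although nothing in this thread confirms that this is what happened. The arithmetic itself can be checked in the shell (this assumes a shell with 64-bit arithmetic):

```shell
#!/bin/sh
# Value printed for xrc in the outtest5_8 test.
xrc=137438953472

high=$(( $xrc >> 32 ))          # upper 32 bits
low=$(( $xrc & 0xFFFFFFFF ))    # lower 32 bits
pow=$(( 1 << 37 ))              # 2^37, for comparison with xrc

echo "high=$high low=$low 2^37=$pow"
```

A lower word of 0 would be consistent with the manual comparison finding no differences between out5.txt and the baseline.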

Feel free to take a look at everything I tried in #330 if you want, and let me know if you have any ideas I may not have thought of. But otherwise, I'm ready to admit defeat at this point and just remove all of the execute_command_line calls and revert to a wrapper script to do the cmp -s for all of the outtest codes.
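A wrapper of the kind described could look roughly like the following. This is only a sketch: the function name run_and_compare and its arguments are hypothetical, and the project's actual scripts may be organized differently.

```shell
#!/bin/sh
# run_and_compare CMD OUTPUT BASELINE   (hypothetical helper, not project code)
# Runs CMD, then checks that OUTPUT matches BASELINE byte for byte.
run_and_compare() {
    cmd=$1
    output=$2
    baseline=$3

    # Run the test program; fail early if it exits with a non-zero status.
    if ! sh -c "$cmd"; then
        echo "FAIL: '$cmd' exited with a non-zero status" >&2
        return 1
    fi

    # cmp -s is silent; its exit status tells us whether the files match.
    if cmp -s "$output" "$baseline"; then
        echo "PASS: $output matches $baseline"
    else
        echo "FAIL: $output differs from $baseline" >&2
        return 1
    fi
}
```

With the names used in this thread, a call might read `run_and_compare ./outtest5_8 out5.txt testfiles/OUT_5`, which keeps the exit-status handling in the shell instead of passing it back through execute_command_line.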

jbathegit commented 1 year ago

This PR is now superseded by #331