NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilites for working with the WMO BUFR format.
Other
40 stars 19 forks source link

Jsw ufbrw testing upgrade #567

Closed jack-woollen closed 4 months ago

jack-woollen commented 4 months ago

In order to update the testing for ufbrw the printout for iprt=2 was reworked to provide more useful information. A change was made in openbf to facilitate having a "silent" reaction from changing iprt through calling openbf. A change was made to inv win to print all results if iprt>=3.

Fixes #362

jbathegit commented 4 months ago

Hey Jack, thanks again for working this issue and helping to improve the overall test coverage! I really appreciate your efforts, and I totally get that maybe there's some details that maybe should only be printed at a very high verbosity level. So if you want to introduce a new IPRT=3 level then I'm fine with that; however, I really think it should be done using the same 'QUIET' mechanism we've been using all this time, rather than via a new 'SILENT' option.

IPRT has always been a sliding verbosity scale from -1 to +2, with the verbosity steadily increasing as IPRT increases. And it's always only ever been changeable via a call to openbf with 'QUIET'. I hear you that maybe you're only interested in the specific output for ufbrw() and don't want to see any of the intervening IPRT<=2 output, but in my opinion it's always been understood that if you ask for a high verbosity level, then you also have to take all of the other print output at the lower verbosity settings that comes with that. In other words, if you want +2, then you also have to take everything at +1 and below that as well, and similarly now if you want +3 then you also have to now take everything at +2 and below. And that's how users have always understood it as well.

So bottom line, I don't like the idea of suddenly now introducing a brand new 'SILENT' option whose only apparent benefit is to bypass that one additional print message in openbf(), and so that +3 now effectively generates slightly less verbose output than +2. That seems confusing and unnecessarily complicated, especially when the cost is that we would now have new lines of library code to maintain along with new corresponding documentation that would also need to be written and added to the openbf() docblock. To me that's not even close to a worthwhile tradeoff.

Again, if you want a new +3 "super verbosity" option on top of the +2 we already have, then I'm fine with that, but only if it's done using the existing 'QUIET' mechanism. Or we could just stick with +2 as the maximum IPRT value that we have now and you could print your ufbrw() output at that level. There's already a ton of stuff that gets printed at +2, so would those extra ufbrw() statements really make that big of a difference?

jack-woollen commented 4 months ago

I removed the silent path from openbf but added iprt=3 as an option. Now all the tests pass except develop has a problem with intest10. I don't understand how this fails in develop and not in any other test set.

  IF(IO.EQ.'QUIET') THEN

c .... override previous IPRT value (printout indicator) IPRTPRV = IPRT IPRT = LUNDX IF(IPRT.LT.-1) IPRT = -1 IF(IPRT.GT. 3) IPRT = 3 IF(IPRT.GE.0) THEN CALL ERRWRT('++++++++++++++BUFR ARCHIVE LIBRARY+++++++++++++++++') WRITE ( UNIT=ERRSTR, FMT='(A,I3,A,A,I3,A)' ) . 'BUFRLIB: OPENBF - DEGREE OF MESSAGE PRINT INDICATOR '// . 'CHNGED FROM',IPRTPRV,CPRINT(IPRTPRV+1),' TO',IPRT,CPRINT(IPRT+1) CALL ERRWRT(ERRSTR) CALL ERRWRT('++++++++++++++BUFR ARCHIVE LIBRARY+++++++++++++++++') CALL ERRWRT(' ') ENDIF ENDIF

jbathegit commented 4 months ago

Hey Jack, I'm busy with a meeting and some other stuff this morning, but I can take a closer look later this afternoon to see if I can help figure out what the problem is.

jbathegit commented 4 months ago

So the problem here is that you have a memory overflow violation. You can see it if you drill down into the developer test details, which also tells you exactly where it's going out-of-bounds:

Capture

Specifically, in line 69 of intest10 it's calling openbf with 'QUIET' and IPRT=3. This was (previously) a test that any IPRT value greater than 3 would automatically be reset back to the (previous) maximum of 2, so that will also now need to be changed in intest10 as part of this PR, but that's a side point.

Anyway, the call stack eventually goes to line 182 of openbf() which includes a print of CPRINT(IPRT+1), and which in turn means it's now trying to print the string in CPRINT(4), which doesn't exist and is why the memory overflow occurs. In other words, now that you're upping the maximum legal value of IPRT to 3, you also need to correspondingly update the size of CPRINT and give it a suitable corresponding definition (e.g. ' (all warning+informational+debug)', or however you want to describe this new "super-verbosity" option?)

And BTW, the reason it only failed in the developer test is because that's the pickiest CI test that runs with strict bounds checking, address sanitization, and a bunch of other flags designed to catch this very sort of thing as well as numerous other potential bugaboos. If you look at the .github/workflows/developer.yml file you can see all of the special additional CMAKE_Fortran_FLAGS and CMAKE_C_FLAGS in play for this particular CI test. In fact, over the past year I've adopted the habit of using those same flags in all of my own local development builds and tests, because again it will catch things that other compilers and lesser flag settings may gloss over. In short, and as Ed has noted, if your build and test can pass with all of those particular GNU compiler flags turned on, then you can feel pretty comfortable that you'll pass any warning or other issue that any other compiler (Intel, MacOS, etc.) might think to squawk about.

jack-woollen commented 4 months ago

I think I made the required changes. Thanks for the help.

jbathegit commented 4 months ago

You're welcome Jack, and thank you (again :-) for your work on this issue. I think we're just about there, but there are still a couple of remaining points from earlier:

  1. Any changes or new lines of code need to be included in the CI testing. Otherwise, we're increasing the number of untested lines of code in the library, which is the exact opposite of the overall goal here. So again, your new if(iprt>=3) block in invwin() needs to be tested, perhaps by just setting the LUNDX argument to 3 within the call to openbf() in line 110 of test_ufbrw.F90(?)
  2. Similarly, now that we're increasing the maximum allowable value of IPRT to 3, then the logic in line 178 of openbf() which resets any user-requested IPRT value higher than that back down to the allowable maximum is no longer being tested either. As I noted earlier, this could be fixed by just increasing the LUNDX argument from 3 to 4 in line 69 of intest10.

I realize this issue has probably taken a lot longer to resolve than you expected, but I also hope it's been helpful to you to better understand some of the details of GitHub and the automated CI testing. The important takeaway is that whenever we add lines of code to the library or intentionally change its behavior in any way, then we need to make the appropriate corresponding changes to the test suite, or else add new test code/cases if necessary. The one thing we don't want to do is add any new code without making sure its being tested somewhere in the suite.

jbathegit commented 4 months ago

Thanks Jack, but now in invwin() you have: 'invwin i1,i2,in ',inv11,inv2,invwin instead of 'invwin i1,i2,in ',inv1,inv2,invwin

You can see the result of this typo in the resulting FT06 output, where a bunch of '****' fields are being printed for the non-existent inv11 value: Capture

If you can fix that then I'll go ahead and approve and merge this PR, and I'll just make that other fix to line 69 of intest10 myself in my next upcoming PR. One last question though - when I merge your PR, is it OK with you if I do it as a squash commit, which basically compresses all 20 of your commits for this PR down into one single "net" commit to the develop branch?

jack-woollen commented 4 months ago

Squashing is okay.