NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilites for working with the WMO BUFR format.
Other
40 stars 19 forks source link

new apxdx utility, more testing, and more F90 #590

Closed jbathegit closed 2 months ago

jbathegit commented 2 months ago

Fixes #585 Part of #254 Part of #328 Part of #445

jbathegit commented 2 months ago

So this was a new one. I always test locally using the same extended flags as in the develop CI test, in order to flag any warnings as errors. But this was the first time something passed under those conditions yet failed under other CI tests with not as strict checking.

It turns out this was due to differences in how unused space within character strings is initialized for compression in wrcmps. In the develop case, there was garbage in the unused space which was causing the (cstr(i).ne.catx(i,j)) test to not always give the correct result when a character string had the same value across multiple subsets, and because of that the resulting output file was much larger than it should have been. The values were encoded properly, but in much more space than necessary, and which in turn caused the cmp -s test to fail for the newest outtest11. I fixed this by explicitly initializing all catx(i,j) values to blank spaces, just like is already done inside of writlc.

jbathegit commented 2 months ago

Following the above fix, an error is now showing up in the macOS CI test that's unrelated to the content of this PR. I've tried re-running it a couple of times to no avail, and in the build step of the output I keep seeing the diagnostic error: extension '_bufrlib' has Fortran sources but no Fortran compiler found followed immediately by another diagnostic Maybe empty "_bufrlib-f2pywrappers.f" when it's trying to generate the Python binding.

The main Fortran part of the code seems to be building OK, so it's not like there's no Fortran compiler available in the macOS runner, and apparently this is only a problem when it's trying to build the wrappers for Python.

@AlexanderRichert-NOAA @edwardhartnett @aerorahul have any of you ever encountered this before?

AlexanderRichert-NOAA commented 2 months ago

I haven't, but I can fix it in my fork by explicitly setting the numpy version to 1.26.3 (previous version): pip3 install numpy -> pip3 install numpy==1.26.3. I'll file a bug with the numpy people, and I'll let you know if they have a better solution.

jbathegit commented 2 months ago

Thanks for the suggestion @AlexanderRichert-NOAA, but as you can see I'm still getting that same error in the MacOS GitHub runner even after making that fix. So something else still needs to be tweaked.

I did go ahead and update all of the workflow actions from v3 to v4, based on some deprecation warnings that had also recently started showing up in the Summary logs. But again, we're still stuck on this issue where it apparently can't build Fortran wrappers for Python in the MacOS runner.

AlexanderRichert-NOAA commented 2 months ago

Okay, I think I spoke too soon-- It looks like I can get it to work by adding F90: gfortran-11 after 'FC' in the 'env' settings (and I can even take out the existing ln -sf ... gfortran and revert to using the latest numpy). I'm guessing they changed something in numpy about which variables/settings/defaults get used to determine the compiler...

jbathegit commented 2 months ago

Thanks @AlexanderRichert-NOAA for the continued suggestions, but it's still not working. We seem to have cleared the Python hurdle, but now I'm getting a new error within the Fortran build of debufr.F90 on MacOS, because it's failing now in test_debufr

92:  ***********BUFR ARCHIVE LIBRARY ABORT**************
92:  BUFRLIB: GETCFMNG - TO USE THIS SUBROUTINE, MUST FIRST CALL SUBROUTINE CODFLG WITH INPUT ARGUMENT SET TO Y
92:  ***********BUFR ARCHIVE LIBRARY ABORT**************
92:   
92: STOP 8
 92/101 Test  #92: test_debufr ......................***Failed    0.03 sec

which is nonsense b/c the utility tests fine everywhere else. I've tried a few different variations of your suggested fixes, but still no luck. And again, I really think the above error is a red herring.

jbathegit commented 2 months ago

@AlexanderRichert-NOAA @edwardhartnett @aerorahul @jack-woollen

Please, I need some help to get this particular PR across the finish line, because of the Python build error that suddenly appeared for the MacOS CI. The latest MacOS.yml tweak to the Fortran specs seemed to get around that, but at the cost of a different issue now popping up in a separate Fortran utility. And I really think that other issue is a red herring, because (1) there were no changes made to that particular utility, and (2) it's not flagging as an error on any other platform.

Unfortunately I don't have access to a MacOS machine on my end, so I don't have any way to try to reproduce that error locally. The only thing I could do is just continue pushing up incremental MacOS.yml changes as follow-on commits, on a trial-and-error basis to see if I can manage to get anything to work. But that seems really inefficient and not an optimal way to try to resolve this.

@AlexanderRichert-NOAA did you ever hear anything back from the numpy folks you reached out to, or any other suggestions how to make the Python build work on MacOS without adversely impacting other unrelated Fortran features that were previously working fine?

Thanks for any help, b/c I'm still a bit stuck here.

jbathegit commented 2 months ago

Hey @rmclaren, Rahul suggested you might be able to help out here as well. To recap from above, and since the introduction of commit 464fdb8 to this PR, there's now a Python build error that has cropped up in the MacOS CI test for NCEPLIBS-bufr, and which in turn has halted any further library development dead in its tracks.

Have you ever encountered anything similar in trying to build Python wrappers for the Fortran portion of NCEPLIBS-bufr on MacOS? I'm kind of stuck here b/c I don't have my own local access to a MacOS platform that I can use to try to further debug what may be really going on here. As you can see above, I've already tried a few suggested tweaks by @AlexanderRichert-NOAA to the .github/workflows/MacOS.yml file in an effort to try to get around this, but without much success. And this is the only platform where the CI tests are now failing for this apparent reason.

AlexanderRichert-NOAA commented 2 months ago

For what it's worth, I don't think the failure is Python related. I did a run of the workflow with -DENABLE_PYTHON=OFF, and it still fails on the same test, test_debufr.

jbathegit commented 2 months ago

Thanks for that input Alex, which seems like an important clue. But were you testing on the latest version of MacOS.yml which includes the new F90: gfortran-11 line that you'd previously suggested, or the original version which didn't include that line?

AlexanderRichert-NOAA commented 2 months ago

I'm using the latest copy of your branch as of a few minutes ago, though this same test was failing previously, so I don't think the 'F90' line makes any difference.

jbathegit commented 2 months ago

The F90 line does make a difference, because without it then it goes back to failing in the build of the Python wrappers (see my latest commit). So I guess I'll put that back in, but would it be possible for someone with a MacOS to then do a build with -DCMAKE_BUILD_TYPE=Debug turned on to help me take a look and figure out why test-debufr is now suddenly failing just on that platform?

jbathegit commented 2 months ago

would it be possible for someone with a MacOS to then do a build with -DCMAKE_BUILD_TYPE=Debug turned on to help me take a look and figure out why test-debufr is now suddenly failing just on that platform?

Otherwise, the only other option I can think of would be to put a bunch of temporary print statements into the debufr script and code to try to decipher what's happening here. So a lot more commits and not very efficient, but maybe the only way to get to the bottom of this. Again, this is really puzzling b/c I didn't change any of that code, and it's not failing anywhere else.

jbathegit commented 2 months ago

And now that I just said that, the "Spack / Spack (ubuntu-latest, +python ~shared) (pull_request) " test is now suddenly failing, apparently with some different sort of Python diagnostic.

One step forward, two steps back (sigh)

jbathegit commented 2 months ago

OK, the MacOS hurdle is finally cleared after a bunch of temporary print statements and trial-and-error commits in a separate test branch. Apparently whatever newer compiler they're using now wasn't real happy about how single character values were being passed between C and Fortran, so I re-worked how that was being done.

So all good there, but as mentioned earlier there's now a new Python error that's cropping up in the "Spack / Spack (ubuntu-latest, +python ~shared) (pull_request) " build, so I'd appreciate if @AlexanderRichert-NOAA or someone else could help me with this:

     550    /home/runner/work/NCEPLIBS-bufr/NCEPLIBS-bufr/spack/opt/spack/linux
            -ubuntu22.04-zen2/gcc-11.4.0/python-venv-1.0-t4pp2wmjsfgsxkaqjkwcpz
            vv53xopi7j/bin/python3: Error while finding module specification fo
            r 'numpy.f2py' (ModuleNotFoundError: No module named 'numpy')
  >> 551    make[2]: *** [python/CMakeFiles/python_mod.dir/build.make:73: pytho
            n_mod] Error 1
     552    make[2]: Leaving directory '/tmp/runner/spack-stage/spack-stage-buf
            r-develop-u4qvv2nxnmlmiuv3raa5ecq77snqtzeh/spack-build-u4qvv2n'
  >> 553    make[1]: *** [CMakeFiles/Makefile2:1464: python/CMakeFiles/python_m
            od.dir/all] Error 2
AlexanderRichert-NOAA commented 2 months ago

The issue seems to be related to a recent change in Spack. While I work with the Spack devs to figure out the exact issue, for now, using the most recent stable release works. Trying adding -b releases/v0.21 to the end of the git clone -c feature.manyFiles=true https://github.com/spack/spack line in Spack.yml and that should do it.

jbathegit commented 2 months ago

Thanks Alex - much appreciated!

Everyone, this PR is now (finally :-) ready for review.