NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilites for working with the WMO BUFR format.
Other
46 stars 21 forks source link

Use F90 interfaces and kinds to handle 4/8/d build issues, and only build one copy of the library #78

Closed edwardhartnett closed 1 year ago

edwardhartnett commented 3 years ago

This will come after #77

Currently we build 3 flavors of the library, 4, 8, and D. Instead, use fortran interfaces and kinds to provide one library that handles all three cases.

Not sure when this should happen, we want to try this out on a simpler project first.

edwardhartnett commented 3 years ago

@jbathegit when do we think this issue will be addressed?

jbathegit commented 3 years ago

@edwardhartnett I honestly have no idea. I haven't had any time to work on NCEPLIBS-bufr updates in the past couple of months, and I'm not sure how soon I'll be able to get back to it. I'd certainly welcome any help if someone else wants to work on this in the meantime.

jbathegit commented 2 years ago

@edwardhartnett @aerorahul @rmclaren @kgerheiser @jack-woollen

I've spent a bit of time looking into this. I've come up with a prototype that will work, but there are a couple of issues as noted below. For example, let's say I have an existing Fortran-77 library routine named writsa which has the following call signature:

call writsa( lunxx, lmsgt, msgt, msgl, xval, yval)

and where all of the arguments are currently implicitly typed (so the first four are integers and the last two are reals), and the third argument (msgt) is an array of size lmsgt. The goal here is to develop a single interface that can automatically handle any of 3 cases that might be passed in by a user:

So I could pick, say, the second case as the new default (meaning that, going forward, the only compiled version of the library would be equivalent to the existing _8 build) and have something like the following:

module modi_writsa

    interface writsa
        module procedure writsa_4, writsa_8, writsa_d
    end interface

    contains

    subroutine writsa_4( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=4), intent(in) :: xval
        real(kind=4), intent(out) :: yval

        integer my_lunxx, my_lmsgt, my_msgl, ierr
        integer, allocatable :: my_msgt(:)
        real :: my_xval, my_yval

        my_lunxx = lunxx
        my_lmsgt = lmsgt
        my_xval = xval
        allocate(my_msgt(lmsgt),stat=ierr)
        if (ierr .ne. 0 ) then
          print *, 'allocate error in writsa_4'
          return
        endif

        call writsa_f ( my_lunxx, my_lmsgt, my_msgt, my_msgl, my_xval, my_yval )

        msgt(1:lmsgt) = my_msgt(1:lmsgt)
        msgl = my_msgl
        yval = my_yval

        return
    end subroutine

    subroutine writsa_8( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(kind=8), intent(in) :: lunxx
        integer(kind=8), intent(in) :: lmsgt
        integer(kind=8), intent(out) :: msgt(:)
        integer(kind=8), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

        call writsa_f ( lunxx, lmsgt, msgt, msgl, xval, yval )

        return
    end subroutine

    subroutine writsa_d( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

        integer my_lunxx, my_lmsgt, my_msgl, ierr
        integer, allocatable :: my_msgt(:)

        my_lunxx = lunxx
        my_lmsgt = lmsgt
        allocate(my_msgt(lmsgt),stat=ierr)
        if (ierr .ne. 0 ) then
          print *, 'allocate error in writsa_d'
          return
        endif

        call writsa_f ( my_lunxx, my_lmsgt, my_msgt, my_msgl, xval, yval )

        msgt(1:lmsgt) = my_msgt(1:lmsgt)
        msgl = my_msgl

        return
    end subroutine

  end module

I would then rename the existing subroutine writsa as writsa_f, and that way users can still call the same name writsa as before, but now it will automatically re-direct them through the proper interface procedure (either writsa_4, writsa_8, or writsa_d, depending on how they have the variables declared within their own application code), and where each of these would then act as a conduit into the existing subroutine (now renamed as writsa_f) and where the call signature there is now clearly defined with all of the arguments being 8-byte integers and 8-byte reals as the default build of the library. I've tested this out and it works;

HOWEVER...

the main issue I see here is one of efficiency; specifically, in the writsa_4 and writsa_d cases, I have to copy all of the 4-byte values input by the user into corresponding 8-byte local variables, then call the single writsa_f routine with those 8-byte variables, then copy the output back into the original 4-byte values. This is even more complicated by the fact that one of the variables (msgt) is an array, which means I need to dynamically allocate the correct amount of space at run time in order to do the transfers. This seems like a lot of extra overhead, especially if we have to dynamically allocate temporary memory every time the subroutine is called. Is there maybe a cleaner way to do this that I'm missing here?

A secondary issue is that this, or whatever approach we end up using, will need to be replicated across hundreds of existing subroutines and functions in order to end up with a single build of the library that handles all of the existing _4, _8, and _d cases, plus all of the accompanying documentation, etc. Again, unless I'm missing something here that would make this process a lot more straightforward, this will be a lengthy task.

I'd greatly appreciate any comments, thoughts, advice, suggestions, etc. - thanks!

kgerheiser commented 2 years ago

As you note there's no silver bullet when doing this in Fortran. Hopefully, a native method will be included in the next Fortran standard.

But for now another way to overload the method would be to overload writsa_f directly eliminating the need to copy data and then call the function. To do this directly in Fortran you would have to copy and paste all the code in three different places.

A method I've used before to reduce the code duplication is to use the pre-processor to #include the body of the function and the kinds:

#define INT_T int32
#define REAL_T real32
subroutine writsa_4( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(INT_T), intent(in) :: lunxx
        integer(INT_T), intent(in) :: lmsgt
        integer(INT_T), intent(out) :: msgt(:)
        integer(INT_T), intent(out) :: msgl
        real(REAL_T), intent(in) :: xval
        real(REAL_T), intent(out) :: yval
#include "writsa_body.f"
    end subroutine
#undef INT_T
#undef REAL_T

#define INT_T int32
#define REAL_T real64
subroutine writsa_d( lunxx, lmsgt, msgt, msgl, xval, yval )
        implicit none

        integer(INT_T), intent(in) :: lunxx
        integer(INT_T), intent(in) :: lmsgt
        integer(INT_T), intent(out) :: msgt(:)
        integer(INT_T), intent(out) :: msgl
        real(REAL_T), intent(in) :: xval
        real(REAL_T), intent(out) :: yval
#include "writsa_body.f"
    end subroutine
#undef INT_T
#undef REAL_T

But then you split up the function body and interface definition and rely on the pre-processor. Or it could be automated with the build system.

With bufr you would have to start at the lowest level functions and work your way up.

aerorahul commented 2 years ago

As @kgerheiser notes, Fortran does not support templates. Another possible approach would be to write the main code in a language that supports templates e.g. C++ and provide Fortran interfaces.

rmclaren commented 2 years ago

Ive never used it but fypp (https://fypp.readthedocs.io/en/stable/fypp.html) seems to be a popular way to address these kinds of issues. I noticed that some JCSDA code uses it. Its a little ugly in my opinion but anyways...

edwardhartnett commented 2 years ago

I'm not in favor of bringing in any new tools like fypp. In fact, I strongly oppose the idea.

There is always a strong urge to attempt to change Fortran in some way to overcome it's many shortcomings. (For example the UFS apparently has a home-rolled python to Fortran translator). In my experience, none of these band-aids on Fortran bring long-term benefit to the organization.

For example, we would not be very delighted to have to be using one of the many pre-processors I remember from Fortran 77 days, which gave F77 programmers new and wonderful innovations, now long-since forgotten. Will programmers at EMC in 25 years want to deal with fypp, our home-rolled python translator, or any other hack we make to Fortran to try and make it more easy to program?

Fortran is a long-term standard, but these other tools are local, temporary, and not really in it for the long haul. As the programmers that work on them retire or move on to new projects, these tools will be abandoned. But standard Fortran will march on, and NOAA will still want to run the millions of lines of existing Fortran code it has developed at great cost.

Future NOAA programmers, cursing our shortsightedness, will simply be removing these band-aids and returning to standard Fortran (or C/C++).

edwardhartnett commented 2 years ago

I would be in favor of moving the core of this library (in a gradual way) to C. I think C++ is not really needed, and is much harder for Fortran programmers to learn and consequently harder to staff.

(It should not be assumed that any Fortran programmer can instantly learn either C or C++. There is a steep learning curve, especially when it comes to pointers and object oriented programming, which can be real stumbling blocks. Training and help is required. Also the programmer must be willing to put in the effort.)

I also think this problem could be solved in pure Fortran by making the functions all accept double-precision, and then having single precision calls which cast/copy the data to double-precision. Since no calculations are done by the library (i.e. no hairy physics equations being solved) having single precision vs. double is not going to affect performance.

@jbathegit do you know C or C++?

The way to solve this in C would be first to write a function using parameters and void pointers, which could take any of the data sizes needed, then write simple C wrappers for each actual case needed (i.e. one for single precision one for double), then write Fortran APIs to call those C wrappers. This way the void pointer is hidden from the Fortran layer, and it just sees C functions with nice, well-defined types. All of this is 100% standard C and Fortran.

jbathegit commented 2 years ago

Thanks everyone for your feedback and suggestions!

First of all, and to answer the question from @edwardhartnett, yes I'm proficient in C, but not so much in C++.

Secondly, a question for @rmclaren - in your example writsa_d, how does any actual space for the local variable _msgt get allocated? I see where you're declaring it as an array, but I don't see where you've actually allocated it to be a certain size, so how can _writsa safely write into that space without risking a segfault? Unless I'm missing something(?), this ends up being the same problem I had in my original writsa_d code (see above), where I needed to dynamically allocate my_msgt every time the function was called.

I was able to get around that by using the suggestion from @kgerheiser to incorporate an #include directive in a source file modi_writsa.F90:

module modi_writsa

    interface writsa
        module procedure writsa_4, writsa_8, writsa_d
    end interface

    contains

    subroutine writsa_4( lunxx, lmsgt, msgt, msgl, xval, yval )

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=4), intent(in) :: xval
        real(kind=4), intent(out) :: yval

#include "writsa_body"

    end subroutine

    subroutine writsa_8( lunxx, lmsgt, msgt, msgl, xval, yval )

        integer(kind=8), intent(in) :: lunxx
        integer(kind=8), intent(in) :: lmsgt
        integer(kind=8), intent(out) :: msgt(:)
        integer(kind=8), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

#include "writsa_body"

    end subroutine

    subroutine writsa_d( lunxx, lmsgt, msgt, msgl, xval, yval )

        integer(kind=4), intent(in) :: lunxx
        integer(kind=4), intent(in) :: lmsgt
        integer(kind=4), intent(out) :: msgt(:)
        integer(kind=4), intent(out) :: msgl
        real(kind=8), intent(in) :: xval
        real(kind=8), intent(out) :: yval

#include "writsa_body"

    end subroutine

end module

But, as he noted, this separates the function body from the interface definition and relies on pre-processing, which in turn becomes a hairy mess for Doxygen to make sense of. Furthermore, while I was able to get a simple test code to compile and run via:

ifort -r8 -i8 modi_writsa.F90 test.f90

from the UNIX command line, I couldn't then get the same modi_writsa.F90 code to work within the existing CMake build procedure for NCEPLIBS-bufr; instead, it squawked with the following diagnostic:

undefined reference to writsa_

So I'm still kind of stuck here.

Please note that, whatever solution we come up, it has to work for any user and existing application code simply by calling the name writsa; otherwise, if we change that interface in any way, then every existing user has to correspondingly change his or her own application code, which is something we definitely want to avoid ;-)

jbathegit commented 2 years ago

@edwardhartnett @kgerheiser @rmclaren @aerorahul @jack-woollen

Digging further, the undefined reference is due to the writsa reference being defined in the interface statement, which in turn is defined inside of a module, so I just have a use modi_writsa to bring it into scope within my simple test program. But when I'm building the NCEPLIBS-bufr library via CMake, there's no corresponding main program, and therefore no way I can think of to make the writsa reference visible within the library itself, aside from requiring every existing user to add use modi_writsa within their application code.

Am I missing something here? In other words, is there some straightforward way that I can make the writsa reference in the above interface statement visible throughout the compiled library so that an end user can see it by just linking the compiled library, and without requiring them to have to specify use modi_writsa in their application code?

kgerheiser commented 2 years ago

There has to be a use <module> somewhere. For users there would be a single public facing module, which re-exports all the internal modules, like:

module bufr_mod
use modi_writsa
use ...
use ...
end module bufr_mod

And internally, each piece of code would use the individual modules that it needs.

jbathegit commented 2 years ago

Thanks @kgerheiser, and yeah I did think of something like that, but I was hoping there might be some way for every user to not have to add any such new directive to their application code, or even worse within every individual source file of their code where any NCEPLIBS-bufr routine ever gets called.

I guess I was hoping there might be some sort of generic Fortran structure that I was overlooking and that would automatically get compiled and be in scope within the library without users having to modify their own application code in order to have visibility to that interface(?)

Anyway, thanks again for the confirmation, and if anyone else can think of a separate way around this, please let me know.

I guess my other question is a more general one for @edwardhartnett, in terms of how important this is? I get that it would be nice to be able to have one version of the compiled library that all application codes can link to (vs. separate _4, _8 and _d builds), but this is quite a bit of restructuring just for one subprogram out of many, and we haven't even addressed the issue of how Doxygen is going to behave with all of these #include *body directives linking in all of the various code snippets from all over the place. And converting any of these interfaces to C language would be just as time-consuming, so in the long run is it really worth doing all of this? In other words, are we getting enough proverbial bang-for-the buck from this time/resource investment?

jbathegit commented 2 years ago

Another issue I see with a single public-facing Fortran module such as bufr_mod is that any user trying to access NCEPLIBS-bufr from within application code written in C or Python would be out of luck. In other words, how are they going to be able to add a use bufr_mod to their application code if their code isn't Fortran?

kgerheiser commented 2 years ago

@jbathegit you can still interface with C by using bind(c) and iso_c_binding types.

jbathegit commented 2 years ago

@jbathegit you can still interface with C by using bind(c) and iso_c_binding types.

I'm confused. Given the above example module modi_writsa, how would I reference the name writsa from within C code? As far as I know, I can't add the bind(c) attribute to a name defined in a Fortran interface statement.

In other words, interface writsa bind(c) doesn't work, so how would I do this?

edwardhartnett commented 2 years ago

How important is this? I would say quite important.

Right now we are doing this 4/8/d thing, which is a NOAA special - I've never seen anyone else do this or even try it. So we are well outside the normal with this.

Heading further down the road with 4/8/d is just putting off the problem. I have outlined a solution in C - can this work?

aerorahul commented 2 years ago

Before this library is refactored/rewritten, what does the future use of BUFR look like in our systems and in general with the development around us? Who are the primary users of this and the new library? What are their needs? Is this library needed in the future or can it be combined with other existing solutions?

edwardhartnett commented 2 years ago

I have heard the desire to phase out GRIB1 (although this still has not been achieved). There has been no talk of getting rid of BUFR, has there?

Our goal is to have world-class software so scientists can do world-class science. We will do whatever programming work we need to do, in order to achieve that.

If that means extra programming... well, we are programmers. It's what we do all day, every day.

image

aerorahul commented 2 years ago

BUFR is not going anywhere. I am talking about use of bufr, not about its existence.

edwardhartnett commented 2 years ago

OK, so this is our bufr solution for NOAA for the next several decades.

At the moment I am working hard on the GRIB libraries, and that will take some time. @jbathegit if you want to defer this entire conversation until I have free time to come help NCEPLIBS-bufr with this conversion away from 4/8/d, that's fine.

We are doing this for other libraries, and will learn as we go. So let's table this issue for a while until we have more experience.

Jeff, if you want to try out the C solution, I think that's where we will end up heading.

jack-woollen commented 2 years ago

I'm wondering how many codes use the 8 or the d version. And why.

On Mon, Jan 24, 2022 at 9:02 AM Edward Hartnett @.***> wrote:

I have heard the desire to phase out GRIB1 (although this still has not been achieved). There has been no talk of getting rid of BUFR, has there?

Our goal is to have world-class software so scientists can do world-class science. We will do whatever programming work we need to do, in order to achieve that.

If that means extra programming... well, we are programmers. It's what we do all day, every day.

[image: image] https://user-images.githubusercontent.com/38856240/150796455-ccab7308-6287-4d0c-b5cd-a167d785b853.png

— Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/78#issuecomment-1020131358, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO3XO6M757VQL6ZCSPLWOK3UXVLVVANCNFSM4TYNIUTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

jbathegit commented 2 years ago

@aerorahul raises a good question that I've raised myself in the past, which is what the future of BUFR looks like in our systems? He's correct that BUFR isn't going anywhere, and in fact we'll definitely continue receiving BUFR observational data from the outside world for many years/decades into the future. So we're always going to need to have software to be able to read and write BUFR when interacting with the rest of the world, because it is and will remain a WMO standard for a long time, and there's been no discussion whatsoever at the WMO level about changing that.

But as I've mentioned previously, and just because we'll have a continued need to be able to read/decode and write/encode BUFR in our interactions with the rest of the world, that doesn't mean we necessarily need to forever continue to use BUFR as our internal NCEP storage and archive format. In other words, there's no reason many of our internal codes couldn't switch to something else (direct JEDI/IODA array vectors?) if that were deemed preferable, and in that case it would greatly reduce the number of NCEP codes that we have to support with our BUFR software. But of course that's an enterprise-level decision that would have far-reaching implications into a lot of existing software and other internal practices. So, again, just throwing that out there as food-for-thought.

For my part, and to answer @jack-woollen's question, I have no idea how many existing codes or what percentage use the 8 or the d versions. I would imagine it's a fairly small number, because most BUFR values fit just fine into 32 bit fields. That said, there are a handful of newer BUFR descriptors whose range of values don't fit into 32 bits and therefore do require 8 bytes, and which is why I've suggested making that the new default and just tailoring all newly-developed interfaces to that build.

jack-woollen commented 2 years ago

The bufrlib inputs and outputs are all already 8 bytes. You don't need different libraries for that.

On Mon, Jan 24, 2022 at 1:11 PM Jeff Ator @.***> wrote:

@aerorahul https://github.com/aerorahul raises a good question that I've raised myself in the past, which is what the future of BUFR looks like in our systems? He's correct that BUFR isn't going anywhere, and in fact we'll definitely continue receiving BUFR observational data from the outside world for many years/decades into the future. So we're always going to need to have software to be able to read and write BUFR when interacting with the rest of the world, because it is and will remain a WMO standard for a long time, and there's been no discussion whatsoever at the WMO level about changing that.

But as I've mentioned previously, and just because we'll have a continued need to be able to read/decode and write/encode BUFR in our interactions with the rest of the world, that doesn't mean we necessarily need to forever continue to use BUFR as our internal NCEP storage and archive format. In other words, there's no reason many of our internal codes couldn't switch to something else (direct JEDI/IODA array vectors?) if that were deemed preferable, and in that case it would greatly reduce the number of NCEP codes that we have to support with our BUFR software. But of course that's an enterprise-level decision that would have far-reaching implications into a lot of existing software and other internal practices. So, again, just throwing that out there as food-for-thought.

For my part, and to answer @jack-woollen https://github.com/jack-woollen's question, I have no idea how many existing codes or what percentage use the 8 or the d versions. I would imagine it's a fairly small number, because most BUFR values fit just fine into 32 bit fields. That said, there are a handful of newer BUFR descriptors whose range of values don't fit into 32 bits and therefore do require 8 bytes, and which is why I've suggested making that the new default and just tailoring all newly-developed interfaces to that build.

— Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/78#issuecomment-1020394123, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO3XO6PG2T6ZSIUMDSOZLOTUXWI5TANCNFSM4TYNIUTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

jbathegit commented 2 years ago

The bufrlib inputs and outputs are all already 8 bytes. You don't need different libraries for that.

@jack-woollen please clarify what you mean by this. Specifically, when a user application calls, e.g. SUBROUTINE WRITSB(LUNIT), how does the subroutine know whether LUNIT was declared (or compiled) as a 4-byte or 8-byte integer in the calling program? This isn't mandated in the library itself, which just lets it be implicitly declared as an integer. Similarly, when an application code calls FUNCTION IUPBS01(MBAY,S01MNEM), the array MBAY is just implicitly declared internally as DIMENSION MBAY(*), so how does the function know whether each array member is 4 or 8 bytes long in the calling program? Since we never defined these interfaces explicitly in the past, that's part of the reason we now have this ambiguity.

Unless I'm missing something, the only things that are explicitly declared as 8 bytes within the library itself are the REAL values arrays that get passed into and out of subroutines such as UFBINT, UFBREP, etc. Pretty-much every other integer or real call parameter to the library doesn't have an explicit length specified.

jack-woollen commented 2 years ago

@Jeff Ator - NOAA Federal @.***>

Okay, two different things.

1) When you say " there are a handful of newer BUFR descriptors whose range of values don't fit into 32 bits and therefore do require 8 bytes, and which is why I've suggested making that the new default and just tailoring all newly-developed interfaces to that build", I'm just pointing out that the bufr range of values which need 8 bytes to fit in are already taken care of by the 8byte arrays which go into or out of the bufrlib calls in each of the libraries now, unless I'm missing something. And by the way, if you are suggesting only maintaining bufrlib_d or bufrlib_8, it seems to me that makes more trouble than it saves. I think bufrlib_4 is the one to focus on.

2) As for the unit numbers and dimensions, return codes, etc, they can be explicitly defined as 4 bytes in "pure" 8 byte app codes. They will probably never need 8 bytes themselves. The mbay array does have an allocated dimension, which I assume could be used instead of '*'. Bottom line, I think that reduction to just the 4 byte bufrlib would require a smaller amount of changes, in a smaller number of codes, than other alternatives which have been discussed.

On Mon, Jan 24, 2022 at 2:39 PM Jeff Ator @.***> wrote:

The bufrlib inputs and outputs are all already 8 bytes. You don't need different libraries for that.

@jack-woollen https://github.com/jack-woollen please clarify what you mean by this. Specifically, when a user application calls, e.g. SUBROUTINE WRITSB(LUNIT), how does the subroutine know whether LUNIT was declared (or compiled) as a 4-byte or 8-byte integer in the calling program? This isn't mandated in the library itself, which just lets it be implicitly declared as an integer. Similarly, when an application code calls FUNCTION IUPBS01(MBAY,S01MNEM), the array MBAY is just implicitly declared internally as DIMENSION MBAY(*), so how does the function know whether each array member is 4 or 8 bytes long in the calling program? Since we never defined these interfaces explicitly in the past, that's part of the reason we now have this ambiguity.

Unless I'm missing something, the only things that are explicitly declared as 8 bytes within the library itself are the REAL values arrays that get passed into and out of subroutines such as UFBINT, UFBREP, etc. Pretty-much every other integer or real call parameter to the library doesn't have an explicit length specified.

— Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/78#issuecomment-1020472360, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO3XO6KGXBGUWWYQUVC2LW3UXWTFJANCNFSM4TYNIUTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

CoryMartin-NOAA commented 2 years ago

just because we'll have a continued need to be able to read/decode and write/encode BUFR in our interactions with the rest of the world, that doesn't mean we necessarily need to forever continue to use BUFR as our internal NCEP storage and archive format. In other words, there's no reason many of our internal codes couldn't switch to something else (direct JEDI/IODA array vectors?) if that were deemed preferable, and in that case it would greatly reduce the number of NCEP codes that we have to support with our BUFR software. But of course that's an enterprise-level decision that would have far-reaching implications into a lot of existing software and other internal practices.

Even if we still use BUFR files for archival/direct input to JEDI/IODA, that should still greatly simplify the codes that use BUFR libraries, no? We are already in the process of removing preprocessing (prepobs, etc.) code to put everything in JEDI UFO, so I think it's important to look at this with the lens of one unified software system (written in C++) will handle observation ingest once legacy systems are retired. The only other use that I could think of would be verification/MET+. Tagging @ADCollard @emilyhcliu for awareness/input.

CoryMartin-NOAA commented 2 years ago

Also to comment on @edwardhartnett 's quote "OK, so this is our bufr solution for NOAA for the next several decades." I think that is a chilling thought. This current BUFR library has pieces old enough to run for senate. This warrants thought on what the BUFR library at NCEP will look like in 2040.

jbathegit commented 2 years ago

1) When you say " there are a handful of newer BUFR descriptors whose range of values don't fit into 32 bits and therefore do require 8 bytes, and which is why I've suggested making that the new default and just tailoring all newly-developed interfaces to that build", I'm just pointing out that the bufr range of values which need 8 bytes to fit in are already taken care of by the 8byte arrays which go into or out of the bufrlib calls in each of the libraries now, unless I'm missing something. And by the way, if you are suggesting only maintaining bufrlib_d or bufrlib_8, it seems to me that makes more trouble than it saves. I think bufrlib_4 is the one to focus on.

One issue is that, even though the REAL*8 USR arrays are explicitly declared as 8-byte, a lot of the behind-the-scenes stuff is still just implicitly-defined integers. For example, in subroutines RDTREE and WRTREE, any such REAL*8 values are unpacked and packed internally via the implicitly-defined IVAL array, so in 4-byte integer builds we still can't handle any value that needs more than 32 bits, even if the USR arrays themselves are REAL*8. In this case, I could easily change moda_ival.F to explicitly declare IVAL as INTEGER*8 or INTEGER(kind=8), but there are probably numerous other such bugaboos sprinkled throughout the library as well, and which is why I really think bufrlib_8 is the build to focus on, not bufrlib_4.

2) As for the unit numbers and dimensions, return codes, etc, they can be explicitly defined as 4 bytes in "pure" 8 byte app codes. They will probably never need 8 bytes themselves. The mbay array does have an allocated dimension, which I assume could be used instead of '*'. Bottom line, I think that reduction to just the 4 byte bufrlib would require a smaller amount of changes, in a smaller number of codes, than other alternatives which have been discussed.

Maybe I'm not understanding you correctly, but it sounds like you're suggesting the onus in the future should be on all application codes to explicitly declare whether or not all of their integers are 4 or 8 bytes. I think the goal should be to not force every user to have to make wholesale changes to his/her application codes, but rather to handle such variances internally within a single build of the library itself. In my mind that's a smaller overall impact, at least in terms of overall amount of source code that would potentially need to be modified.

jbathegit commented 2 years ago

Also to comment on @edwardhartnett 's quote "OK, so this is our bufr solution for NOAA for the next several decades." I think that is a chilling thought. This current BUFR library has pieces old enough to run for senate. This warrants thought on what the BUFR library at NCEP will look like in 2040.

I hear you @CoryMartin-NOAA, and a completely new library could certainly be developed in a more modern language and with a more modern design, so I'll reply by just making the same point I made several months ago when someone else suggested the same thing. Simply put, and before we collectively go down that road, just please don't underestimate the amount of effort that would be involved in developing a brand new BUFR library from scratch, and which has the same features and capabilities that our users rely on within the current library. BUFR is a very involved format with a lot of nuances, so we all (and NCEP upper management as well! ;-) would need to be on board with an undertaking of that magnitude.

jack-woollen commented 2 years ago

After some dinner and a little more thought I did realize the real numbers in the bufrlib internals should all be 8 bytes. But, I don't think that's equally true for integers, so I'd change my focus to bufrlib_d. In terms of numbers of codes that wouldn't need to change, bufrlib_d would likely work as is with all current bufrlib_4 users as well as all current bufrlib_d users. That has to be the largest number of app codes by far.

My point 2, rephrased for a bufrlib_d focus, is just that any codes that now use bufrlib_8 would only have to specify integer arguments to bufrlib routines as integer(4), which doesn't seem to be too great a burden. As you say there are some integers internal to bufrlib that should be fixed to have 8 bytes. I'm happy to look for all bugaboos involving 4 byte integers which should be specified as 8 byte integers, because I don't think there are many of them, and they should all be identified in any case.

Or, there may even be a way to make the user callable routines insensitive to the integer argument sizes. I can think of some possibilities on that, which if it worked, could mean a modified version of bufrlib_8 could replace all current versions without any app code changes at all. That would be the best.

On Mon, Jan 24, 2022 at 5:56 PM Jeff Ator @.***> wrote:

  1. When you say " there are a handful of newer BUFR descriptors whose range of values don't fit into 32 bits and therefore do require 8 bytes, and which is why I've suggested making that the new default and just tailoring all newly-developed interfaces to that build", I'm just pointing out that the bufr range of values which need 8 bytes to fit in are already taken care of by the 8byte arrays which go into or out of the bufrlib calls in each of the libraries now, unless I'm missing something. And by the way, if you are suggesting only maintaining bufrlib_d or bufrlib_8, it seems to me that makes more trouble than it saves. I think bufrlib_4 is the one to focus on.

One issue is that, even though the REAL8 USR arrays are explicitly declared as 8-byte, a lot of the behind-the-scenes stuff is still just implicitly-defined integers. For example, in subroutines RDTREE and WRTREE, any such REAL8 values are unpacked and packed internally via the implicitly-defined IVAL array, so in 4-byte integer builds we still can't handle any value that needs more than 32 bits, even if the USR arrays themselves are REAL8. In this case, I could easily change moda_ival.F to explicitly declare IVAL as INTEGER8 or INTEGER(kind=8), but there are probably numerous other such bugaboos sprinkled throughout the library as well, and which is why I really think bufrlib_8 is the build to focus on, not bufrlib_4.

  1. As for the unit numbers and dimensions, return codes, etc, they can be explicitly defined as 4 bytes in "pure" 8 byte app codes. They will probably never need 8 bytes themselves. The mbay array does have an allocated dimension, which I assume could be used instead of '*'. Bottom line, I think that reduction to just the 4 byte bufrlib would require a smaller amount of changes, in a smaller number of codes, than other alternatives which have been discussed.

Maybe I'm not understanding you correctly, but it sounds like you're suggesting the onus in the future should be on all application codes to explicitly declare whether or not all of their integers are 4 or 8 bytes. I think the goal should be to not force every user to have to make wholesale changes to his/her application codes, but rather to handle such variances internally within a single build of the library itself. In my mind that's a smaller overall impact, at least in terms of overall amount of source code that would potentially need to be modified.

— Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/78#issuecomment-1020631854, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO3XO6NHEA3KBLMZE2T5H5DUXXKHXANCNFSM4TYNIUTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

edwardhartnett commented 2 years ago

Wow, lots of great discussion here. To address a few strategic points:

  1. Switching applications from BUFR to netCDF would make sense. Switching from BUFR to some Jedi specific proprietary format would be a step in the wrong direction. BUFR, for all it's faults, is a well-understood international standard.
  2. Just because NCEPLIBS-bufr is old does not mean it should be rewritten from scratch, certainly not in C++. Here's a seminal blog post from the world of Agile programming: https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/. Every NOAA programmer should be made to memorize this blog post. Here's a key point:

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:

It’s harder to read code than to write it.

I urge everyone to read the entire blog post and try to understand it. Although this blog post is 20 years old, it is just as valid today.

  1. We will incrementally move NCEPLIBS-bufr towards the library we want it to be. Bad code can be refactored piece by piece. All existing codes that use NCEPLIBS-bufr will be respected by fully backward-compatible releases.

This is the way many libraries are managed, and it will work for us too. The NetCDF library contains a lot of code from the 1980s, but that's OK because it still works, is well-tested and well-documented, and is structured to work well in the future.

That is what we will achieve with NCEPLIBS-bufr.

edwardhartnett commented 2 years ago

As I reread all the discussion here, it's starting to seem like the C solution is the way forward. If this cannot be done easily in native Fortran, it certainly can be done in C with Fortran wrappers...

aerorahul commented 2 years ago

I think that would be a mistake and a huge waste of time and effort. But, what do I know, right.

edwardhartnett commented 2 years ago

@aerorahul can you please describe which idea you think would be a huge waste of time and effort?

CoryMartin-NOAA commented 2 years ago

@edwardhartnett JEDI IODA 'format' is either netCDF4/HDF5 or "ODB" (developed by ECMWF) with a particular group/file structure. It can be read/written by any HDF5/netCDF4 tool as far as I am aware.

We 1000% need BUFR for the foreseeable future. It is the WMO standard for observation storage/transmission.

Regarding 'backwards-compatibility', I ask, who are the users of NCEPLIBS-bufr? Are there others besides GSI and obsproc_prep? Both of these things are being completely replaced/overhauled in the next couple of years. Backwards compatibility is a good (great?) idea overall, but it introduces limitations and is not always practical. For GFSv16, @edwardhartnett as you may recall, we switched to netCDF I/O for the model and data assimilation and post processing. That required more changes than a new BUFR library would in GFS, so this is not without precedent. @rmclaren has been trying to generalize BUFR read in JEDI and has encountered many limitations in the current library. I learned on Fortran, and only know a little C++, but I see where both have their strengths and disadvantages. JEDI uses both C++ and Fortran 2003, which allows low level routines to be forklifted and unmodified while higher level things can be more modular. Perhaps that is an approach to consider.

edwardhartnett commented 2 years ago

OK, we all agree that BUFR format is here to stay.

The NCEPLIBS group has no mandate or interest in trying to rewrite NCEPLIBS-bufr from scratch, so if anyone wants to attempt that, we wish them luck but cannot offer much programming help. And any such replacement library would have to reach a high state of readiness before we could switch to it, which is not likely to happen soon, or at all. So NCEPLIBS-bufr is also here to stay.

@CoryMartin-NOAA thanks for the clarification of the JEDI IODA format. I would be interested in seeing what exactly it means, since it is using netCDF/HDF5. Is there some documentation of IODA on-line?

I would agree that any application that is using BUFR for an internal format would do well to switch to netCDF/HDF5. It's much easier for everyone else to deal with NetCDF files! But not all uses of NCEPLIBS-bufr are for internal data systems. We must read and write BUFR in at least some places, and that requirement seems firm.

Since NCEPLIBS-bufr is therefore going to be our BUFR solution for the coming decades, then it is worth a great deal of time and attention, and we will incrementally make it everything we need to be, in order to have the world-class software we desire. @aerorahul I certainly don't consider this a waste of time and effort - I consider it our job. ;-) And improving a necessary operational software library for the UFS and other important applications is certainly worthy of our efforts.

Which means we must, to return to the issue, fix the 4/8/d problem. Even if that takes a lot of work. I am open to a fully-Fortran solution if it can be found.

WRT current users and backward compatibility - from my past experience with netCDF and other libraries, I am confident that we can manage our incremental improvements to preserve full backward compatibility. If that has to be violated on some occasions, we can discuss on a case-by-case basis. But generally, I have found that new ideas and code can be accommodated, while still allowing existing users to seamlessly upgrade to new releases.

@CoryMartin-NOAA and @rmclaren we must cope with the limitations you encounter one-by-one. Whatever can be achieved by a complete re-write can also be achieved by incremental improvements.

CoryMartin-NOAA commented 2 years ago

@edwardhartnett see here: https://jointcenterforsatellitedataassimilation-jedi-docs.readthedocs-hosted.com/en/latest/inside/jedi-components/ioda/index.html and https://jointcenterforsatellitedataassimilation-jedi-docs.readthedocs-hosted.com/en/latest/inside/conventions/files_and_components.html

My $0.02 is that we need a library that can be flexible enough to read any BUFR file with minimal duplication of code for different file types and the user need only provide mnemonics and it returns arrays. I don't really care to dictate the solution, but I think we should probably think about the requirements further before any reengineering happens. The biggest advantage to JEDI over GSI (in my opinion) is removing all the redundant, sensor-specific code. There should only be one read routine, not N routines where N=the number of observation types. However we can get to that is the right path forward.

jamiebresch commented 2 years ago

In my opinion, the biggest disadvantage to JEDI over GSI is that users have to carry a huge load of hard-to-manage yaml settings for every single run.

CoryMartin-NOAA commented 2 years ago

@jamiebresch I don't disagree. Individually, YAML files are great. But the 50k line concatenated monster that goes into the executables is less than ideal. For @rmclaren 's converter, the YAML is straightforward, but the UFO QC YAML is often unwieldy. But that's a discussion for a different venue.

aerorahul commented 2 years ago

@jbathegit @jack-woollen What are the currently used applications (that you know of) that leverage the NCEPLibs-bufr library. I am aware of the following:

As @CoryMartin-NOAA says, JEDI is replacing GSI and a significant (if not all, but surely a lot) of Observation Processing. So who is this refactored library going to serve and in what capacity?

Regardless of choice of language, how does the future library for encoding and decoding of BUFR messages look like in this universe? Who are your customers and what are their needs?

jack-woollen commented 2 years ago

@aerorahul I'm not sure what refactored library you're referring to. There's been so many of them being imagined. The current bufrlib seems mostly sufficient to handle foreseeable future needs for BUFR processing within NCEP, especially if the future looks anything like the picture being painted in this ongoing conversation. Incremental improvements to any future version of bufrlib will be necessary even so, to deal with WMO BUFR evolution and associated new BUFR dataset structures, at least. I believe in looking ahead, and stich-ing in time, but I don't believe in fixing things that aren't broken, at least not without a good reason. Optimizing operation and economizing maintenance both count as good reasons. Case in point, consolidating 3 versions into 1 is an example of something worthwhile.

aerorahul commented 2 years ago

@jack-woollen I agree this conversation has derailed from consolidating 3 versions 8|4|d to a restructuring/refactoring of the library and its use in the post GSI/obsproc and into the JEDI world.

ADCollard commented 2 years ago

Hi All,

Cory added me to this thread earlier today.

It seems to me there are a number of answered questions about how we should proceed with NCEPBUFR support and, speaking as an end-user working on the JEDI/IODA conversion, I think it would be helpful to me to understand the big picture better.

Would people be amenable to a meeting sometime in the next couple of weeks to discuss this?

Andrew.

On Tue, Jan 25, 2022 at 2:30 PM Rahul Mahajan @.***> wrote:

@jack-woollen https://github.com/jack-woollen I agree this conversation has derailed from consolidating 3 versions 8|4|d to a restructuring/refactoring of the library and its use in the JEDI world.

— Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/78#issuecomment-1021534716, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJTUMJF2JCCFSM5O5PCTHNDUX325LANCNFSM4TYNIUTQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

arunchawla-NOAA commented 2 years ago

I agree that a lot of fascinating discussions have been happening in this thread and it will be useful to have a meeting to discuss these. However, to precede that it would be useful to have a few requirements laid out. One requirement is obvious (at least to me) and that is to handle the 4/8/d builds with one source. However, there are probably a lot of requirements from the JEDI side that would be useful to document and then we can develop a roadmap

jbathegit commented 2 years ago

On the main topic of consolidating the 4/8/d builds into one, I've put some more effort into this, and I've figured out a Fortran solution that's based on the example shown above, but which doesn't require pre-processing and instead just keeps the main body encapsulated in the module as a separate, private subroutine. See https://github.com/NOAA-EMC/NCEPLIBS-bufr/blob/onebuild/src/writsa.f90 for more details of what this looks like, and note that this does still require every application program to include a new use <module> statement as noted earlier, since the subroutine name (in this case, writsa) can now only be referenced via a Fortran module. But it does work and seems like a fairly clean solution. And note that if we end up using _8 as the eventual single build of the library, then lines 154-158 and 162-163 can be removed as well.

For C interoperability, my thinking was to expand on the work that @rmclaren did previously with bufr_interface.f90, meaning that for all of the routines that need to be callable from C, we'd define the C<->Fortran interface within this file, then add a use <module> statement as noted above to expose all of the Fortran subprogram names that can now only be referenced via a module. This would have the added benefit of also addressing the long-standing UNDERSCORE issue noted in #20 and #79. I tried creating a prototype by coding up a ufbint.f90 routine similar to the above writsa.f90 model; however, despite numerous attempts I couldn't get the syntax quite right so that bufr_interface.f90 could now call the ufbint reference in this new module. I think I'm close, but I may need some help from @rmclaren or others to resolve this.

In summary, I think this approach is doable, though it will take quite a bit of time to extend to all of the different subprograms in the library, and there's also the question of Python interoperability, if all of the Fortran subprogram names can now only be referenced via a Fortran module going forward. I'll likely need some help with that as well.

jbathegit commented 2 years ago

Some more details on the aforementioned bufr_interface.f90 issue. Here's how it tries to call ufbint via a wrapper:

!> @author Ronald McLaren
!> @date 2020-07-29
!>
!> @brief Wraps BUFRLIB "ufbint" function.
!>
!> @param[in] bufr_unit - c_int: the fortran file unit number to read from
!> @param[inout] c_data - c_ptr: c style pointer to a pre-allocated buffer
!> @param[in] dim_1, dim_2 - c_int: dimensionality of data to read or write
!> @param[out] iret - c_int: return value, length of data read
!> @param[in] table_b_mnemonic - c_char: string of mnemonics
!>
subroutine ufbint_c(bufr_unit, c_data, dim_1, dim_2, iret, table_b_mnemonic) bind(C, name='ufbint_f')
  integer(c_int), value, intent(in) :: bufr_unit
  type(c_ptr), intent(inout) ::  c_data
  integer(c_int), value, intent(in) :: dim_1, dim_2
  integer(c_int), intent(out) :: iret
  character(kind=c_char, len=1), intent(in) :: table_b_mnemonic
  real, pointer :: f_data

  call c_f_pointer(c_data, f_data)
  call ufbint(bufr_unit, f_data, dim_1, dim_2, iret, c_f_string(table_b_mnemonic))
end subroutine ufbint_c

so I initially set up the new ufbint.f90 with an internal ufbint_4_d interface similar to the above writsa.f90 (since c_int is equivalent to a 4-byte integer) and added the corresponding use <module> statement, but it still couldn't access ufbint that way. I then tried adding a new ISO C interface to ufbint.f90 to more directly match up the types for this particular call:

    subroutine ufbint_isoc( lunin, usr, i1, i2, iret, str )
!>      used when call arguments to ufbint are ISO C types

        use iso_c_binding

        implicit none

        integer(c_int), value, intent(in) :: lunin
        integer(c_int), value, intent(in) :: i1, i2
        integer(c_int), intent(out) :: iret

        real, pointer :: usr
        character*(*) str

        integer :: my_lunin, my_i1, my_i2, my_iret

        my_lunin = lunin
        my_i1 = i1
        my_i2 = i2

        call ufbint_body( my_lunin, usr, my_i1, my_i2, my_iret, str )

        iret = my_iret

    end subroutine ufbint_isoc

but that didn't work either and complained of a type mismatch in the dummy arguments. So that's where I'm stuck at the moment, in case anyone sees anything obvious that I'm missing here? ;-)

kgerheiser commented 2 years ago

Is it an issue with calling the generic interface (ufbint)? I have found that calling the specific function manually will give better hint at what the actual mismatch is because the compiler knows what the arguments are supposed to be, other than "it doesn't match one of these preset lists exactly".

rmclaren commented 2 years ago

In your second example, why is str not character(kind=c_char, len=1), intent(in) :: str? You need to be careful because C strings and Fortran strings aren't represented the same way in memory (they aren't compatible). This is why I made the c_f_string function (to convert between them). Also, do you need to do something like use subroutine_ufbint since the interface is defined within a module (both examples)?

jbathegit commented 2 years ago

Thanks for the replies @kgerheiser and @rmclaren. Yes, I had been including use subroutine_ufbint inside of bufr_interface.f90, but now I'm wondering if that may not have really been needed. Either way, I started trying out some of the suggestions, but WCOSS2 access has been down for the past couple of hours so progress is halted for now. Once it's back up I'll run some more tests, including calls to the specific function rather than the interface, and share the results.

jbathegit commented 2 years ago

So here's where I'm at right now...

in bufr_interface.f90:


  use iso_c_binding
  use bufr_procedures

!> @author Ronald McLaren
!> @date 2020-07-29
!>
!> @brief This function turns a c string into a fortran string.
!>
!> @param[in] c_str - c_char: pointer to a \0 (null) terminated c string
!> @param[out] f_str - character(:): fortran string
!>
function c_f_string(c_str) result(f_str)
  character(kind=c_char,len=1), intent(in) :: c_str(*)
  character(len=:), allocatable :: f_str
  integer :: nchars

  nchars = 1
  do while (c_str(nchars) /= c_null_char)
      nchars = nchars + 1
  end do
  nchars = nchars - 1

  allocate(character(len=nchars) :: f_str)
  f_str = transfer(c_str(1:nchars), f_str)
end function c_f_string

!> @author Ronald McLaren
!> @date 2020-07-29
!>
!> @brief Wraps BUFRLIB "ufbint" function.
!>
!> @param[in] bufr_unit - c_int: the fortran file unit number to read from
!> @param[inout] c_data - c_ptr: c style pointer to a pre-allocated buffer
!> @param[in] dim_1, dim_2 - c_int: dimensionality of data to read or write
!> @param[out] iret - c_int: return value, length of data read
!> @param[in] table_b_mnemonic - c_char: string of mnemonics
!>
subroutine ufbint_c(bufr_unit, c_data, dim_1, dim_2, iret, table_b_mnemonic) bind(C, name='ufbint_f')
  integer(c_int), value, intent(in) :: bufr_unit
  type(c_ptr), intent(inout) ::  c_data
  integer(c_int), value, intent(in) :: dim_1, dim_2
  integer(c_int), intent(out) :: iret
  character(kind=c_char, len=1), intent(in) :: table_b_mnemonic
  real(c_double), pointer :: f_data

  call c_f_pointer(c_data, f_data)
  call ufbint_isoc(bufr_unit, f_data, dim_1, dim_2, iret, c_f_string(table_b_mnemonic))
end subroutine ufbint_c

where right now I'm just trying to get it to connect directly with the ufbint_isoc subroutine, never mind trying to go through the generic ufbint interface reference.

in ufbint.f90:

    subroutine ufbint_isoc( lunin, usr, i1, i2, iret, str )
!       used when call arguments to ufbint are ISO C types

        use iso_c_binding

        implicit none

        integer(c_int), value, intent(in) :: lunin
        integer(c_int), value, intent(in) :: i1, i2
        integer(c_int), intent(out) :: iret
        real(c_double), pointer, intent(inout) :: usr
!        real(c_double), intent(inout) :: usr(i1,i2)
!        character(c_char, len=1), intent(in) :: str
        character(len=*), intent(in) :: str

        integer :: my_lunin, my_i1, my_i2, my_iret

        my_lunin = lunin
        my_i1 = i1
        my_i2 = i2

        call ufbint_body( my_lunin, usr, my_i1, my_i2, my_iret, str )

        iret = my_iret

    end subroutine ufbint_isoc

where ufbint_body is the same code as the existing ufbint.f.

But here's the compiler error I'm seeing:

/lfs/h2/emc/obsproc/noscrub/Jeff.Ator/NCEPLIBS-bufr-GitHub/nceplibs-bufr/src/ufbint.f90(242): error #8284: If the actual argument is scalar, the dummy argument shall be scalar unless the actual argument is of type character or is an element of an array that is not assumed shape, pointer, or polymorphic.   [USR]
        call ufbint_body( my_lunin, usr, my_i1, my_i2, my_iret, str )
-------------^
compilation aborted for /lfs/h2/emc/obsproc/noscrub/Jeff.Ator/NCEPLIBS-bufr-GitHub/nceplibs-bufr/src/ufbint.f90 (code 1)
make[2]: *** [src/CMakeFiles/bufr_8_f.dir/build.make:933: src/CMakeFiles/bufr_8_f.dir/ufbint.f90.o] Error 1

and I see this regardless of whether I use the declaration

real(c_double), pointer, intent(inout) :: usr or real(c_double), intent(inout) :: usr(i1,i2)

in ufbint.f90, and regardless of whether I use

character(kind=c_char, len=1), intent(in) :: str or character(len=*), intent(in) :: str

Can anyone spot anything obvious that I'm doing wrong here? I know we'd all like to find a way to get this to work so we can just have one build rather than three separate 4/8/d builds. I feel like I'm really close, but I've tried all kinds of different options and I just can't quite get this to work. Note from above that I was able to get this same sort of thing to work for writsa.f90, so the concept does work, but there's something in this call to ufbint from the bufr_interface.f90 module that's not quite right.

(Thanks! ;-)