NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilites for working with the WMO BUFR format.
Other
44 stars 19 forks source link

Feature/expose readlc c #519

Closed rmclaren closed 1 year ago

rmclaren commented 1 year ago

Exposes the readlc method in the C interface. The function is used to retrieve long strings from the BUFR file (strings longer than an octet).

Will need new tag after the merge in order to support ioda-converters unfortunately.

rmclaren commented 1 year ago

fixed

jbathegit commented 1 year ago

Thanks for that fix, but now another error has popped up in the develop CI test. In the prototype, output_str should be declared as char (*) [120] (i.e. a pointer to a character string of length 120) rather than just as char *.

/home/runner/work/NCEPLIBS-bufr/NCEPLIBS-bufr/bufr/test/test_c_interface.c:138:48: error: passing argument 3 of ‘readlc_f’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  138 |             readlc_f(BUFR_FILE_UNIT, mnemonic, &long_str, &long_str_len_ptr);
      |                                                ^~~~~~~~~
      |                                                |
      |                                                char (*)[120]
In file included from /home/runner/work/NCEPLIBS-bufr/NCEPLIBS-bufr/bufr/test/test_c_interface.c:12:
/home/runner/work/NCEPLIBS-bufr/NCEPLIBS-bufr/bufr/src/bufr_interface.h:387:54: note: expected ‘char *’ but argument is of type ‘char (*)[120]’
  387 |   void readlc_f(int lunit, const char* str_id, char* output_str, int* output_str_len);
      |                                                ~~~~~~^~~~~~~~~~

Another possibility (I say possibility b/c I'm not sure if it will work?) may be to change the declaration of character(kind=c_char), intent(out) :: output_str(*) in readlc_c to be character(kind=c_char, len=1), intent(out) :: output_str(*) and see if that changes anything(?) Either way, I'm not sure hard-coding a limit of 120 for output_str_f inside of readlc_c is a good idea, because readlc itself could conceivably be used to read strings longer than that from a BUFR message.

But regardless of whatever you set that limit as (whether 120 or whatever), a much bigger concern is that it doesn't look like you have any sort of check to confirm that the argument string output_str is at least that long. So the way I'm reading it, if a user happens to pass in an allocated output_str that's shorter than whatever length you declare for output_str_f, then you could end up segfaulting with an overflow inside of the subsequent call to copy_f_c_str. Or am I missing somethere here?

FYI, and FWIW, the developer CI test is always going to catch things that the other CI tests don't, b/c it has a number of extra sanity checks and other warning flags enabled. So it's not unusual for that one test to fail even if the others pass, and whenever that happens it's important to comb through the logs to see what's really going on.

rmclaren commented 1 year ago

Fixing the test file resolved the issue.

jbathegit commented 1 year ago

Good catch on the faulty pointer notation in test_c_interface.c! But I think my other concern still needs to be addressed...

Specifically, if some other C application code tries to call your new readlc_f and passes in an output_str that's only sized to hold, say, 40 characters, then your internal call to copy_f_c_str is always going to try to write 120 characters into that string and will immediately segfault, correct?

I think to be safe, you should probably pass in output_str_len as an input argument rather than define it as an output argument, and then you could change the 3rd argument to copy_f_c_str to be min( len(output_str_f), output_str_len), similar to the existing logic in nemdefs_f, nemtbb_f, etc. In fact, I'm not sure why output_str_len would ever need to be an output argument in the first place, since copy_f_c_str already ensures that the string is null-terminated, and thus you could always just do a strlen() check in your C application code to check or verify the length.

rmclaren commented 1 year ago

@jbathegit Ok, the long str length parameter is now passed as an input, and the output string length is the minimum of the trimmed string returned from readlc (plus null terminator) or the length of the c buffer that the function was called with.