Unidata / UDUNITS-2

API and utility for arithmetic manipulation of units of physical quantities
http://www.unidata.ucar.edu/software/udunits

Issues with lexical elements embedded in symbols/names #107

Status: open. Enchufa2 opened this issue 2 years ago.

Enchufa2 commented 2 years ago

Context: the units package adds support for measurement units to R and uses udunits2 as its backend. One of our users was trying to define g%_hemoglobin as 1 g_hemoglobin/dL (g_hemoglobin having been installed first; see r-quantities/units#289). We found that such a unit definition can be installed successfully, but ut_parse() then fails on the new symbol, which effectively makes the unit unusable. Here's an example in C:

```c
#include <stdio.h>
#include <udunits2/udunits2.h>

int main(void) {
    ut_set_error_message_handler((ut_error_message_handler) ut_ignore);
    ut_system *sys = ut_read_xml(NULL);
    if (sys == NULL) {
        fprintf(stderr, "Couldn't read the default unit database\n");
        return 1;
    }
    ut_encoding enc = UT_UTF8;
    ut_set_error_message_handler((ut_error_message_handler) vprintf);

    // install g_hemoglobin
    ut_unit *g_hemoglobin = ut_new_base_unit(sys);
    ut_map_symbol_to_unit("g_hemoglobin", enc, g_hemoglobin);
    ut_map_unit_to_symbol(g_hemoglobin, "g_hemoglobin", enc);

    // install g%_hemoglobin
    ut_unit *gpc_hemoglobin = ut_parse(sys, "1 g_hemoglobin/dL", enc);
    ut_map_symbol_to_unit("g%_hemoglobin", enc, gpc_hemoglobin);
    ut_map_unit_to_symbol(gpc_hemoglobin, "g%_hemoglobin", enc);

    // conversion works
    const char *from = "g_hemoglobin/dL";
    char to[128];
    ut_unit *from_u = ut_parse(sys, from, enc);
    cv_converter *cv = ut_get_converter(from_u, gpc_hemoglobin);
    ut_format(gpc_hemoglobin, to, sizeof(to), enc);
    double x = 10;
    printf("From: %f %s\n", x, from);
    printf("To  : %f %s\n", cv_convert_double(cv, x), to);

    // parsing doesn't work: the formatted symbol can't be read back
    ut_unit *u = ut_parse(sys, to, enc);
    if (!u) printf("NULL unit!\n");
    else ut_free(u);

    cv_free(cv);
    ut_free(from_u);
    ut_free(g_hemoglobin);
    ut_free(gpc_hemoglobin);
    ut_free_system(sys);
    return 0;
}
```

Save this as test.c, then compile and run it:

```console
$ gcc test.c -l udunits2 && ./a.out
From: 10.000000 g_hemoglobin/dL
To  : 10.000000 g%_hemoglobin
NULL unit!
```
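Going by the issue title, a plausible reading of the NULL is that `%` is a lexical element of its own in the unit-string grammar (the default database maps `%` to percent), so the parser never sees `g%_hemoglobin` as a single name. The sketch below illustrates that idea with a hypothetical, simplified lexer model; it is not udunits2's actual scanner. Its rules are assumptions made for illustration: identifiers are runs of letters and `_`, numbers are runs of digits, spaces separate tokens, and `%`, `*`, `/` and `^` each form a token on their own. The names `split_tokens`, `LEXICAL_CHARS` and `MAX_TOKEN_LEN` are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 32

/* Characters assumed (for this sketch only) to be tokens of their own. */
static const char LEXICAL_CHARS[] = "%*/^";

/* Hypothetical, simplified lexer model, NOT udunits2's scanner:
 * identifiers are runs of letters and '_', numbers are runs of digits,
 * spaces separate tokens, and each char in LEXICAL_CHARS stands alone.
 * Writes up to `max` tokens into `out` and returns how many it wrote,
 * or (size_t)-1 on an unknown character, overflow, or too-long token. */
size_t split_tokens(const char *s, char out[][MAX_TOKEN_LEN], size_t max)
{
    assert(s != NULL && out != NULL);
    size_t n = 0;
    while (*s != '\0') {
        size_t len = 1;
        if (*s == ' ') {          /* whitespace only separates tokens */
            s++;
            continue;
        }
        if (strchr(LEXICAL_CHARS, *s) != NULL) {
            /* single-character lexical element: len stays 1 */
        } else if (isalpha((unsigned char)*s) || *s == '_') {
            while (isalpha((unsigned char)s[len]) || s[len] == '_')
                len++;            /* identifier */
        } else if (isdigit((unsigned char)*s)) {
            while (isdigit((unsigned char)s[len]))
                len++;            /* number */
        } else {
            return (size_t)-1;    /* character outside this model */
        }
        if (n == max || len >= MAX_TOKEN_LEN)
            return (size_t)-1;
        memcpy(out[n], s, len);
        out[n][len] = '\0';
        n++;
        s += len;
    }
    return n;
}
```

Under this model, `g_hemoglobin` stays one identifier, while `g%_hemoglobin` splits into `g`, `%` and `_hemoglobin`; a parser built on such a lexer would presumably then have to resolve `_hemoglobin` as a unit of its own, which does not exist.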

It could be argued that % shouldn't be allowed in a unit symbol or name, but in that case installation (ut_map_symbol_to_unit()) should fail. If it is allowed, then ut_parse() should accept the symbol.

More info:

```console
$ cat /etc/redhat-release
Fedora release 34 (Thirty Four)
$ rpm -q udunits2
udunits2-2.2.28-3.fc34.x86_64
$ gcc --version
gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)
```