schwehr commented 6 years ago

It's pretty easy to overflow the signed integer math in ut_encode_clock(). One way to fix this would be to use double constants as the multiplier. 60 -> 60.0. But it might be better to range check the hours, minutes, and seconds args. The code might issue a warning if things are out of a reasonable time range and either clamp the result or return 0.0.

double
ut_encode_clock(
    int     hours,
    int     minutes,
    double  seconds)
{
    return (hours*60.0 + minutes)*60.0 + seconds;
}

Also, that is the expected range of hours, minutes and seconds? Should it be okay with a leap second? Is this return seconds since midnight and on what day is this relative to? Can hours be greater than 25? Can seconds be 60?

semmerson commented 6 years ago

I can't think of a good solution to this because the function is publicly visible and doesn't return a status value. Look like I painted myself into a corner.

The documentation for this function will state that the absolute values of the arguments must lie within certain ranges.

schwehr commented 6 years ago

I don't think this is resolved. The documentation changes are definitely good, but you still have a case where user input can trigger undefined behavior. A couple solutions:

Use double literals (as I suggested above) to trigger double math
Range check the hours, minutes and seconds. You could emit a warning to stderr, but I'm not sure that is worth it. If any of them is out of range you have a couple of possible options: a. Clamp the values to the side they they are. e.g. negative values clamp to zero; others clamp to 24, 60, or 61 (leap sec) b. Return a negative double to indicate an error.

All of the above are ABI compatible, but do cause behavior changes. However, the current implementation is not stable. The compiler can do what it pleases with values that overflow a signed int. Compilers are sometimes triggering SIGILL these days which immediately brings the house down in the case of an overflow. That avoids (unlikely) chance of security exploits, but is not a good user experience.

semmerson commented 6 years ago

1 isn't backward compatible. The function's been published and we must assume it's being used.

2.a Is a possibility. though I'm not sure that it's better than the documentation changes. "Must" means must -- either way you get something back that's meaningless.

2.b Negative return values from ut_encode_clock() are valid and the clock encoding isn't specified, so choosing a value is problematic.

BTW, the behavior might be unspecified, but it's not undefined (i.e., it won't delete all you files and crash).

schwehr commented 6 years ago

This function most definitely has undefined behahavior and some compilers will raise sigill... Which is allowed by the C spec. They also can do anything else they want :(

Please explain your reasoning for 1 with double literals. What exactly isn't backward compat?

semmerson commented 6 years ago

Changing a published API, in general, isn't backward compatible -- although in this case going from integer to double really isn't much of a change. I'm not sure what it would get you, however, as you could still call the function with values that might cause a SIGFPE.

schwehr commented 6 years ago

I am confused... The return type stays double. There may be some slight varations with the resulting double, but there are no unit tests that break, yes? If there are cases that might break if the results of this function suffer from slight floating point precision issues, there really needs to be at least one test case

schwehr commented 6 years ago

Ah... when you commit changes that relate to an issue, can you please reference the issue in the commit message? Then it will automatically appear in the issue's timeline. e.g. https://github.com/Unidata/UDUNITS-2/commit/42ae9462a6503d23beccf2f966dc1a9eb98e8ada relates to this issue

semmerson commented 6 years ago

@schwehr Even if the hour argument were a double, one could still shoot oneself in the foot by passing-in a value that was greater than MAXFLOAT/60.

As far as test-cases go, in an ideal world, with unlimited time and budget, I'd agree. What we have here, however, is an oversight on my part that's unlikely to be a real problem but does get dinged by an automated test system.

I can live with that.

schwehr commented 6 years ago

Ok. I can't live with it. I will have to put a custom hack into my system or the fuzzer will fall all over this. Users can and do provide all sorts of crazy input for a variety of reasons. It gets super hard to track down when it happens in large scale production systems. udunits is getting used in satellite processing code and I can't predict how things will go sideways.

The seconds being a double does leave room for a floating point overflow and I hadn't noticed that. For completeness, it's when seconds approaches DBL_MAX that would be an issue if it used double literals. https://en.cppreference.com/w/c/types/limits

Fuzzers are great at providing test input that cover cases like this. Once I get udunits setup, I will get bug reports with "proof of concept" (poc) files that contain the input that triggered undefined behavior.

semmerson commented 6 years ago

I use the POSIX standard myself.

One possibility would be for ut_encode_clock() to call ut_set_status(UT_BAD_ARG) and return, say, zero if the arguments were outside their allowed range. This would require the caller to call ut_get_status() after calling ut_encode_clock(), which is similar to the use of strtol().

I'll put that in.

Unidata / UDUNITS-2

ut_encode_clock overflow when fuzzing #70

1 isn't backward compatible. The function's been published and we must assume it's being used.

2.a Is a possibility. though I'm not sure that it's better than the documentation changes. "Must" means must -- either way you get something back that's meaningless.

2.b Negative return values from ut_encode_clock() are valid and the clock encoding isn't specified, so choosing a value is problematic.