david-a-wheeler / flawfinder

a static analysis tool for finding vulnerabilities in C/C++ source code
GNU General Public License v2.0
478 stars 82 forks source link

flawfinder mischaracterizes printf -> vprintf style #28

Open zyga opened 3 years ago

zyga commented 3 years ago

I've tried flawfinder on my zt library:

zt.c:47:  [4] (format) vfprintf:
  If format strings can be influenced by an attacker, they can be exploited
  (CWE-134). Use a constant for the format specification.

This refers to https://github.com/zyga/libzt/blob/main/zt.c#L47 - reproduced below for simplicity:

static void zt_logv(FILE* stream, zt_location loc, const char* fmt, va_list ap)
{
    if (stream != NULL) {
        if (loc.fname != NULL && loc.lineno != 0) {
            fprintf(stream, "%s:%d: ", loc.fname, loc.lineno);
        }
        vfprintf(stream, fmt, ap); /* This is line 47 */
        fprintf(stream, "\n");
    }
}

This function is used inside zt_logf reproduced below:

static void zt_logf(FILE* stream, zt_location loc, const char* fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    zt_logv(stream, loc, fmt, ap);
    va_end(ap);
}

This is a common pattern in C, where a printf like function calls into vprintf like function.

It is technically true that vfprintf(stream, fmt, ap) requires agreement between fmt and ap but this is unavoidable in this specific case.

david-a-wheeler commented 3 years ago

Flawfinder is a lexing-only tool, so I don't see how we can handle this case.

To do more requires actually reading the code into some sort of data structure, which is obviously possible but not how flawfinder works.

zyga commented 3 years ago

Perhaps such suggestion should not be emitted? If the diagnostic message is associated with vfprintf will it ever be correct to pass ap but take literal string as fmt? I think the suggestion is over-applied from printf or fprintf but no longer applies in vfprintf.

david-a-wheeler commented 3 years ago

Oh, it definitely applies. If the fmt is from an attacker, the attacker could use %n to write to arbitrary memory, or reveal data that's not supposed to be revealed. If that's less likely we could reduce its risk level to say 3 instead of 4. Unfortunately, a lexically-based tool like flawfinder has no way to determine if the fmt is controlled by an attacker or not; that would require interprocedural flow control, and even then it's often impossible to tell (you also need to know which inputs are trusted, which is typically not information you can derive just from the source code).