github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.73k stars 1.55k forks source link

Explanation of ”Comparison result is always the same“ in PR is technically correct, but unclear #11744

Open ryao opened 1 year ago

ryao commented 1 year ago

https://github.com/libarchive/libarchive/pull/1818

Here is a succinct version of the code involved:

    size_t       min_frame_size;
    size_t       max_frame_size;
...
        if (min_frame_size < 0 ||
            min_frame_size >= (intmax_t)SIZE_MAX) {
...
}
...
        if (max_frame_size < 1024 ||
            max_frame_size >= (intmax_t)SIZE_MAX) {
...

In a PR that a colleague had opened, CodeQL correctly flagged both min_frame_size >= (intmax_t)SIZE_MAX and max_frame_size >= (intmax_t)SIZE_MAX and gave technically correct descriptions of the problem.

However, the colleague who saw this thought it had flagged a bug, but was wrong in explaining why. More specifically, when he told a number of us about this, we initially thought that CodeQL just got lucky by having a false positive on something that needed correction, but not for the reason CodeQL originally found it.

The explanations offered in the PR were:

I went back and forth from initially believing him, to thinking that CodeQL was right, to thinking I was wrong to say that CodeQL was right when my initial explanation that CodeQL was right was found to be wrong to confirming that CodeQL was right. Honestly, I felt that I had made a spectacle of myself midway through this process when my initial explanation that CodeQL was right turned out to be incorrect.

The reason why CodeQL is right is that on a 32-bit system, SIZE_MAX is 4294967295U. Cast this to a signed integer, and you are comparing against -1.

On a 32-bit system, his code was doing min_frame_size >= -1 and max_frame_size >= -1, which are always true. The same is true for 64-bit systems, although with the higher magnitude value 2^64 - 1 becoming -1.

When the first condition is false, then we know that min_frame_size >= 0 in the first one or max_frame_size >= 1024 in the second, so CodeQL was correct to point this out. However, without a more detailed explanation, it is easy for programmers to misunderstand the report.

I had already been aware of this since I read about this exact bug from something published by pvs-studio.com a month ago, but I could not remember it at first and after remembering that I had previously seen something on that site that resembled this, I could not find it when I looked. Had I actually found that explanation, it would have saved me some minor grief from thinking that I had made a spectacle of myself by digging deeper into it.

I suspect that there is a better explanation provided in the code scanning reports that only those with write access to the repository can see, but even if that is a better explanation, it is not helpful here because the contributor who opened the PR that CodeQL flagged cannot see that. This might be less of a problem if #11021 were fixed.

ryao commented 1 year ago

It would have been better if the explanations were:

jketema commented 1 year ago

Hi @ryao. Thanks for your observations. I agree with you that the alert message can be much improved.

I suspect that there is a better explanation provided in the code scanning reports that only those with write access to the repository can see

This is not the case. For every query we just return a location and a message (and you already found those) and for some queries this is enriched with one or more paths that tell you were data is coming from. The latter does not apply to the query at hand and, in fact, the explanations you mention are all there is.

I filed an internal ticket for improving the query explanation.

ryao commented 1 year ago

@jketema Thanks for the prompt reply and for letting me know about the code scanning report saying nothing else. That saved me from forking to check out of curiosity.

That said, should this be marked with the acknowledged and C++ labels?