Sound static analysis needed to complement CodeQL

ryao commented 1 year ago

CodeQL's ability to do numerous checks and allow people to develop their own checks makes it a wonderful tool. However, I feel that a freely available sound static analysis tool that is comparable to the commercial tools such as Astree and Polyspace Code Prover is needed to complement CodeQL. The current commercial options are:

These tools are based on the theory of abstract interpretation such that a failure of their checks to find issues in software proves the absence of those issues in the software.

As far as I know, the open source options in this area are currently:

https://github.com/NASA-SW-VnV/ikos https://github.com/static-analysis-engineering/CodeHawk-C https://frama-c.com/fc-plugins/eva.html https://github.com/seahorn/crab

As far as I can tell:

IKOS does not support concurrency and has some trouble with small integer widths when used as loop counters. It is described by a PDF presentation. Given that NASA uses it to develop aviation software, I consider it to be the most promising of all of the free sound static analysis options, but the lack of concurrency support limits its usefulness.
CodeHawk has zero documentation describing what it can and cannot do.
Frama-C EVA did very well in a NIST study, but it cannot handle recursion or branching as far as its documentation suggests, which makes it useless for most code.
Crab is just a library that is made under the Theory of Abstract Interpretation and could be the starting point for an analyzer like IKOS. It is used by the Clam analyzer, but that does not claim soundness and I only listed it because it is based on Abstract Interpretation.

Microsoft and Google have concluded that 70% of security issues in C/C++ code are from memory safety issues. Sound static analyzers can give developers a way to eliminate all memory safety issues from existing C/C++ code, but they are prohibitively expensive, such that they are only used in very deep pocketed industries where software failures can be fatal. Such tools can be developed for other languages too, but at present as far as I know, production quality sound static analysis tools only exist for C/C++, which is where the greatest need is.

I know that this is asking for very much, but would GitHub either develop a sound static analysis tool comparable to Astree or acquire one, to make available to the broader community alongside CodeQL? A sound static analysis tool will always be limited in what it can check, while unsound static analysis tools such as CodeQL can check far more things, so the two would complement one another.

This is not the best forum for an issue is requesting a sister tool to CodeQL, but I do not know of a better forum for it and I am certain that posting it here will result in it getting the attention of the right people at github. I am also certain that the CodeQL developers will have their own opinions on this idea, which would be most welcome, even if those opinions are only posted internally to github such that I cannot see them.

aeisenberg commented 1 year ago

Thank you for your detailed comments and questions. CodeQL is a tool that efficiently provides developers with actionable alerts over large and complex code bases. The tool trades off possible false negatives for fewer false positives to help ensure that everything that is reported is actionable.

We currently have no plans to create to create a new static analysis tool that focuses on soundly proving that a program is bug free.

ryao commented 1 year ago

Thank you for your detailed comments and questions. CodeQL is a tool that efficiently provides developers with actionable alerts over large and complex code bases. The tool trades off possible false negatives for fewer false positives to help ensure that everything that is reported is actionable.

We are both in agreement on this.

We currently have no plans to create to create a new static analysis tool that focuses on soundly proving that a program is bug free.

Such a tool could never prove software to be bug free. However, it could prove software to be free of runtime issues such as undefined behavior, data races and memory safety issues.

I already knew that the CodeQL team is stretched too thinly to embark on this effort when I posted this. That was apparent from the impressive yearly pace at which new languages are supported by CodeQL. CodeQL is undoubtably an extremely important tool. I greatly respect GitHub’s management for not only identifying the need for CodeQL, but committing the funds to make it happen through M&A. However, the wider community also needs a sister tool that focuses on soundly proving the absence of memory safety issues, undefined behavior and data races.

The commercial tools sector already recognizes this with math works offering both the poly space bug finder, which is a conventional analyzer CodeQL and the poly space code prover, which proves the absence of runtime issues. Absint offers both Astree and Rulechecker with Rulechecker being a conventional analyzer like CodeQL. The wider open source community has access to CodeQL through GitHub, but lacks access to a sister tool that focuses on proving the absence of memory safety issues, undefined behavior and concurrency issues.

The two approaches are complementary, since neither can the sound approach be extended to handle the wide set of possible patterns that a conventional tool such as CodeQL can nor can the conventional approach used by CodeQL be made to catch all bugs within the impressively wide set of classes that it checks. The community needs access to both types of tools. Access to a sound static analyzer at the level of Astree would enable the open source community to eliminate at least 70% of security issues from C/C++ software, while access CodeQL would not only be extremely important for tackling the remaining ones, but also be important for all other languages.

While the CodeQL team is stretched too thinly to make a sister tool that does sound static analysis, GitHub management could acquire the rights to make Astree available to the open source community as a sister tool to CodeQL. At present, such a sister tool is only accessible to proprietary software developed within very deep pocketed industries such as nuclear and aviation. NASA developed IKOS as an open source alternative, but it does not support concurrency, which makes it useless for many important open source projects. It is my hope that GitHub’s management could change this through M&A, but they need to know the possibility exists before there is any chance that they could do it.

Would you please pass my suggestion to higher level management for their consideration?

ivanperez-keera commented 6 months ago

Hi. I'm the current maintainer of IKOS. I just wanted to chime in.

We have just switched to a quarterly release cycle, and have opened up a public discussion on github for people to help us determine what would be most helpful to the community and should be considered for the next release: https://github.com/NASA-SW-VnV/ikos/discussions/258.

As for the issue with small integers in loops, we recently talked about it and agree that it's a pain. It's not listed in that thread but please feel free to do so. The same goes for the concurrency limitations. Having the community comment on these issues also helps us convince stakeholders of what the project priorities should be.

Also, we'd be open to collaborations or someone helping extend IKOS' capabilities. I don't know if anyone from this issue may be in a position to do so, but perhaps there are other mechanisms to help facilitate it (e.g., students investigating it as part of MSc or PhD theses, Summer of Code, funding someone to contribute a desired feature, etc.).

github / codeql

Sound static analysis needed to complement CodeQL #13092