github / codeql-coding-standards

This repository contains CodeQL queries and libraries which support various Coding Standards.
MIT License

`A2-5-2`: Missing query #154

Open rcseacord opened 1 year ago

rcseacord commented 1 year ago

Affected rules

  - `A2-5-2`

Description

The checker for "Rule A2-5-2 (required, implementation, automated) Digraphs shall not be used." was not implemented. Presumably, the expectation was that compiler flags would be sufficient. However, this is not the case.

Clang has the following flag:

-fno-digraphs

Disables the alternative token representations <:, :>, <%, %>, %: and %:%:, which are enabled by default
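For reference, here is a small sketch of what code using these alternative tokens looks like. It assumes a C++11 compiler with digraphs left enabled, which is the default for Clang and for GCC in C++ mode. The names `sum_three` and `CONCAT` are made up for illustration. Each digraph is the same token as the primary spelling named in the comments, so the function returns 6.

```cpp
// %: is the digraph for #, and %:%: is the digraph for ##
%:define CONCAT(a, b) a %:%: b

// <% and %> stand for { and }; <: and :> stand for [ and ]
int sum_three() <%
    int values<:3:> = <% 1, 2, 3 %>;
    int CONCAT(to, tal) = 0;  // token pasting yields: int total = 0;
    for (int v : values) total += v;
    return total;
%>
```

Building the same file with `-fno-digraphs` should instead fail, because `%:`, `<%` and the others are then lexed as separate `%`, `:` and `<` tokens.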

I haven't tested this, but presumably disable means that it stops converting them into the corresponding characters. However, to be compliant with the rule we need to diagnose these sequences of characters, even if they are not translated.

GCC is much worse: it has no equivalent option at all, so there is no way to enforce this rule if you are using that compiler.

lcartey commented 1 year ago

Thanks for reporting this. We didn't implement this rule because:

  1. The CodeQL database doesn't currently store information about the digraphs used.
  2. We believe that any program that uses digraphs will fail to compile with the -fno-digraphs flag specified when using clang (further explanation below).
  3. We currently only officially support clang-derivatives, so the behaviour of gcc was not relevant.

We are, however, looking to expand our compiler support to gcc-like compilers, so we may have to consider what we could do for this rule. It would be a low priority feature request for the CodeQL C++ team to add native support for detecting digraphs, so we may have to look at workarounds, such as a lexical analyzer.
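A minimal sketch of the lexical-analyzer workaround, written in plain C++ and independent of CodeQL. `find_digraphs` and `DigraphHit` are hypothetical names. It follows the reasoning in this thread: comments and string, character and raw-string literals are skipped as whole tokens, and punctuators are matched by maximal munch, so `::>` and `<<:` contain no digraph. It also applies the C++11 rule ([lex.pptoken]) that `<::` followed by anything other than `:` or `>` lexes as `<` then `::`, so `std::vector<::std::string>` is not flagged. It assumes no trigraphs, no backslash-newline line splices and no header-names in `#include` lines.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record of one digraph token found in the source text.
struct DigraphHit {
    std::size_t offset;    // byte offset of the token in the source text
    std::string spelling;  // "<:", ":>", "<%", "%>", "%:" or "%:%:"
};

// Sketch only: scans C++ source text and reports digraph tokens.
// Not handled: trigraphs, backslash-newline splices, header-names such as <a<:b>.
std::vector<DigraphHit> find_digraphs(const std::string& src) {
    std::vector<DigraphHit> hits;
    const std::size_t n = src.size();
    auto at = [&](std::size_t i) { return i < n ? src[i] : '\0'; };
    auto starts = [&](std::size_t i, const char* s) {
        return src.compare(i, std::strlen(s), s) == 0;
    };
    auto is_ident = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };

    std::size_t i = 0;
    while (i < n) {
        const char c = src[i];
        if (starts(i, "//")) {                      // line comment: skip to newline
            while (i < n && src[i] != '\n') ++i;
        } else if (starts(i, "/*")) {               // block comment: skip past "*/"
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else if (std::isdigit(static_cast<unsigned char>(c)) ||
                   (c == '.' && std::isdigit(static_cast<unsigned char>(at(i + 1))))) {
            // pp-number: a ' digit separator here must not open a character literal
            ++i;
            while (i < n) {
                const char p = src[i - 1];
                if ((src[i] == '+' || src[i] == '-') &&
                    (p == 'e' || p == 'E' || p == 'p' || p == 'P')) ++i;
                else if (is_ident(src[i]) || src[i] == '.') ++i;
                else if (src[i] == '\'' && is_ident(at(i + 1))) i += 2;
                else break;
            }
        } else if (is_ident(c)) {                   // identifier or literal prefix
            const std::size_t start = i;
            while (i < n && is_ident(src[i])) ++i;
            const std::string id = src.substr(start, i - start);
            const bool raw = id == "R" || id == "LR" || id == "uR" || id == "UR" || id == "u8R";
            if (raw && at(i) == '"') {              // raw string: R"delim( ... )delim"
                const std::size_t open = src.find('(', i);
                if (open == std::string::npos) break;  // malformed: stop scanning
                const std::string close = ")" + src.substr(i + 1, open - i - 1) + "\"";
                const std::size_t end = src.find(close, open);
                i = (end == std::string::npos) ? n : end + close.size();
            }
        } else if (c == '"' || c == '\'') {         // ordinary string or character literal
            ++i;
            while (i < n && src[i] != c && src[i] != '\n') i += (src[i] == '\\') ? 2 : 1;
            ++i;                                    // step past the closing quote
        } else if (starts(i, "%:%:")) {
            hits.push_back({i, "%:%:"});
            i += 4;
        } else if (starts(i, "<::") && at(i + 3) != ':' && at(i + 3) != '>') {
            ++i;                                    // "<::x" lexes as "<" "::"
        } else if (starts(i, "<:") || starts(i, ":>") || starts(i, "<%") ||
                   starts(i, "%>") || starts(i, "%:")) {
            hits.push_back({i, src.substr(i, 2)});
            i += 2;
        } else if (starts(i, "::") || starts(i, "<<")) {
            i += 2;                                 // maximal munch: "::>" is "::" ">"
        } else {
            ++i;
        }
    }
    return hits;
}
```

For example, `find_digraphs("int a<:2:> = <%1, 2%>;")` reports four hits, at offsets 5, 8, 13 and 19. Working on raw text sidesteps the missing digraph information in the database, at the cost of re-implementing part of the tokenizer.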


> I haven't tested this, but presumably disable means that it stops converting them into the corresponding characters. However, to be compliant with the rule we need to diagnose these sequences of characters, even if they are not translated.

Digraphs behave differently from trigraphs in that they are alternative tokens rather than straight character replacements. Trigraphs are replaced in the very first phase of translation, by matching character sequences. Digraphs, in contrast, are only considered when operators and punctuators are tokenized. Notably, this happens after string literals and comments have been processed (see [lex.phases]), so digraphs can never occur in comments or string literals, and as tokens they can only appear in valid places in the grammar.
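A small illustration of the point about comments and literals, assuming C++11 or later; the function name `literal_keeps_spelling` is hypothetical. Every digraph-like sequence below sits inside a comment or a literal, so the translation unit contains no digraph tokens and should compile identically with or without `-fno-digraphs`.

```cpp
#include <cstring>

// The sequences <% :> %: in this comment are not digraphs: the comment is
// lexed as a whole and replaced by a single space before parsing.
bool literal_keeps_spelling() {
    const char* text = "<%x%>";  // five ordinary characters, not "{x}"
    const char open = '<';       // a character literal, not the start of "<:"
    return std::strlen(text) == 5 && text[0] == open && text[1] == '%';
}
```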

My interpretation of this rule is therefore that the specified sequences are only "digraphs" if they appear in a place where they would be tokenized as such. With -fno-digraphs specified, the digraphs are no longer tokenized, which, I believe, will always cause a program that uses digraphs to fail to compile (counterexamples welcome).

rcseacord commented 1 month ago

Has any additional consideration been given to implementing this rule?