github / codeql-cli-binaries

Binaries for the CodeQL CLI
Other
725 stars 101 forks source link

CodeQL: Make any() expression and exists() formula variable identifier optional #53

Open Marcono1234 opened 3 years ago

Marcono1234 commented 3 years ago

What do you think about making the variable identifiers of the CodeQL any(...) expression and exists(...) formula optional if they do not have any formulas? Currently the CodeQL language specification requires an identifier even though it is not used.

Examples:

exists(GadgetClass unused) // Check whether a vulnerable "gadget" class exists on the class path
and any(CustomMethodCall unused).getArgument(0) instanceof CustomArgument

Here in both cases it is currently necessary to specify a variable identifier (unused), even though it is not used.

For the exists formula this could lead to some ambiguity because it currently allows using expressions (e.g. exists(call.getAnArgument())), however because type names as part of variable declarations cannot contain a period, this should be unambiguous.

hmakholm commented 3 years ago

Thanks for the suggestion.

It would be problemativ to support exists(TypeName), because QL doesn't distinguish lexically between type names and variable names, and exists(identifier) will already be parsed as a case of exists(expression), with the expression simply being a variable. Admittedly, exists(var) would be a pretty useless thing to actually write, since there is no way for it to be meaningfully false. However, making the QL parser treat that special case differently would (a) harm orthogonality of the language, and (b) require awkward parser hacks of the kind that tend to compound exponentially. There's a limited amount of such hacks we can have before the whole thing becomes unmaintainable, and we prefer to save them for things with a greater payoff than this.

Based on your suggestion we're looking into supporting exists(TypeName _) where at least you don't need to invent an explicit name. That looks like it would be more straightforward. We'll keep it on the radar.

Marcono1234 commented 3 years ago

Thanks for your response! I somehow missed it until I randomly stumbled upon on it now.

Actually the exists(...) use case described in the description of this issue is flawed because it won't work if the gadget class in question is on the class path but is not used in the code (see https://github.com/github/codeql/issues/4937#issuecomment-759415046), and it is probably also a rare use case. So I would not mind if exists is not changed.

However, I have come across a few cases already where any(...) without variable indentifier would be quite useful. I think for it there should not be any ambiguity. Though I can understand if this is not a high priority for you.

hmakholm commented 3 years ago

In fact there is an ambiguity for any too: One can currently write any(<expression>), because any is parsed by the same mechanism that parses aggregates, and abbreviated forms such as count(<expression>) are definitely useful. Semantically, any(<expression>) is exactly equivalent to the <expression> itself, so there's probably not any existing QL code out there which uses this construction and would break if we forbade it ... but it still feels a bit of a risk.

Marcono1234 commented 3 years ago

Oh, you are right. Though it looks like the language specification does not mention that:

any ::= "any" "(" var_decls ("|" (formula)? ("|" expr)?)? ")"