Problem with handling detections of sub-rules

n1ckl0sk0rtge commented 1 month ago

Context

When writing a rule using the DetectionRuleBuilder, one can use the shouldBeDetectedAs function to resolve directly a specific parameter, and one can use the addDependingDetectionRules function to resolve a parameter with another rule (for example when this method is a method invocation, using a depending detection rule allows to resolve a specific parameter of this function call). This issue is about the behaviour of the static analysis when resolving the following case (written here in Python, but the issue is also valid for Java):

result = func1(func2("RSA")) # We want to resolve "RSA"

To do so, one should write a rule R1 to detect func1, with a depending detection rule R2 on its parameter to detect func2. Then R2 resolves the content of the parameter of func2, here "RSA", using shouldBeDetectedAs. Currently (at main 0cd69ef79984b2c91bd0e06b07fb5488a2ea6473 and feature/python-support f5cdfe445fd4e5ea87ba6d0e886b84f02ad0c7cd), this type of resolution implies a complex mechanism that is quite limited.

Current mechanism

The detection process starts by visiting all method invocations (and class instantiations). For each invocation, a detection executive is started for each registered "entry" rule: R1 in our case. We will therefore look for all functions func1 in the current scope. Once we have a match, the function analyseExpression is called, and there are several cases:

If the rule is only interested in the function without its parameter, it is handled in a special way that is not relevant for this issue.
If the rule has a DetectableParameter (coming from shouldBeDetectedAs), meaning that we want to resolve a parameter of the detected function call, we name the subsequent handling logic case of a detectable parameter.
If the rule has no DetectableParameter despite having parameters, it is probably a rule written for an intermediary function that does directly not carry information, but may have depending detections rules. We name the subsequent handling logic case of an intermediary parameter.

In our example, func1 brings not information other than through the content of its parameter func2("RSA"). R1 is written as a rule with a depending detection rule R2 on its parameter, but without a shouldBeDetectedAs part. We therefore enter the case of an intermediary parameter in the function analyseExpression.

Case of an intermediary parameter

In this case, depending on the implementation, we have the choice to call onDetectedDependingParameter with two possible scopes: EXPRESSION or ENCLOSED_METHOD. Basically, it will resolve the depending detection rule(s) by visiting either the parameter tree (func2("RSA") in our case) or the enclosing method of this parameter (the function in which this expression is).

In our example, we want to resolve this particular func2("RSA") call, so using the EXPRESSION scope makes sense. func2("RSA") will be the only match of rule R2, which will correctly resolve the parameter content "RSA". This resolution is will go through the case of a detectable parameter explained below. This example worked as expected!

Problem 1: This example worked because func2("RSA") was a function call. Here is a second example with an intermediary variable var:
var = func2("RSA")
result = func1(var) # We want to resolve "RSA"
In this case, because var is not a function call, visiting its tree will not return any results. Finding the function call associated to var is not that trivial, because there may be several intermediary assignments, results of function calls, dictionary values, etc.

Case of a detectable parameter

Let's look at a new example:

var = ec.ECDSA("SHA_256")
other_var = ec.ECDSA("SHA_512") # Some other var we are not interested in
result = func1(var) # We want to resolve "ECDSA" and "SHA_256"

In this example, func1 still does not bring information other than through the content of its parameter var = ec.ECDSA("SHA_256"). R1' (to detect func1) is written as a rule with a depending detection rule R2' for its parameter, but has now also a shouldBeDetectedAs part aiming at resolving the value "ECDSA" (name of the function used as parameter of func1). R2' (to detect ec.ECDSA) (with shouldBeDetectedAs), resolves the content of the parameter of ec.ECDSA, here "SHA_256", using shouldBeDetectedAs. Upon detection of func1 with rule R1', because the parameter of func1 is a detectable parameter, we enter the case of a detectable parameter.

Let's say that resolution works as expected in this case ("ECDSA" has been detected). Then, analyseExpression will call onReceivingNewDetection, that will call the following rules (here R2') with followNextRules. For now, there is no choice of scope, so it is necessarily the scope of the enclosing method. This means that it will look for a call of ECDSA in the entire scope of the method: how to make sure that this call is linked to our parameter var of func1? To do so, the current implementation uses a TraceSymbol. onReceivingNewDetection will obtain the symbol of the parameter of func1, here SYMBOL, var, and will pass it as an argument to followNextRules.

Then, all function calls in the scope of the enclosing method will be matched against R2', which will detect the two ECDSA function calls, one assigned to var and the other to other_var. Using the previous TraceSymbol, we filter the findings to only keep the finding linked to var. In our case, this works and this example will correctly resolve "SHA_256".

Problem 2: This example worked because ec.ECDSA("SHA_256") was directly assigned to var. Here is a second example intermediary variable intermediary_var:
var = ec.ECDSA("SHA_256")
intermediary_var = var
other_var = ec.ECDSA("SHA_512") # Some other var we are not interested in
result = func1(intermediary_var) # We want to resolve "ECDSA" and "SHA_256"
In this case, SYMBOL, intermediary_var is the TraceSymbol obtained at the end of R1' and passed to followNextRules. But the symbol to which ec.ECDSA("SHA_256") is assigned is var. Because those two symbols don't match, there will be no detection. Again, correctly linking intermediary_var to ec.ECDSA("SHA_256") is not that trivial, because there may be several intermediary assignments, results of function calls, dictionary values, etc.

Problem 3: What if there is no intermediary variable at all:
other_var = ec.ECDSA("SHA_512") # Some other var we are not interested in
other_func(ec.ECDSA("SHA_512")) # Some other function call we are not interested in
result = func1(ec.ECDSA("SHA_256")) # We want to resolve "ECDSA" and "SHA_256"
In this case, NO_SYMBOL is the TraceSymbol obtained at the end of R1' and passed to followNextRules, because the parameter of func1 is not a variable. Then, all function calls in the scope of the enclosing method will be matched against R2', which will detect the three ECDSA function calls, one assigned to other_var and the two others assigned to nothing (TraceSymbol NO_SYMBOL). The filtering step can then remove the ECDSA linked to the TraceSymbol SYMBOL, other_var, but it cannot distinguish between the two NO_SYMBOL. It will therefore resolve both SHA_512 and SHA_256. To avoid this case, we need to ensure that only the parameter of func1 gets resolved. This could be done by limiting the scope to this parameter expression, as we can do in the case of an intermediary parameter (but we then need to solve Problem 1).

A better approach

Could a better and simpler approach be found and replace this complex mechanism (currently based on limited TraceSymbols and varying scopes of resolution)?

TODO: think about it.

n1ckl0sk0rtge commented 1 month ago

Another problem related to this mechanism with TraceSymbols:

Problem 4: Depending detection rules of method detection rules:
parameters = dh.generate_parameters(generator=2, key_size=2048)
server_private_key = parameters.generate_private_key() # Noncompliant {{GENERATION}}
Here, the rule entry point is generate_private_key. This rule is a method detection rule (as there are no parameters), and it has a depending detection rule to look for generate_parameters in the method scope. Indeed, if generate_parameters is found, we can obtain more information (like the key size). After the detection of generate_private_key, the engine will initiate calling the depending detection rule in onReceivingNewDetection, in which the engine will use the TraceSymbol NO_SYMBOL for the depending detection rule. Indeed, as the depending detection rule is defined on a method, there is no relevant symbol. However, while the depending rule will find generate_parameters, this detection has a TraceSymbol SYMBOL, parameters. The TraceSymbol filtering will therefore remove this finding as it was expecting a finding with NO_SYMBOL.

This behaviour seems to be a bug more than a limitation of the TraceSymbol mechanism. If a depending detection rule is added to the method of a rule (and not a parameter), the TraceSymbol filtering should be completely bypassed. It can be done from now on using SYMBOL_IGNORED instead of NO_SYMBOL in this case, even though this is not what SYMBOL_IGNORED was intended for.

In the future, it may be even better to have a way to link these two function calls using parameters, which is the invoked object of the parent rule and the result variable of the child rule. But this may be hard to implement and use in practice.

n1ckl0sk0rtge commented 1 month ago

Another problem where TraceSymbols work, but where we are too limited by the scope of search:

Problem 5: class variables (with self) [example inspired by this code]:
class SHA256WithRSA:
  def __init__(self):
          self.private_rsa = rsa.generate_private_key(public_exponent=65537, key_size=2048)
  def sign(self, data: bytes) -> bytes:
      signature = self.private_rsa.sign(
          data=data,
          padding=padding.PKCS1v15(),
          algorithm=hashes.SHA256()
      )
      return signature
When identifying rsa.generate_private_key and looking for the depending sign detection rule, we define the TraceSymbol SYMBOL, private_rsa. This is then indeed the symbol on which will be called the sign function from the library. However, the scope search for the sign function will be limited to the __init__ function, so sign will not be detected. The same example/problem should exist in Java with class attributes, how is it handled?

IBM / sonar-cryptography