Python SQL Injection not being detected for CWE-089

leviaurizon commented 2 months ago

In one of our projects we have identified a python SQL Injection Vulnerability for CWE-089 which doesn't appear to be being identified by the python SqlInjection.ql found here:

https://github.com/github/codeql/tree/main/python/ql/src/Security/CWE-089/SqlInjection.ql

Here is an example of the code which has vulnerabilities. Note that running bandit -r -t B608 [script name] does seem to work to identify the vulnerability, however, Code QL does not detect it.

def contract_sql_query( contract_name: str, col: list = ["*"], cancel: bool = False, risk_cat: str = "Aurizon", current_month: bool = True, ): if cancel: table_name = "[interface.batch.yield].[YieldVisCancellation]" else: table_name = "[interface.batch.yield].[YieldVisAddition]" current_month = int(current_month)

sql_string = (
    "SELECT "
    + ", ".join(col)
    + f" FROM {table_name} WHERE ContractName = '{contract_name}'  AND MonthCorrect = {current_month}"
)
sql_string += f" AND  {risk_cat} = 1" if cancel else ""
sql_string += f" AND TimeStored = (SELECT MAX(TimeStored) FROM  {table_name} WHERE ContractName = '{contract_name}')"
# print(sql_string)
return sql_string

leviaurizon commented 2 months ago

Also, here are the results from the bandit scan for reference:

`bandit -r -t B608 src/router/additions.py [main] INFO profile include tests: None [main] INFO profile exclude tests: None [main] INFO cli include tests: B608 [main] INFO cli exclude tests: None [main] INFO running on Python 3.12.3 Run started:2024-04-29 05:08:23.372323

Test results:

Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction. Severity: Medium Confidence: Low CWE: CWE-89 (https://cwe.mitre.org/data/definitions/89.html) More Info: https://bandit.readthedocs.io/en/1.7.8/plugins/b608_hardcoded_sql_expressions.html Location: .\src/router/additions.py:146:22 145 146 sql_query = f"SELECT {', '.join(coladd)} FROM [interface.batch.yield].[YieldVisAddition] WHERE ContractName = '{contract_name}'" 147 if len(corridor_names) > 0:

Code scanned: Total lines of code: 143 Total lines skipped (#nosec): 0

Run metrics: Total issues (by severity): Undefined: 0 Low: 0 Medium: 1 High: 0 Total issues (by confidence): Undefined: 0 Low: 1 Medium: 0 High: 0 Files skipped (0):`

tausbn commented 2 months ago

Thank you for your report. I think we need more information in order to determine why CodeQL isn't flagging this.

The code you've supplied just creates and returns a string, and so to the eyes of the CodeQL Python analysis, that code in isolation is completely safe. If this string is subsequently used in a way so that it is executed as a SQL query, only then does it become a potential SQL injection.

In other words, the CodeQL Python analysis does not consider the construction of a string that looks like a SQL injection to be a vulnerability. That string has to flow to a place where it gets executed in order to be counted as a vulnerability.

Moreover, there also has to be some user-controlled piece of data that gets interpolated into the string, otherwise it still isn't considered a vulnerability. (For instance, if you're just using the above function to create SQL queries with strings that are hard-coded in the program, then that's also considered safe.)

Thus, a SQL injection in the eyes of the CodeQL Python analysis requires a data-flow path that

Starts in some user-controlled piece of data (an example of this would be a HTTP request), and
Ends in a place where we either construct or execute a SQL query (the assumption being that if we've called a library function to construct a query, then we're likely going to execute it as well).

There are a variety of reasons why we may not be detecting the vulnerability:

We may not have a model for the relevant kind of user-controlled data.
We may not have a model for the particular SQL library that is used.
The string may flow through some other function in a way that we are not able to model properly (and so the flow simply gets stuck at that point).

We would need to know more about how your code works in order to determine why the injection is not found.

leviaurizon commented 2 months ago

Thank you for your report. I think we need more information in order to determine why CodeQL isn't flagging this.

The code you've supplied just creates and returns a string, and so to the eyes of the CodeQL Python analysis, that code in isolation is completely safe. If this string is subsequently used in a way so that it is executed as a SQL query, only then does it become a potential SQL injection.

In other words, the CodeQL Python analysis does not consider the construction of a string that looks like a SQL injection to be a vulnerability. That string has to flow to a place where it gets executed in order to be counted as a vulnerability.

Moreover, there also has to be some user-controlled piece of data that gets interpolated into the string, otherwise it still isn't considered a vulnerability. (For instance, if you're just using the above function to create SQL queries with strings that are hard-coded in the program, then that's also considered safe.)

Thus, a SQL injection in the eyes of the CodeQL Python analysis requires a data-flow path that

Starts in some user-controlled piece of data (an example of this would be a HTTP request), and

Ends in a place where we either construct or execute a SQL query (the assumption being that if we've called a library function to construct a query, then we're likely going to execute it as well).

There are a variety of reasons why we may not be detecting the vulnerability:

We may not have a model for the relevant kind of user-controlled data.

We may not have a model for the particular SQL library that is used.

The string may flow through some other function in a way that we are not able to model properly (and so the flow simply gets stuck at that point).

We would need to know more about how your code works in order to determine why the injection is not found.

Thanks, this is an example of code that was detected as a vulnerability by bandit, but not CodeQL, where coladd is a string array, and contract_name is a string

sql_query = f"SELECT {', '.join(coladd)} FROM [interface.batch.yield].[YieldVisAddition] WHERE ContractName = '{contract_name}'" if len(corridor_names) > 0: sql_query += " AND Corridor IN ('" + "', '".join(corridor_names) + "')" df_add_hauls = pd.DataFrame.from_records(cursor.execute(sql_query).fetchall())

tausbn commented 2 months ago

Thank you for the additional information. As far as I can tell, we don't currently have any modelling of Pandas DataFrames, so that's at least one reason why we wouldn't find this result.

My guess is that bandit has a much more heuristic approach, and will flag things that simply look like they might be SQL injections (so, more results but probably also more false positives).

github / codeql

Python SQL Injection not being detected for CWE-089 #16353