github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

[Bug Report] Data Flow Interruption with Function Parameters and Variable Arguments in Python #17753

Open gravingPro opened 3 days ago

gravingPro commented 3 days ago

I've encountered issues in CodeQL regarding data flow interruption. Here are the details:

1. Function Parameter Passing Interruption

In the code below:

def read_sql(sql):
    spark.sql()  # sink custom

def process(func, args): 
    func(*args) 

sql = request.json['data']  # Source
process(func=read_sql, args=sql)

CodeQL fails to detect that the tainted variable sql is passed into read_sql when using the process function to handle the function call and its argument. This shows an interruption in data flow tracking during function parameter passing and subsequent invocation with variable arguments.

2. *args and **kwargs Interruption

CodeQL does not track data flow accurately through *args (variable positional arguments) and **kwargs (variable keyword arguments). In the example above, unpacking *args inside the process function causes CodeQL to lose the flow of sql. The same issue appears in similar scenarios involving these constructs.
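To make the pattern concrete, here is a minimal runnable sketch of both the *args and **kwargs variants. It is only an illustration: `run_query`, `executed`, and `request_json` are hypothetical stand-ins (for the custom sink and for `request.json`) so the snippet runs without Flask or Spark, and here the sink actually consumes its argument.

```python
# Hypothetical stand-ins (not from the original report) so the sketch runs standalone.
executed = []

def run_query(sql):
    # Stand-in for the custom sink, e.g. spark.sql(sql).
    executed.append(sql)

def read_sql(sql):
    run_query(sql)  # sink

def process(func, *args, **kwargs):
    # Forward positional and keyword arguments unchanged to func.
    return func(*args, **kwargs)

request_json = {"data": "SELECT 1"}  # stand-in for request.json

sql = request_json["data"]   # source
process(read_sql, sql)       # value travels through *args
process(read_sql, sql=sql)   # value travels through **kwargs
```

At runtime the value reaches the sink in both calls, so the expectation is that a taint-tracking query would report a flow for each of them.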

Moreover, these problems also occur with multithreading and multiprocessing APIs such as threading.Thread, multiprocessing.Process, concurrent.futures.ThreadPoolExecutor, and concurrent.futures.ProcessPoolExecutor.
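A short sketch of the same indirection through the standard-library concurrency APIs, under the same assumptions as before: `reached` and `request_json` are hypothetical stand-ins for the sink and for `request.json`. Only the thread-based forms are shown; multiprocessing.Process accepts the same `target=`/`args=` form, and ProcessPoolExecutor the same `submit(fn, *args)` form.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins (not from the original report).
reached = []

def read_sql(sql):
    reached.append(sql)  # stand-in for the custom sink

request_json = {"data": "SELECT 1"}  # stand-in for request.json
sql = request_json["data"]           # source

# The callable and its arguments are passed separately; Thread calls target(*args).
t = threading.Thread(target=read_sql, args=(sql,))
t.start()
t.join()

# submit(fn, *args) forwards the arguments to fn on a worker thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(read_sql, sql).result()
```

In both cases the tainted value only meets the sink inside library code that invokes the callback, which is the same shape as the process helper in the first example.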

I hope this description helps in identifying and resolving these problems. Looking forward to a timely fix or further guidance on handling such complex data flow tracking scenarios.

Best regards

rvermeulen commented 3 days ago

Hi @gravingPro,

Thanks for the bug report. We will inform the Python team and get back to you on possible further guidance.

rvermeulen commented 3 days ago

Hi @gravingPro,

A quick follow-up question. How is your custom sink defined? In read_sql the argument sql is currently unused.

gravingPro commented 2 days ago

> Hi @gravingPro,
>
> A quick follow-up question. How is your custom sink defined? In read_sql the argument sql is currently unused.

It's just a simplified example. Any sink could be used here, whether for SQL injection or SSRF.