github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Question: Extending Query (UnsafeDeserialization.ql) for CWE-502 #14004

Open · krasnopg opened this issue 1 year ago

krasnopg commented 1 year ago

Hi,

I am analysing Python code for CWE-502 (deserialization of untrusted data) and am running the UnsafeDeserialization.ql query for this purpose. I would now like to extend the query to cover more sources of untrusted data, namely:

1. I would like to treat local files as untrusted, so that the following example is flagged as vulnerable:

   ```python
   import yaml

   def unsafe_load(filename):
       with open(filename) as untrusted:
           # file contents reach yaml.load
           return yaml.load(untrusted)
   ```

2. I would like to treat function parameters as untrusted, so that the following example is flagged as vulnerable:

   ```python
   import yaml

   def unsafe_load(untrusted):
       # parameter value reaches yaml.load
       return yaml.load(untrusted)
   ```

I am new to CodeQL. After studying the documentation on writing CodeQL queries for Python and the CodeQL repository, I am still not sure how and where to extend the configuration to add these two sources. Based on [analyzing-data-flow-in-python](https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-python/), it seems that I can model the sources with `Concepts::FileSystemAccess` and `DataFlow::ParameterNode`, and that I need to add them to the `isSource` predicate of the configuration. However, I am not sure which sources `semmle.python.security.dataflow.UnsafeDeserializationQuery` in [UnsafeDeserializationQuery.qll](https://github.com/github/codeql/blob/main/python/ql/lib/semmle/python/security/dataflow/UnsafeDeserializationQuery.qll) currently uses, or whether any additional step is needed to run the modified query.
Any help or clarifications would be greatly appreciated!
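To make the two requested kinds of sources concrete, here is a deliberately tiny, stdlib-only Python analogue of what the extended query should report. It is not CodeQL and says nothing about how CodeQL's data-flow library works: `find_unsafe_loads`, `_is_open_call` and `_is_yaml_load` are hypothetical helpers, tracking is flow-insensitive and limited to plain variable names inside top-level functions, and the only sink is a call spelled `yaml.load(...)`. The two flags loosely play the role of the extra `isSource` cases (parameters, cf. `DataFlow::ParameterNode`; file objects returned by `open()`, cf. a `FileSystemAccess`).

```python
import ast


def _is_open_call(node: ast.AST) -> bool:
    """Toy file source: a call to the builtin open(...)."""
    return (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "open")


def _is_yaml_load(node: ast.AST) -> bool:
    """Toy sink: a call spelled exactly yaml.load(...)."""
    return (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
            and node.func.attr == "load"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "yaml")


def find_unsafe_loads(code: str, params_are_sources: bool = True,
                      files_are_sources: bool = True) -> list:
    """Return (function name, line) for each yaml.load whose first argument is tainted.

    Flow-insensitive and intraprocedural: only plain variable names inside
    top-level functions are tracked.
    """
    findings = []
    for func in ast.parse(code).body:
        if not isinstance(func, ast.FunctionDef):
            continue
        nodes = list(ast.walk(func))
        tainted = set()
        if params_are_sources:  # extra source 2: every positional parameter
            tainted.update(arg.arg for arg in func.args.args)
        if files_are_sources:  # extra source 1: file objects returned by open()
            for node in nodes:
                if isinstance(node, ast.With):
                    for item in node.items:
                        if (_is_open_call(item.context_expr)
                                and isinstance(item.optional_vars, ast.Name)):
                            tainted.add(item.optional_vars.id)
                elif isinstance(node, ast.Assign) and _is_open_call(node.value):
                    tainted.update(t.id for t in node.targets
                                   if isinstance(t, ast.Name))
        # Propagate taint through plain copies such as `y = x` until stable.
        changed = True
        while changed:
            changed = False
            for node in nodes:
                if (isinstance(node, ast.Assign) and isinstance(node.value, ast.Name)
                        and node.value.id in tainted):
                    for target in node.targets:
                        if isinstance(target, ast.Name) and target.id not in tainted:
                            tainted.add(target.id)
                            changed = True
        # Report sinks whose first positional argument is a tainted name.
        for node in nodes:
            if (_is_yaml_load(node) and node.args
                    and isinstance(node.args[0], ast.Name)
                    and node.args[0].id in tainted):
                findings.append((func.name, node.lineno))
    return findings


FILE_EXAMPLE = """
import yaml

def unsafe_load(filename):
    with open(filename) as untrusted:
        return yaml.load(untrusted)
"""

PARAM_EXAMPLE = """
import yaml

def unsafe_load(untrusted):
    return yaml.load(untrusted)
"""

print(find_unsafe_loads(FILE_EXAMPLE))    # -> [('unsafe_load', 6)]
print(find_unsafe_loads(PARAM_EXAMPLE))   # -> [('unsafe_load', 5)]
print(find_unsafe_loads(PARAM_EXAMPLE, params_are_sources=False))  # -> []
```

Switching off either flag makes the corresponding example disappear from the results, which mirrors why each case needs its own addition to the set of sources.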
RasmusWL commented 1 year ago

Hi @krasnopg, your proposed steps should work :+1:

If I had to make this change, I would add to the set of sources by extending the `Source` class defined in `UnsafeDeserializationCustomizations.qll`, just as is done there for `RemoteFlowSource`. You could even put your new class (extending `Source`) in `Customizations.qll`, which is currently the best way we have of modifying the behavior of the standard queries.
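Extending `Source` is enough because an abstract QL class has no members of its own: its extent is the union of its subclasses that are in scope, and the query configuration only asks whether a node is an instance of `Source`. Since `Customizations.qll` is imported by the standard `python` library, a subclass placed there is in scope for every query. The Python sketch below is only an analogy of that mechanism; the registry, `is_source`, the `kind` dictionaries and the `RemoteSource`/`FileReadSource`/`ParameterSource` names are all hypothetical and are not CodeQL APIs. Each subclass adds itself to the union when its module is loaded, so new kinds of sources appear without editing `is_source`.

```python
class Source:
    """Analogue of an abstract QL class: its extent is the union of its subclasses."""

    kinds = []  # every subclass registers itself here when it is defined

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Source.kinds.append(cls)

    @staticmethod
    def matches(node: dict) -> bool:
        raise NotImplementedError


def is_source(node: dict) -> bool:
    """Analogue of checking `node instanceof Source`: any subclass may match."""
    return any(kind.matches(node) for kind in Source.kinds)


# Hypothetical library-provided kind (cf. the RemoteFlowSource-based source).
class RemoteSource(Source):
    @staticmethod
    def matches(node):
        return node.get("kind") == "remote"


# Hypothetical user additions (cf. classes placed in Customizations.qll);
# is_source itself is left untouched.
class FileReadSource(Source):
    @staticmethod
    def matches(node):
        return node.get("kind") == "file-read"


class ParameterSource(Source):
    @staticmethod
    def matches(node):
        return node.get("kind") == "parameter"


for kind in ("remote", "file-read", "parameter", "constant"):
    print(kind, is_source({"kind": kind}))
```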

Our field team has built some functionality for deploying such modifications. I'm not personally 100% on top of how it works, but you can read more about it here: https://github.com/advanced-security/codeql-bundle-action