github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Does CodeQL support checking the contents of configuration files in YAML format? #16755

Closed by Exloit 2 months ago

Exloit commented 3 months ago

Hello, I have some Go applications that use YAML files for configuration, but our R&D team often writes account names and passwords into those files. How can I use CodeQL to automatically detect whether these files contain sensitive information?

  1. The "codeql database create --language=go" command does not pick up the .yml files.
  2. I created two databases with "codeql database create --language=go,yaml ...", but how do I write queries for the YAML database?

Are there any open-source queries I can use as a reference?

smowton commented 3 months ago

The YAML extractor is unusual in that its fragment of the database schema ("dbscheme") is replicated in the database schemas for Ruby, JavaScript and Python, meaning that the YAML extractor can either populate a plain YAML database or contribute to a Ruby, JS or Python database. It also means that one way to extract YAML, and then use one of those languages' libraries to work with the extracted YAML content, is to create a one-line JS, Python or Ruby file alongside the YAML files and extract that language. There's no reason this couldn't also be done with Go, except that we haven't had that need yet.

That means the JS, Python and Ruby libraries are also the places to look for examples of CodeQL that uses YAML data.

The basic database schema for YAML can be seen in the JS dbscheme (for example), starting at https://github.com/github/codeql/blob/main/javascript/ql/lib/semmlecode.javascript.dbscheme#L1057

Then there's a shared CodeQL module that defines YAML classes and predicates on top of the database schema: https://github.com/github/codeql/blob/main/shared/yaml/codeql/yaml/Yaml.qll. For example, it defines YamlSequence for working with sequence types, with a getElement(int i) predicate for accessing elements.
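As a rough illustration of those classes, the sketch below lists the scalar elements of every YAML sequence in a database. It assumes a JavaScript database that also contains the YAML files (hence `import javascript`), and that the JavaScript library exposes the shared classes as `YamlSequence` and `YamlScalar`. Treat it as a starting point rather than a tested query.

```ql
// Assumption: run against a JavaScript database that includes the YAML files.
import javascript

// For each YAML sequence, report the index and value of every scalar element.
from YamlSequence seq, int i, YamlScalar elem
where elem = seq.getElement(i)
select seq, i, elem.getValue()
```

Non-scalar elements (nested mappings or sequences) are skipped here because `elem` is restricted to `YamlScalar`.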

That shared library is a parameterised module that takes an InputSig signature, which the individual languages (JS, Python or Ruby) then specialise according to their needs: for example, JS does this here: https://github.com/github/codeql/blob/main/javascript/ql/lib/semmle/javascript/YAML.qll#L11

Finally, JavaScript libraries and queries can import that YAML module and build on it, like this: https://github.com/github/codeql/blob/main/javascript/ql/lib/semmle/javascript/Actions.qll

Here the JavaScript library is using the YAML classes and predicates to break down a GitHub Actions definition.
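Applied to the original question, a query along the following lines might flag YAML keys that look like credentials. This is only a sketch under several assumptions: the database was built with `--language=javascript` from a source root that holds the YAML configuration files plus a stub `.js` file (the workaround described above); the JavaScript library exposes `YamlMapping.maps` and `YamlScalar.getValue` as in the shared YAML module; and the key-name pattern is a hypothetical heuristic chosen for illustration, not an existing CodeQL query.

```ql
// Assumption: run against a JavaScript database that includes the YAML files.
import javascript

from YamlMapping mapping, YamlScalar key, YamlScalar value
where
  // `maps` relates each key in a mapping to its value
  mapping.maps(key, value) and
  // Hypothetical heuristic: key names that suggest a secret (case-insensitive)
  key.getValue().regexpMatch("(?i).*(passw(or)?d|secret|token|api_?key).*") and
  // Ignore empty placeholders
  value.getValue() != ""
select value, "Possible hard-coded credential under key '" + key.getValue() + "'."
```

A pattern this broad is likely to produce false positives (for example, keys such as `password_file`), so the regular expression would need tuning for a real code base.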

I hope this helps you get to grips with using CodeQL on YAML data. Please do let me know if you have further questions.

Exloit commented 2 months ago

Thank you, I have solved many problems based on your answers.

Exloit commented 2 months ago

I will close this issue, thank you