github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.51k stars 1.49k forks source link

LGTM.com - URL sanitization inconsistency #2360

Open dgw opened 4 years ago

dgw commented 4 years ago

Description of the false positive

Either this is a false positive on function post_to_clbin, or the post_to_0x0 function below is being missed by the analysis despite containing nearly identical code.

I'm not really sure what we're doing in that code actually has anything to do with URL substring sanitization, but what actually bugs me is the reporting inconsistency.

URL to the alert on the project page on LGTM.com

https://lgtm.com/projects/g/sopel-irc/sopel/snapshot/182754526ebc14910cf4082e664542155d4460d9/files/sopel/modules/help.py?sort=name&dir=ASC&mode=heatmap#x284805de091dcebc:1

tausbn commented 4 years ago

So, the reason there is a difference between the two functions is that ://clbin.com/ is recognised as a probable URL due to the presence of the .com (and various other parts). We do not recognise .st in the same way, and hence we do not match ://0x0.st as a URL. We would recognise it if it were instead something like http://0x0.st.

You can see the code responsible for these heuristics here: https://lgtm.com/query/rule:1507386916281/lang:python/

I think this counts as a false positive. Even though there is matching on URL-like strings going on, these strings are not being used to e.g. redirect to the given URL, and so they are not really unsafe. We'll look into how to fix the query to eliminate these false positives.