github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.52k stars 1.5k forks source link

LGTM.com - false positive - Sensitive data (id) is logged #6444

Open DimitriPapadopoulos opened 3 years ago

DimitriPapadopoulos commented 3 years ago

Description of the false positive

Variable names may contain UID but it refers to the unique identifiers of DICOM images and in the context of reading DICOM images this is not really sensitive and actually quite useful.

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/examples/input_output/plot_read_dicom.py#x9f6ff454e3dded9b:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/fileset.py#x92cc3e7130d36e7d:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/fileset.py#xe8e83c6abceded69:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/fileset.py#x322737eb169b0d2a:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/fileset.py#xed0b10d5ac1c529d:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/uid.py#xf399f778f423689c:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/uid.py#x9dbc6970b92dab79:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/examples/input_output/plot_read_fileset.py#x87ca98b04d02d94c:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/pydicom/_uid_dict.py#xe57e923c8e1b6ef2:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/source/generate_uids/generate_storage_sopclass_uids.py#xa8de055eb0c88014:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/source/generate_uids/generate_storage_sopclass_uids.py#x308ab3116bb90ea9:1

https://lgtm.com/projects/g/pydicom/pydicom/snapshot/b3ab7ceed7ea4b8b79e6f72248096c97dd527328/files/source/generate_uids/generate_storage_sopclass_uids.py#xbd9ab537eadfe9d9:1

DimitriPapadopoulos commented 3 years ago

Edit: updated thanks to comments from @The-Compiler

I guess the relevant regex is this one: https://github.com/github/codeql/blob/db4c8df/csharp/ql/src/Security%20Features/CWE-798/HardcodedConnectionString.ql#L25 https://github.com/github/codeql/blob/ec0066d/python/ql/lib/semmle/python/security/internal/SensitiveDataHeuristics.qll#L62

See also: https://github.com/github/codeql/blob/ec0066d5a4af53610b236600aa8364ad13923267/python/ql/src/Security/CWE-312/CleartextLogging.ql#L1-L14

I do see the risks here: CWE-321: Use of Hard-coded Cryptographic Key CWE-798: Use of Hard-coded Credentials

However, @precision high means a high "percentage of query results that are true positives (as opposed to false positive results)". This value is too optimistic, a UID is often a "unique ID", not necessarily a "user ID". So change to @precision medium?

The-Compiler commented 3 years ago

I don't think it's that regex, given that it's in the csharp/ subfolder and refers to a connection string, with a different check ID.

Based on the query ID py/clear-text-logging-sensitive-data displayed on LGTM, the check is defined in python/ql/src/Security/CWE-312/CleartextLogging.ql. That then seems to refer to CleartextLogging.qll and CleartextLoggingCustomizations.qll, but I don't actually see any of the search strings in there.

The-Compiler commented 3 years ago

Ah, and just as I submitted my comment, I found it: In SensitiveDataHeuristics.qll.

DimitriPapadopoulos commented 3 years ago

@The-Compiler Thanks for the comments, I have updated my comment accordingly.