False positive on integrity hashes

Yelp / detect-secrets

An enterprise friendly way of detecting and preventing secrets in code.

Apache License 2.0

3.74k stars 464 forks source link

False positive on integrity hashes #312

Closed timwsuqld closed 4 months ago

timwsuqld commented 4 years ago

link/script tags with an integrity value are detected as secrets. We should be able to easily filter these out as they have a fixed format, including the hash type.

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/datatables/1.10.19/css/dataTables.bootstrap.min.css" integrity="sha256-PbaYLBab86/uCEz3diunGMEYvjah3uDFIiID+jAtIfw=" crossorigin="anonymous" />

    "templates/Applications/edit.html.twig": [
      {
        "hashed_secret": "92f4c8c54305a58d053de1291df13dbc4a79275f",
        "is_verified": false,
        "line_number": 6,
        "type": "Base64 High Entropy String"
      },

I've used --exclude-lines 'integrity="sha256' for now to filter them out, but it may be nice to have that as a default somewhere? (Whitelist in the Base64 High Entropy String plugin?)

OiCMudkips commented 4 years ago

I would say this is applicable to only HTML files and maybe JS and CSS files. We don't really do anything special handling for these file types at the moment. I would be hesitant in whitelisting this for everyone for all file types.

As a side note, #310 might enable a whitelist like this.

timwsuqld commented 4 years ago

Would it be possible to write a whitelist for the hashtype-hash pattern? sha256-hash I'm not sure how hard it would be to write code to detect these strings, and then verify if the hash is the right size to match the hash type, so we can ignore them? (Assuming that we agree a hash isn't a secret?)

OiCMudkips commented 4 years ago

I agree hashes aren't secrets. The challenge with whitelisting hashes is that they look a lot like base64 or hex secrets. I agree that the sha256-hash string is a good indicator that the string isn't a secret, but I'm not confident enough to say that this is the case for every single codebase.

One way we could implement this is to have a default --word-list packaged with detect-secrets and include sha256-hash in that word list. We don't have a default --word-list though.

Another way we could do this whitelisting is to implement an HTML file parser and make our parser discard the integreity properties. This seems a bit heavy-handed, but could HTML secret scanning accuracy? We don't use many raw HTML files at Yelp so it's hard to judge detect-secrets's existing HTML accuracy.

Is there a particular reason that passing that line in every time is annoying?

lorenzodb1 commented 4 months ago

We're going to close this issue as it hasn't received any update in a very long time. Feel free to re-open it if you think it's still relevant.