getsentry / raven-python

Raven is the legacy Python client for Sentry (getsentry.com) — replaced by sentry-python
https://sentry.io
BSD 3-Clause "New" or "Revised" License

password not filtered when it contains `=` #1263

Open bmoelans opened 6 years ago

bmoelans commented 6 years ago

We noticed unfiltered passwords showing up in our Sentry.

The environment we run is:

With the following JSON in the body of the call

{
    "password":"blabla",
    "code":"12345"
}

Sentry showed the following Body:

{
code: test1, 
password: [Filtered]
}

So far, so good.

But now things get strange.

Strange Example 1

{
    "password":"blablabla",
    "code":"=lookthistext&testb"
}

gives:

Body

{
code: =********&testb, 
password: [Filtered]
}

The password is filtered, but the behaviour is already strange.

Strange Example 2

{
    "password":"bla=blabla",
    "code":"test3"
}

gives:

Body

{ "password":"bla | [Filtered]

So the part of the password before `=` is visible.

Strange Example 3

{
    "password":"blablablabla-test3",
    "code":"test=123-456"
}

gives:

Body

{ "password":"blablablabla-test3", "code":"test | [Filtered]

So the complete password is visible.

Is this a missing setting or a bug in Raven?

bmoelans commented 6 years ago

@ashwoods and/or @mitsuhiko can you help me out?

mitsuhiko commented 6 years ago

My recommendation right now is to add a custom processor to fix such cases. The system is unfortunately not good enough to cover all cases and we're investigating some alternatives at the moment to deal with this.
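A custom processor along those lines could delegate to a helper like the one below. This is only a sketch under assumptions: `mask_json_body`, `SANITIZE_KEYS`, and `MASK` are hypothetical names chosen for illustration and are not part of Raven. It parses the body as JSON and masks values by key, so `=` and `&` inside values are never interpreted as delimiters.

```python
import json

# Hypothetical defaults for illustration; not taken from Raven's configuration.
SANITIZE_KEYS = frozenset({"password", "secret", "passwd", "api_key"})
MASK = "********"


def _mask(value, keys, mask):
    """Recursively replace values whose key contains one of `keys`."""
    if isinstance(value, dict):
        return {
            k: mask if any(s in k.lower() for s in keys) else _mask(v, keys, mask)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [_mask(v, keys, mask) for v in value]
    return value


def mask_json_body(body, keys=SANITIZE_KEYS, mask=MASK):
    """Mask sensitive values if `body` is a JSON object or array string.

    Anything else (non-strings, non-JSON strings, JSON scalars) is returned
    unchanged so that other sanitizing steps can still handle it.
    """
    if not isinstance(body, str):
        return body
    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return body
    if not isinstance(parsed, (dict, list)):
        return body
    return json.dumps(_mask(parsed, keys, mask))
```

A processor could call `mask_json_body` on the request body before any key/value splitting takes place. Note that the result is re-serialized, so whitespace in the original body is not preserved.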

bmoelans commented 6 years ago

@mitsuhiko I found a solution. The problem is at https://github.com/getsentry/raven-python/blob/master/raven/processors.py#L118: at that point you could have data['data'] = '{\n "password":"blablablabla",\n "code":"&r=0.747487-1105507184"\n}', i.e. a JSON string, but that code splits it at the `=` into '{\n "password":"blablablabla",\n "code":"&r' and '0.747487-1105507184"\n}'.

A workaround that does the trick for me for now is to do the following before that step:

if n == 'data' and isinstance(data[n], str) and self._is_json(data[n]):
    data[n] = re.sub(rf'("({"|".join(self.sanitize_keys)})":)(".*")', rf'\1"{self.MASK}"', data[n])

with

    def _is_json(self, value: str) -> bool:
        try:
            loads(value)
        except ValueError:
            return False
        return True

although I am not sure whether simply adding `not self._is_json(data[n])` to the check at https://github.com/getsentry/raven-python/blob/master/raven/processors.py#L118 would be enough
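That second idea, guarding the key/value branch with a JSON check, might look like the standalone sketch below. It assumes the existing check is a plain `'=' in value` test, which this thread implies but does not quote, and `looks_like_keyvals` is a hypothetical name, not a Raven function.

```python
import json


def is_json(value):
    """Return True if `value` parses as JSON."""
    try:
        json.loads(value)
    except ValueError:
        return False
    return True


def looks_like_keyvals(value):
    """Treat a string as key=value pairs only if it contains '=' and is not JSON."""
    return isinstance(value, str) and "=" in value and not is_json(value)
```

With this guard, a JSON body such as `{"password":"bla=blabla"}` no longer goes down the key/value path. Whether that alone is enough depends on how the remaining code handles a JSON string, which is not shown here.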