Insecure data serialization method by default with pickle on Cache

matheusbrat commented 4 years ago

Hello,

Beaker uses Pickle on Session and Cache. On Session at least it has support to secret/HMAC but on caching it doesn't which can lead to arbitrary code execution.

https://github.com/bbangert/beaker/issues/35

Maybe it would be wise to expose data_serializer in Cache too and maybe even allow a secret definition for caching and perhaps default to json serialization.

What do you think?

amol- commented 4 years ago

Sessions should provide a data_serializer argument that allows you to switch between JSON and PICKLE.

For session the primary driving force was that under some conditions it was easy to edit the session content (even just by editing a cookie in your browser if you didn't sign the cookie) and thus it could be an easy attack vector.

For caches it has been a generally widespread need to cache Python objects (for example people frequently cache the entire result of a SQLAlchemy query) so pickle has been the facilitator that allowed that.

On the cache the risk seems much more more contained, as you usually control what you cache and an attacker should have already breached your internal network, your database or your codebase to be able to inject malicious pickle data.

matheusbrat commented 4 years ago

It is much more contained but the risk exists and I believe it could be reduced even more if we had the secret option as we do for Session. As you said a bad actor already gained access to something it shouldn't. In my opinion, not improving because the bad actor already gained some sort of access it the same as leaving root without protection - If the bad actor already got access to user...

This also reminds me: Shouldn't we replace the sha1 to sha256?

amol- commented 4 years ago

On the pickle in caches the only issue is that there is no easy way to provide the same level of features that people expect from the cache. By replacing pickle with something else it would still need to be able to decode to arbitrary Python objects and thus it would probably quickly grow the same security concerns that Pickle has.

In the context of Pickle I would probably go in the direction of signing every cached value as an optional feature, so that you know they were not tampered with when loading them back. Seems a good reduction to the risk of most cases without having to sacrifice any existing feature of the cache.

PS: Which SHA1 usages you refer to? SHA1 in beaker it's mostly just used a way to generate random ids (sometimes stable). I would probably replace it with uuid in most cases. But people being able to reverse those should only result in reading a bunch of pretty random strings.

matheusbrat commented 4 years ago

Sorry that I wasn't clear but I was referring to the "secret" we have on SignedCookie approach which is exactly you said of signing the messages.

I agree that would reduce the attack surface a lot.

How do you think it would be good to implement this? Maybe something like:

Extending NameSpaceManager to have a loads/dumps method which simply return the value
Creating a new PickledNameSpaceManager which implements the loads/dumps with pickle.loads/dumps and if secret is present also does the validation.
Update Redis/Database/Etc to extend from PickledNameSpaceManager and instead of relying on pickle loads/dumps directly they would call self.loads for the value they fetched on their own way.

For example for Redis:

    def __getitem__(self, key):
        entry = self.client.get(self._format_key(key))
        if entry is None:
            raise KeyError(key)
        return self.loads(entry)

I'm not super familiar with beaker code so feel free to suggest a better approach.

tcullum-rh commented 4 years ago

@x-warrior @amol- , I've not very familiar with beaker, so wanted to clarify: What is the risk & source of untrusted data here? From what I see (and I could be wrong), the container.py file has code which deserializes data from the dbm via pickle.loads(self.dbm[key]) and this is the issue. Am I right? I'm trying to understand how untrusted data could make its way into the cache in beaker use-cases.

matheusbrat commented 4 years ago

@tcullum-rh, If you assume your system is secure and only beaker has access to the storage... You can think it is "secure". But if a bad actor is able to get access to your database (where you store your cache), he can now easily run any arbitrary code in the machine running beaker as well.

A lot of people would say: "Ah but than your system is already compromised..." but I believe this is not an excuse, you are making the life of the bad actors to gain access to everything even easier.

If beaker signed the message before saving it on the storage, in the case a bad actor gets access to the storage the attack surface would be smaller.

Does that make sense?

tcullum-rh commented 4 years ago

@x-warrior yes, thank you. Note that this is now CVE-2013-7489: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-7489 .

matheusbrat commented 4 years ago

Thanks @tcullum-rh. I will wait for replies from beaker team and than maybe I can work to improve this.

amol- commented 4 years ago

If you are willing to submit a PR to sign the cached messages, I'm more than willing to review & merge it. Just make sure it's possible to enable/disable signature and it's backward compatible as I can foresee most people will get angry if their cache suddenly can't load or if loading cached values suddenly takes longer than computing them :D

EtienneBruines commented 3 years ago

Pyupio gave this id pyup.io-38464

https://github.com/pyupio/safety-db/blob/master/data/insecure_full.json#L975

neoclust commented 3 years ago

do you have any news on this CVE ?

tang475 commented 3 years ago

Do you have any news on this CVE ? We are using this software and we are looking for a way to fix this issue. Thank you

vivekhub commented 2 years ago

If beaker signed the message before saving it on the storage, in the case a bad actor gets access to the storage the attack surface would be smaller. @matheusbrat how would this solution work? If a bad actor gets access to the storage, they can change the contents and the signature (assuming the signature is stored with data). Please help me understand your solution here.

matheusbrat commented 2 years ago

If a bad actor gets access to the storage, they can change the contents and the signature (assuming the signature is stored with data). Please help me understand your solution here.

If a bad actor get access to the storage only, he would not have the "signing key" thus he would not be able to generate a valid signature for the new content. Meaning whenever the "backend" load that content, it would see the key doesn't match it's own key and discard the data or throw an exception.

Does this make sense?

bbangert / beaker

Insecure data serialization method by default with pickle on Cache #191