IdentityPython / pysaml2

Python implementation of SAML2
Apache License 2.0
563 stars 423 forks source link

Reuse of AES initialization vector in AESCipher / UsernamePasswordMako / Server #417

Closed obi1kenobi closed 6 years ago

obi1kenobi commented 7 years ago

The Server class randomly generates a fixed 16-byte initialization vector (IV) for the purpose of encrypting data. Then, via the UsernamePasswordMako class, that fixed IV makes its way to the AESCipher class, where it is consistently reused for encrypting data.

Initialization vector reuse like this is a security concern, since it leaks information about the encrypted data to attackers, regardless of the encryption mode used. For example, if the IV is reused with the same key in AES-CTR mode, the attacker will very likely be able to entirely decrypt the encrypted data: https://crypto.stackexchange.com/questions/2991/why-must-iv-key-pairs-not-be-reused-in-ctr-mode

Instead of relying on a fixed, randomly-generated IV, it would be better to randomly-generate a new IV for every encryption operation. Here are a couple of links that have more information on why that is the preferred approach:

spaceone commented 7 years ago

CC

obi1kenobi commented 7 years ago

This issue has been assigned CVE-2017-1000246.

jkakavas commented 7 years ago

Hi there, thanks for the interesting report and congrats on getting your CVE.

Although I agree with you in principle and we are going to fix this shortly, I'm trying to look into the applicability of an attack based on your report and I'd appreciate your feedback.

. For example, if the IV is reused with the same key in AES-CTR mode,

We do not use or support CTR mode https://github.com/rohe/pysaml2/blob/4375361939e942c4dd666d3ca4e1159858404bc4/src/saml2/aes.py#L11-L15

Even if the attacked could decrypt the data, the only thing we encrypt is the username of the authenticated user before we store that to a cookie. This username is not secret ( in fact it would probably be included in a possibly non encrypted SAML Assertion a few steps down the line ) and I would be more concerned with the attacks against the integrity(authenticity) of the value rather than its confidentiality. **

For CBC mode, since the username vocabulary is limited, I would argue that it would be difficult to mount a chosen plaintext attack ( plaintext = username) and thus, even though the IV reuse is bad practice, I don't see how anyone using this code would be vulnerable to any attacks.

What do you think @obi1kenobi ?

** We don't do a pretty good job with that too admittedly, but I will be opening a separate issue for that.

obi1kenobi commented 7 years ago

Hi @jkakavas, thanks for the reply.

The issue exists inside the class that handles AES encryption generically for the entire library, and not just for encrypting usernames. I'd argue that even if usernames are indeed the only thing being encrypted right now, it isn't safe to assume that will continue to be the case in the future.

I agree that attacks on the message authenticity are concerning, as you point out.

I'd suggest three things:

The IV reuse problem must be fixed before switching to AES-GCM though, since AES-GCM has a similar construction to AES-CTR and fails in the same way when the IV is reused. Also, you might consider AES-GCM-SIV, which is resistant to reused IVs but is slightly more expensive in terms of computation time.

jkakavas commented 7 years ago

The issue exists inside the class that handles AES encryption generically for the entire library, and not just for encrypting usernames. I'd argue that even if usernames are indeed the only thing being encrypted right now, it isn't safe to assume that will continue to be the case in the future.

Yes, we only use it to encyrpt the username in the cookie, so it is not currently something that can e exploited. I agree though that it should be changed so we will not unknowingly inject any vulnerabilities in future versions.

I will make a pull request to address this. Thanks again for reporting it.

obi1kenobi commented 7 years ago

Sounds great -- thank you for looking into fixing this.

kylegibson commented 6 years ago

@jkakavas Is there a PR addressing this?

jkakavas commented 6 years ago

@kylegibson No, not yet. I can work on it soon-ish but if you have something you can contribute, please do by all means

dcripplinger commented 6 years ago

Was this issue ever resolved? My team is working on resolving all the CVEs, and we're not sure if simply upgrading the version does it, or what commit fixes it if any, or if we have a solid reason for ignoring it.

obi1kenobi commented 6 years ago

@dcripplinger as far as I can tell, the issue appears to not be resolved, at least not on the master branch. I believe it is still a problem that should be addressed, and it should be addressable by simply generating a new IV for every encryption operation.

laserlemon commented 6 years ago

@jkakavas :wave: Hello! I'm on the GitHub team responsible for sending security alerts for vulnerable versions of Python libraries. It appears that the issue described here (assigned CVE-2017-1000246) affects all versions as of its writing (<= 4.4.0). Since its writing, version 4.5.0 was released. Can you share whether version 4.5.0 fixes this issue?

As it stands now, we plan to alert pysaml2 users today on vulnerable versions <= 4.5.0 (all currently released versions), providing no remediation steps as no fixed version has yet been identified. Can you please confirm that this information is accurate? Thank you! ๐Ÿ’›

c00kiemon5ter commented 6 years ago

I am not a cryptographer so I cannot easily decide whether using AES-GCM or AES-GCM-SIV (as proposed by @obi1kenobi ) is the right thing.

Instead of relying on a fixed, randomly-generated IV, it would be better to randomly-generate a new IV for every encryption operation

However, to generate a new random IV for every encryption operation, one only needs not provide an IV in the first place. I will work on @obi1kenobi proposals immediately.


we plan to alert pysaml2 users today

@laserlemon sending security alerts is really helpful and mindful to do. It is great to have a team like that and thanks for doing that ๐Ÿ‘ A security alert though, usually panics people, and as such it should be of actual value; it should be an actual issue. Even though there is a CVE assigned, there is nothing of real interest here as @jkakavas already explained earlier. I wouldn't like to wake up with a security alert that ends up not being an actual issue.

Ofcourse, this is no excuse by the pysaml2-team not having acted upon this for so long ๐Ÿ˜Ÿ. I am here to fix this.


[offtopic] ๐Ÿฐ

I would be really interested to that list. I would like to be able to contact users of pysaml2 to understand their use cases and alert for breaking changes. Is this something the pysaml2-team can have access to? Can we generate that list somehow?


Again, thanks for doing this :octocat: I will prepare a PR soon.

c00kiemon5ter commented 6 years ago

Pull Request #519 created to address this.

jkakavas commented 6 years ago

I agree with @c00kiemon5ter's comments. Although this is a valid finding, I can't see how this can be exploited in any way (my thoughts/analysis on above) so I'm not sure the notification is helpful in that context. At least maybe since there is no 4.5 released, the notification could at least point to this issue in order for anyone interested to see why this is considered not exploitable.

jkakavas commented 6 years ago

@c00kiemon5ter maybe information on this can go out to the pysaml mailing lists so that folks are not taken by surprise by a potential GH notification ?

I would be really interested to that list. I would like to be able to contact users of pysaml2 to understand their use cases and alert for breaking changes. Is this something the pysaml2-team can have access to? Can we generate that list somehow?

-1 for me. I would rather invite people to voluntarily join the mailing list by pointing it out in the Readme, than obtaining such a list via Github ( which I don't think they're in a place to provide eitherway ) and sending out unsolicited emails. ( We probably shouldn't also, GDPR being one of the reasons at least ) WDYT ?

c00kiemon5ter commented 6 years ago

I am not planning to start a campaign ๐Ÿ˜œI am mostly interested in which parts of the library are being used, and which can be replaced or dropped without making the users inconvenient. It is mostly about us peeking into their projects and learning about their use cases, rather than contacting them. If we know there are users of a part of the library that is being changed (a breaking change) we can work on a migration path or propose a complementary tool.

laserlemon commented 6 years ago

Thank you everybody for your comments! After hearing your feedback, we decided not to send notifications to repos that currently use a version of pysaml2 that's "vulnerable" to this specific issue. Please note though, should someone push a new requirements.txt that targets a vulnerable pysaml2 version, they will receive a notification at that time.

Also, in regards to your question:

I would like to be able to contact users of pysaml2 to understand their use cases and alert for breaking changes. Is this something the pysaml2-team can have access to? Can we generate that list somehow?

Yes, you can see the public repos that depend on pysaml here on your Dependency Graph page. We are not able to provide additional contact information for users of pysaml2.

Thanks for providing such helpful feedback and for maintaining pysaml2! :clap:

c00kiemon5ter commented 6 years ago

The library we use to implement AESCipher is cryptography. I've looked into AES, the different modes and how they are supported. The recommended practice by cryptography is to use Fernet (spec).

Fernet guarantees that a message encrypted using it cannot be manipulated or read without the key. Fernet is an implementation of symmetric (also known as โ€œsecret keyโ€) authenticated cryptography.

The interface is also very simple ๐Ÿ‘Fernet provides generate_key, encrypt and decrypt. Behind the scenes:

Implementation

Fernet is built on top of a number of standard cryptographic primitives. Specifically it uses:

AES in CBC mode with a 128-bit key for encryption; using PKCS7 padding. HMAC using SHA256 for authentication. Initialization vectors are generated using os.urandom().

I think Fernet fills our use case and it is recommended practice. It does not use GCM or GCM-SIV (both of which are supported by cryptography, but introduce the authentication tag that needs to be kept available.)

I think I will go ahead and wrap Fernet to replace AESCipher.


Thank you everybody for your comments! After hearing your feedback, we decided not to send notifications to repos that currently use a version of pysaml2 that's "vulnerable" to this specific issue. Please note though, should someone push a new requirements.txt that targets a vulnerable pysaml2 version, they will receive a notification at that time.

I think this is reasonable ๐Ÿ‘ Thanks!

Yes, you can see the public repos that depend on pysaml here on your Dependency Graph page. We are not able to provide additional contact information for users of pysaml2.

That's great, thanks for pointing us there - I did not know about the dependency graph.

Cheers :octocat:

boydgreenfield commented 6 years ago

@laserlemon Just as a quick FYI, we did get a notification about this vulnerability on one of our repos (perhaps because it's private so the dependency graph features had to be enabled?). Not a big deal as we were already tracking this, but wanted to let you know in case that's not what was anticipated per your note above.

w0rp commented 6 years ago

I think the decision to show the warning on GitHub is wrong. This issue doesn't actually cause any issues anyone would care about. The big warning is just going to scare people for no good reason. It's not like using pysaml will mean someone will be able to hack your site.