SAML-Toolkits / python3-saml

MIT License
688 stars 306 forks source link

Remove lxml version restriction #323

Closed nosnilmot closed 1 year ago

nosnilmot commented 2 years ago

Add workaroud for the issue lxml version was originally pinned for.

Addresses #292 with a workaround (https://github.com/onelogin/python3-saml/issues/292#issuecomment-1002604291) that does not require pinning lxml version, so newer lxml can be used with CVE vulnerability fixes. Fixes #319

Untested as I do not have a testcase for the original failure, but I want to use python3-saml without having to downgrade lxml.

For other reports of the underlying issue see also: https://bugs.launchpad.net/lxml/+bug/1960668 https://mail.python.org/archives/list/lxml@python.org/thread/SCMXQYGN7CQMSPJI3PEW2YBT4YZKNML2/

EDIT: this PR now just removes the version restriction and documents how to avoid the libxml2 version incompatibilities following investigation into cause of the issue, no workarounds

nosnilmot commented 2 years ago

I just noticed https://github.com/onelogin/python3-saml/issues/292#issuecomment-1013012669 says this workaround does not work 100% so maybe this PR isn't acceptable

chipx86 commented 2 years ago

(Hi Stu!) It'd be great to see this in shape and merged, if the linked issue could be found and resolved. We noticed today that the supported range of lxml versions (and therefore python3-saml) fail to compile on the upcoming Python 3.11. The 4.9.x releases do work on 3.11.

nosnilmot commented 2 years ago

Hey Christian! thanks for the reminder that I needed to follow up on this. I don't think the PR here is a good solution, but I have made some further discoveries. I updated the upstream bug report, but I'll repeat here:

I've done some further testing and I believe I have narrowed down the root cause here to be a problem with xmlsec and lxml using different versions of libxml2 at runtime.

lxml wheels are built with statically linked libxml2 (2.9.12+ for lxml>4.7.0) while xmlsec will compile with (and dynamically link to) system libxml2 at installation time. If these are sufficiently different versions, I think some libxml2 internal mismatch causes the issue reported here.

Simply upgrading the system libxml2 (note: I do not recommend anyone actually does this to replace OS provided libxml2!) without recompiling or reinstalling xmlsec or lxml is one way to "resolve" it.

Ideally, lxml would be dynamically linked to libxml2, and share the same system-provided libxml2 library as xmlsec. I do not fully understand why this is not the case, or what the capabilities or limitations of python packaging are.

My recommended solution/workaround, for the time being, is for affected users to install lxml using 'pip install --no-binary lxml lxml' to force it to be built locally, which will also result in using shared libxml2 instead of static.

I've been meaning to look to see if there is a way to specify in setup.py/install_requires that lxml should not be installed from binary and update this PR with that alternative, but I'm not sure if that is possible.

maparent commented 2 years ago

install_requires cannot do this, but the following works in a requirements file:

--no-binary lxml
lxml>=4.9.1

I think it makes a good argument for moving to pip.

mrmoss commented 2 years ago

@pitbulk - If you give the greenlight we can merge and release.

nosnilmot commented 2 years ago

@pitbulk - If you give the greenlight we can merge and release.

It would be better to just remove the lxml version restriction and document that lxml should not be installed from binary, instead of merging this ugly workaround (which might not be 100% effective anyway).

koleror commented 2 years ago

The "cleaner" workaround would be to drop the binaries of the current lxml package, and provide a new lxml-binary with the current binaries, like the psycopg2 team made (psycopg2 vs psycopg2-binary). But that's out of the scope of this issue unfortunately

nosnilmot commented 1 year ago

@pitbulk - If you give the greenlight we can merge and release.

It would be better to just remove the lxml version restriction and document that lxml should not be installed from binary, instead of merging this ugly workaround (which might not be 100% effective anyway).

@mrmoss, I updated this PR to do just that, can you merge and make a new release of python3-saml to unblock users from upgrading lxml to avoid the security vulnerability CVE-2022-2309?

Let me know you would prefer to have the 2 commits squashed first.

wells commented 1 year ago

@mrmoss @bzvestey @eriktalvi. Pretty please merge and tag a release.

@pitbulk. Looks like they are waiting for you to review this and greenlight.

kkszysiu commented 1 year ago

Is it supposed to be merged anytime soon? This blocks migration to Python 3.11 for some of the projects I'm involved in and I am currently thinking if I should still use python3-saml or replace it with different lib such as pysaml2.

Thanks!

colinet commented 1 year ago

I can't upgrade to python 3.11 because of lxml dependancy. Moreover, the lxml version used by python3-saml has a security vulnerability.

Should I wait and follow up with python3-saml or switch to pysaml2.

Thanks!

thebotsmith commented 1 year ago

Really need this to be able to use python 3.11

pitbulk commented 1 year ago

Hi, I'm back and I'm trying to understand if there is currently a better solution than the one proposed here, as it seems it does not work for everyone.

@nosnilmot, @koleror, @maparent, @sreenivasulureddysura, @rlaunch, @aquatix, @memocong , @bgaifullin

Can you shared your though?

Saw in xmlsec the following:

lxml >= 3.8.0, !=4.7.0

so I don't think is fixed there neither..

I'm looking for a permanent solution on this one.

akx commented 1 year ago

I'm not sure what the problem with lxml >= 3.8.0, !=4.7.0 is; it just means exactly 4.7.0 (the yanked version, see https://github.com/SAML-Toolkits/python3-saml/pull/297) isn't acceptable, but e.g. 4.9.2 (the latest version at the time of writing) would work.

nosnilmot commented 1 year ago

Hi, I'm back and I'm trying to understand if there is currently a better solution than the one proposed here, as it seems it does not work for everyone.

What do you think does not work?

I agree the original workaround was crude, but this PR now just removes the version restriction on lxml and provides documentation on how to get a working setup for the identified problem of different libxml2 versions used by xmlsec and lxml (which cannot be solved by xmlsec when lxml is bundling libxml2 in binary wheels).

Maybe it isn't clear what this PR does because I haven't squashed the commits, I will do that now.

pitbulk commented 1 year ago

@nosnilmot, Correct me if I'm wrong, but it seems that some specific versions of lxml and libxml2 breaks the SAML parsing.

Removing the restrictions allows pip to install whatever version of lxml.

The PR mentions ways to prevent installing lxml from binaries, but what gonna happen to project that already have lxml installed? If for some reasons a project update its dependencies, is possible that python3-saml could be updated at the same time that lxml, and could be installed a non compatible version, providing a situation where a working, (but vulnerable project), stop working.

That why I'm trying to get more context here and try to force if possible compatible and secure versions of lxml.

nosnilmot commented 1 year ago

Correct me if I'm wrong, but it seems that some specific versions of lxml and libxml2 breaks the SAML parsing.

The problem comes from having different versions of libxml2 used by xmlsec and lxml, which can happen because xmlsec links against the system provided libxml2; and lxml by default bundles and links to its own private copy of libxml2 in binaries.

Removing the restrictions allows pip to install whatever version of lxml.

Correct.

The PR mentions ways to prevent installing lxml from binaries, but what gonna happen to project that already have lxml installed?

They should read the documentation that says to reinstall lxml without binaries.

If for some reasons a project update its dependencies, is possible that python3-saml could be updated at the same time that lxml, and could be installed a non compatible version, providing a situation where a working, (but vulnerable project), stop working.

It's possible, but it's solvable (reinstall lxml), and it is far better than forcing a known vulnerable library on your project's users for months.

That why I'm trying to get more context here and try to force if possible compatible and secure versions of lxml.

You can do that by switching to pip / requirements.txt for specifying dependencies and including

--no-binary lxml
lxml>=4.9.1

as suggested by @maparent, but I think it is important to unblock users from getting security updates first.

@koleror mentioned a better solution but that would need to be implemented by lxml upstream.

pitbulk commented 1 year ago

Based on CVE-2022-2309 which is the security bug we are missing due the lxml version block, this only applies to lxml <4.9.1 when is used together with libxml2 2.9.10 through 2.9.14.

If I'm not wrong, xmlsec uses 2.9.4 when building win binaries and in its readme says that it requires at least 2.9.1, not sure if trying to force the latest libxml2 library

lxml 4.6.5 should be installing libxml2 2.9.10 and here is where we have problems.

We need to go deeper and see what was introduced in lxml 4.7.1 that broke the python3-saml parser. I believe is related to the c14N change commented on the changelog

The PR mentions ways to prevent installing lxml from binaries, but what gonna happen to project that already have lxml installed?

They should read the documentation that says to reinstall lxml without binaries.

Ideally yes, but in the real world, there are tons of projects which basically update dependencies automatically and they don't know what packages they are using at all.

nosnilmot commented 1 year ago

Based on CVE-2022-2309 which is the security bug we are missing due the lxml version block, this only applies to lxml <4.9.1 when is used together with libxml2 2.9.10 through 2.9.14.

If I'm not wrong, xmlsec uses 2.9.4 when building win binaries and in its readme says that it requires at least 2.9.1, not sure if trying to force the latest libxml2 library

This appears to be an incomplete sentence. I have not anywhere been suggesting forcing the latest libxml2 library, and I would strongly recommend against trying to force that, given it is a core OS library.

Except maybe on windows, you cannot hope to determine specifically what version of libxml2 xmlsec will use at runtime.

We need to go deeper and see what was introduced in lxml 4.7.1 that broke the python3-saml parser. I believe is related to the c14N change commented on the changelog

There is no specific change in lxml 4.7.1 that breaks python3-saml.

The most relevant lxml bug report is https://bugs.launchpad.net/lxml/+bug/1960668.

The cause is when system libxml2 (which xmlsec links to) is sufficiently different and incompatible with the libxml2 bundled with lxml.

The PR mentions ways to prevent installing lxml from binaries, but what gonna happen to project that already have lxml installed?

They should read the documentation that says to reinstall lxml without binaries.

Ideally yes, but in the real world, there are tons of projects which basically update dependencies automatically and they don't know what packages they are using at all.

By this argument users might also be updating lxml after installing python3-saml anyway, either because they want a fix for CVE-2022-2309 or they need python 3.11 compatibility (or both).

Let's not make perfect the enemy of progress here.

pitbulk commented 1 year ago

@nosnilmot is there any way to identify those libxml2 internal mismatch at python3-saml so we could add a warning while installing/updating or running tests?

nosnilmot commented 1 year ago

@nosnilmot is there any way to identify those libxml2 internal mismatch at python3-saml so we could add a warning while installing/updating or running tests?

Not directly, because xmlsec does not expose an API to determine what version of libxml2 it is using (neither the native library or the python bindings).

I have added a test for the known failure case to the testsuite, some variation of it could be included at runtime too. It's derived from the testcase on the lxml bug report. It does not guarantee there are no other subtle compatibility issues.

#!/usr/bin/env python
import xmlsec
from lxml import etree

def libxml2check():
    env = etree.fromstring('<xml></xml>')
    sig = xmlsec.template.create(
        env,
        xmlsec.Transform.EXCL_C14N,
        xmlsec.Transform.RSA_SHA256,
        ns="ds"
    )

    ds = etree.QName(sig).namespace
    cm = sig.find(".//{%s}CanonicalizationMethod" % ds)

    if cm is not None:
        return True
    else:
        return False

if libxml2check():
    print("ok")
else:
    print("fail")
pitbulk commented 1 year ago

Ok, thanks for all the info provided and the effort. I will try to get time and unblock this issue soon.