Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0
543 stars 372 forks source link

[BUG] GoalStateTestCase.test_it_should_refresh_the_goal_state_when_it_is_inconsistent failing with openssl 3.2.2 in Ubuntu 24.10 (python 3.12) #3163

Closed mirespace closed 3 months ago

mirespace commented 4 months ago

Hi team,

Describe the bug: A clear and concise description of what the bug is.

When building WALinuxAgent in Ubuntu 24.10 the following test is failing:

=================================== FAILURES ===================================
_ GoalStateTestCase.test_it_should_refresh_the_goal_state_when_it_is_inconsistent _

self = <tests.common.protocol.test_goal_state.GoalStateTestCase testMethod=test_it_should_refresh_the_goal_state_when_it_is_inconsistent>

    def test_it_should_refresh_the_goal_state_when_it_is_inconsistent(self):
        #
        # Some scenarios can produce inconsistent goal states. For example, during hibernation/resume, the Fabric goal state changes (the
        # tenant certificate is re-generated when the VM is restarted) *without* the incarnation changing. If a Fast Track goal state
        # comes after that, the extensions will need the new certificate. This test simulates that scenario by mocking the certificates
        # request and returning first a set of certificates (certs-2.xml) that do not match those needed by the extensions, and then a
        # set (certs.xml) that does match. The test then ensures that the goal state was refreshed and the correct certificates were
        # fetched.
        #
        data_files = [
            "wire/certs-2.xml",
            "wire/certs.xml"
        ]

        def http_get_handler(url, *_, **__):
            if HttpRequestPredicates.is_certificates_request(url):
                http_get_handler.certificate_requests += 1
                if http_get_handler.certificate_requests < len(data_files):
                    data = load_data(data_files[http_get_handler.certificate_requests - 1])
                    return MockHttpResponse(status=200, body=data.encode('utf-8'))
            return None
        http_get_handler.certificate_requests = 0

        with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol:
            protocol.set_http_handlers(http_get_handler=http_get_handler)
            protocol.mock_wire_data.reset_call_counts()

            goal_state = GoalState(protocol.client)

>           self.assertEqual(2, protocol.mock_wire_data.call_counts['goalstate'], "There should have been exactly 2 requests for the goal state (original + refresh)")
E           AssertionError: 2 != 1 : There should have been exactly 2 requests for the goal state (original + refresh)

tests/common/protocol/test_goal_state.py:476: AssertionError

The curious thing is that this was not happening a pair of weeks ago, so I've checked what changed at distro level to see if another package could be impacting WALinuxagent. The suspicious was openssl, and I can confirm it is the cause, but I couldn't discover 100% the underlying root of the problem, because using previous version 3.2.1, the test passes:

dpkg: warning: downgrading libssl3t64:amd64 from 3.2.2-1ubuntu1 to 3.2.1-3ubuntu1
[...]
dpkg: warning: downgrading openssl from 3.2.2-1ubuntu1 to 3.2.1-3ubuntu1
[...]

(oracular-amd64)root@Avalon:/build/walinuxagent-s1lr7I/walinuxagent-2.11.1.4# pytest tests/common/protocol/test_goal_state.py -k test_it_should_refresh_the_goal_state_when_it_is_inconsistent
=================================================================================== test session starts ====================================================================================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /build/walinuxagent-s1lr7I/walinuxagent-2.11.1.4
collected 21 items / 20 deselected / 1 selected                                                                                                                                            

tests/common/protocol/test_goal_state.py 

===================================================================================== warnings summary =====================================================================================
[…]
====================================================================== 1 passed, 20 deselected, 183 warnings in 1.88s ====================================================================

Changes between opennsl versions (3.2.1 and 3.2.2): https://github.com/openssl/openssl/blob/openssl-3.2.2/CHANGES.md#changes-between-321-and-322-4-jun-2024

Distro and WALinuxAgent details :

Additional context

I explored different options to get to the point on what is happening, with no luck... so sharing here all the way I did (thanks in advance!):

Checking when openssl is involved in the test, I saw the check_certificates in azurelinuxagent/common/protocol/goal_state.py that calls Certificates(), and it ends calling decrypt_p7m func from CryptUtil in azurelinuxagent/common/utils/cryptutil.py which executes /usr/bin/openssl cms -decrypt -in Certificates.p7m -inkey TransportPrivate.pem -recip TransportCert.pem | /usr/bin/openssl pkcs12 -nodes -password pass: -out Certificates.pem.

This operation fails once, happening with both openssl 3.2.1 -the "good"- and 3.2.2 - "bad":

azurelinuxagent.common.utils.shellutil.CommandError: '/usr/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase__8ay2k22/Certificates.p7m -inkey /tmp/GoalStateTestCase__8ay2k22/TransportPrivate.pem -recip /tmp/GoalStateTestCase__8ay2k22/TransportCert.pem | /usr/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase__8ay2k22/Certificates.pem' failed: 1 (Error decrypting CMS using private key)

But it seems it "recovers" well in one case and it doesn't in the other. I checked how many times are openssl called in bad a good case:

Good

pytest            0 /usr/bin/pytest tests/common/protocol/test_goal_state.py -k test_it_should_refresh_the_goal_state_when_it_is_inconsistent
lsb_release       0 /usr/bin/lsb_release -a
getopt            0 /usr/bin/getopt --name lsb_release -o hvidrcas -l help,version,id,description,release,codename,all,short -- -a
cut               0 /usr/bin/cut -c1
tr                0 /usr/bin/tr [:lower:] [:upper:]
cut               0 /usr/bin/cut -c2-
tr                0 /usr/bin/tr [:upper:] [:lower:]
tr                0 /usr/bin/tr [:upper:] [:lower:]
uname             0 /usr/bin/uname -rs
openssl           0 /usr/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase_dir/Certificates.p7m -inkey /tmp/GoalStateTestCase_dir/TransportPrivate.pem -recip /tmp/GoalStateTestCase_dir/TransportCert.pem
openssl           0 /usr/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase_dir/Certificates.pem
openssl           0 /usr/bin/openssl rsa -in /tmp/GoalStateTestCase_dir/0.prv -pubout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -pubkey -noout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -fingerprint -noout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -pubkey -noout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -fingerprint -noout
openssl           0 /usr/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase_dir/Certificates.p7m -inkey /tmp/GoalStateTestCase_dir/TransportPrivate.pem -recip /tmp/GoalStateTestCase_dir/TransportCert.pem
openssl           0 /usr/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase_dir/Certificates.pem
openssl           0 /usr/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase_dir/Certificates.p7m -inkey /tmp/GoalStateTestCase_dir/TransportPrivate.pem -recip /tmp/GoalStateTestCase_dir/TransportCert.pem
openssl           0 /usr/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase_dir/Certificates.pem
openssl           0 /usr/bin/openssl rsa -in /tmp/GoalStateTestCase_dir/0.prv -pubout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -pubkey -noout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -fingerprint -noout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -pubkey -noout
openssl           0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -fingerprint -noout

And in the bad is one missing:

Bad

pytest             0 /usr/bin/pytest tests/common/protocol/test_goal_state.py -k test_it_should_refresh_the_goal_state_when_it_is_inconsistent
lsb_release        0 /usr/bin/lsb_release -a
getopt             0 /usr/bin/getopt --name lsb_release -o hvidrcas -l help,version,id,description,release,codename,all,short -- -a
cut                0 /usr/bin/cut -c1
tr                 0 /usr/bin/tr [:lower:] [:upper:]
cut                0 /usr/bin/cut -c2-
tr                 0 /usr/bin/tr [:upper:] [:lower:]
tr                 0 /usr/bin/tr [:upper:] [:lower:]
uname              0 /usr/bin/uname -rs
openssl            0 /usr/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase_dir/Certificates.p7m -inkey /tmp/GoalStateTestCase_dir/TransportPrivate.pem -recip /tmp/GoalStateTestCase_dir/TransportCert.pem
openssl            0 /usr/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase_dir/Certificates.pem
openssl            0 /usr/bin/openssl rsa -in /tmp/GoalStateTestCase_dir/0.prv -pubout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -pubkey -noout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -fingerprint -noout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -pubkey -noout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -fingerprint -noout
openssl            0 /usr/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase_dir/Certificates.p7m -inkey /tmp/GoalStateTestCase_dir/TransportPrivate.pem -recip /tmp/GoalStateTestCase_dir/TransportCert.pem
openssl            0 /usr/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase_dir/Certificates.pem

openssl            0 /usr/bin/openssl rsa -in /tmp/GoalStateTestCase_dir/0.prv -pubout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -pubkey -noout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/1.crt -fingerprint -noout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -pubkey -noout
openssl            0 /usr/bin/openssl x509 -in /tmp/GoalStateTestCase_dir/2.crt -fingerprint -noout

So, I checked if the Certificates.pem generated are different depending on the SSL version, and is not:

(oracular-amd64)root@Avalon:/tmp/3.2.2# md5sum Certificates.pem 
593387341dce417a3025192902f882eb  Certificates.pem

(oracular-amd64)root@Avalon:/tmp/3.2.1# md5sum Certificates.pem 
593387341dce417a3025192902f882eb  Certificates.pem

Accidentally, when doing a trace with pdb, I jumped out the reading loop when reading that file (to line 621 in azurelinuxagent/common/protocol/goal_state.py) and the test passed.. .I'm sharing that pdb session with you here in case I could give you a clue that I can't be able to identify. I put the breakpoints at the end.

Anything else you need from me, please let me know... Thanks for looking into it!

Miriam

nagworld9 commented 3 months ago

@mirespace Thanks for opening the issue. I was able to repro after installing openssl version 3.2.2 as you highlighted. The other thing I noticed is this test uses 2 test certs. One of the cert(certs.xml) decrypted properly (no issues) but other certs-2.xml is failing while decrypting it. As a result, test validation failed.

'/usr/local/ssl/bin/openssl cms -decrypt -in /tmp/GoalStateTestCase_27xaqevo/Certificates.p7m -inkey /tmp/GoalStateTestCase_27xaqevo/TransportPrivate.pem -recip /tmp/GoalStateTestCase_27xaqevo/TransportCert.pem | /usr/local/ssl/bin/openssl pkcs12 -nodes -password pass: -out /tmp/GoalStateTestCase_27xaqevo/Certificates.pem' failed: 1 (Error decrypting CMS using private key)

I tested this in vm as well with sample certs and able to decrypt the cert.

root@ubu24:/var/lib/waagent# rm Certificates.pem
root@ubu24:/var/lib/waagent# /usr/local/ssl/bin/openssl cms -decrypt -in Certificates.p7m -inkey TransportPrivate.pem -recip TransportCert.pem | /usr/local/ssl/bin/openssl pkcs12 -nodes -password pass: -out Certificates.pem
root@ubu24:/var/lib/waagent# ls Certificates.pem
Certificates.pem

So far this is my observation, and I'll continue investigate further

narrieta commented 3 months ago

@mirespace thanks for reporting this. @nagworld9 and me looked into this and we need a small change in the Agent to accommodate a difference of behavior in the new openssl. We'll do this within the next couple of weeks.

mirespace commented 3 months ago

Thanks @narrieta and @nagworld9 !

narrieta commented 3 months ago

Fixed by #3166