datacenter / ACI-Pre-Upgrade-Validation-Script

A script to run validations to detect potential issues that may cause an ACI fabric upgrade to fail
https://datacenter.github.io/ACI-Pre-Upgrade-Validation-Script/
Apache License 2.0

"openssl cmd issue, send logs to TAC ERROR !!" always appears in APIC version 6.0(4c) #120

Closed myoshiito closed 1 month ago

myoshiito commented 5 months ago

Describe the bug

When running the ACI-Pre-Upgrade-Validation-Script on APIC version 6.0(4c), the following openssl cmd issue always occurs:

[Check 47/50] APIC CA Cert Validation... openssl cmd issue, send logs to TAC ERROR !!

However, when the certificate and key are checked with the following commands as root on the APIC, their modulus hashes match:

    openssl x509 -noout -modulus -in /securedata/apicca/apicca.crt | openssl md5
    openssl rsa -noout -modulus -in /securedata/apicca/apicca.key | openssl md5


To Reproduce

  1. Run the ACI-Pre-Upgrade-Validation-Script on APIC version 6.0(4c).
  2. Confirm that "openssl cmd issue, send logs to TAC ERROR !!" is displayed.
  3. Become root on the APIC and use the following commands to verify that there are no issues with the certificate:

    openssl x509 -noout -modulus -in /securedata/apicca/apicca.crt | openssl md5
    openssl rsa -noout -modulus -in /securedata/apicca/apicca.key | openssl md5

Expected behavior

In APIC version 6.0(4c) without any certificate issues, no errors should be displayed for the APIC CA Cert Validation check.


monrog2 commented 4 months ago

@myoshiito do you have the upgrade script result bundle still available, and if so, are you able to share it?

Result Bundle: /data/techsupport/preupgrade_validator_2024-05-....tgz

I ran this check on a few different fabrics/versions and did not see this error, so I'll need the debug log to see exactly what is failing in your APIC's request and how we can better handle it in the script.

myoshiito commented 4 months ago

@monrog2 I have the logs from when this issue occurred in the customer's environment, and I will share them with you.

myoshito@aci-logviewer:~/dnld_xxx$ pwd
/users/myoshito/dnld_xxx
myoshito@aci-logviewer:~/dnld_xxx$ ls
20240430-231156081_xxxtxt  node-2  preupgrade_validator_2024-04-23T18-48-05+0900.json  preupgrade_validator_debug.log  tac-outputs.tgz
node-1                            node-3  preupgrade_validator_2024-04-23T18-48-05+0900.txt   tac-outputs

monrog2 commented 4 months ago

@myoshiito Thanks for sharing the location.

I checked them, but there is not enough logging to give me the exact failure output of the CMD, just enough to confirm that the subprocess call failed during the openssl cmd and it hit this piece of logic, and never got a chance to run the certreq:

            logging.debug('cmd = '+''.join(cmd))
            genrsa_proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
            genrsa_proc.communicate()[0].strip()
            if genrsa_proc.returncode != 0:
                print_result(title, ERROR, 'openssl cmd issue, send logs to TAC')
                return ERROR
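
Capturing the combined output before checking the return code would put openssl's own error message into the debug log. A minimal sketch of that idea, not the script's actual fix; `run_and_log` is a hypothetical helper name, and the code assumes Python 3:

```python
import logging
import subprocess


def run_and_log(cmd):
    """Run a shell command and log its combined stdout/stderr.

    Hypothetical helper for illustration; returns (returncode, output).
    """
    logging.debug('cmd = %s', cmd)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, shell=True)
    # communicate() returns (stdout, stderr); stderr is merged into stdout here.
    output = proc.communicate()[0].decode('utf-8', 'replace').strip()
    if proc.returncode != 0:
        # Keep the failing command's own message for troubleshooting.
        logging.debug('cmd failed (rc=%d): %s', proc.returncode, output)
    return proc.returncode, output
```

A caller could still report ERROR to the user on a non-zero return code, while the debug log keeps the underlying message.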

If you still have access to the customer env, you can run the below steps to check, then replay, the cmd being used in this check to see if we can get more details on the failure.

Check and cleanup env

  1. Which user and domain were used to run the script when it fails?

    a-apic1:techsupport> whoami
    admin

  2. Check which directory the script is being run in; /data/techsupport is recommended in the script docs:

    apic1:techsupport> ls -l | grep .py
    -rwxr-xr-x  1 admin       admin       157304 May 17 09:07 aci-preupgrade-validation-script.py

    apic1:techsupport> pwd
    /data/techsupport

  3. Check whether there are already any temp* or gen.cnf files, and which user wrote them:

    apic1:techsupport> ls -l | egrep "temp|gen"
    -rw-------  1 remUser       admin          339 May 17 09:16 gen.cnf
    -rw-------  1 remUser       admin          899 May 17 09:16 temp.csr.pem
    -rw-------  1 remUser       admin         1679 May 17 09:16 temp.key.pem
    -rw-------  1 remUser       admin           92 May 17 09:16 temp.sign

Above we can see that remUser ran this script in the past, and this will cause file access issues if a different user is being used for the run.

If you see the same, log in as the matching user (or as root) and delete those files; they are recreated during each script run.
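
The same check can be scripted. A minimal sketch that lists leftover files and their owners in a directory, assuming Python 3 on a Unix-like system; `find_leftovers` is a hypothetical helper, and the default patterns simply mirror the file names in the listing above:

```python
import glob
import os
import pwd


def find_leftovers(directory, patterns=('gen.cnf', 'temp.*')):
    """Return (path, owner) pairs for files left behind by an earlier run."""
    leftovers = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(directory, pattern))):
            uid = os.stat(path).st_uid
            try:
                owner = pwd.getpwuid(uid).pw_name
            except KeyError:
                # No passwd entry for this uid; fall back to the number.
                owner = str(uid)
            leftovers.append((path, owner))
    return leftovers
```

Calling `find_leftovers('/data/techsupport')` before a run would show whether files owned by a different user are still present.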

CMD replay

  4. Get the current passphrase:

    apic1:techsupport> moquery  -d 'uni/fabselfca' -o json | grep currCertReqPassphrase
          "currCertReqPassphrase": "123456789___ABC",

  5. Recreate gen.cnf:

    apic1:techsupport> vi gen.conf

    Press i for insert mode, then paste in the matching value from the script:

            [ req ]
            default_bits        = 2048
            distinguished_name  = req_distinguished_name
            string_mask         = utf8only
            default_md          = sha512
            prompt              = no

            [ req_distinguished_name ]
            commonName                      = aci_pre_upgrade

    Press esc, then type :wq to write and quit out of vi.

  6. Rerun the openssl command, using the currCertReqPassphrase from step 4 after the -hmac:

    apic1:techsupport> /bin/openssl genrsa -out temp.key.pem 2048 && /bin/openssl req -config gen.cnf -new -key temp.key.pem -out temp.csr.pem && /bin/openssl dgst -sha256 -hmac 123456789___ABC -out temp.sign temp.csr.pem
    Generating RSA private key, 2048 bit long modulus (2 primes)
    ...............................+++++
    ......................+++++
    e is xxx (0x------)

The above is the output when it worked.
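
For reference, the last command in the chain, `openssl dgst -sha256 -hmac <passphrase>`, computes an HMAC-SHA256 over the CSR file with the passphrase as the key. A minimal Python sketch of the same computation, assuming the passphrase is plain ASCII text; `hmac_sha256_hex` is a hypothetical helper name, not something taken from the script:

```python
import hashlib
import hmac


def hmac_sha256_hex(key, data):
    """Return the hex HMAC-SHA256 digest of data (bytes) keyed with key (str)."""
    return hmac.new(key.encode('ascii'), data, hashlib.sha256).hexdigest()
```

For a CSR on disk this could be called as `hmac_sha256_hex(passphrase, open('temp.csr.pem', 'rb').read())`; the result is expected to match the hex digest that openssl writes to temp.sign.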

If the failure is still reproducible after cleaning up any leftover files, I would expect to see an error here in your case.

myoshiito commented 4 months ago

@monrog2 Sorry for the late reply.

I have reproduced this issue in our lab (simply by using APIC version 6.0(4c)) and tried all the steps you mentioned. I collected the following log at that time:

issue120.log

In step 3, the owner of the gen.cnf file turned out to be the same user who ran the script, admin. Additionally, the following error occurred at step 6:

/bin/openssl: symbol lookup error: /bin/openssl: undefined symbol: Camellia_set_key, version OPENSSL_1_1_0

I can share our lab's access information if you would like to check it.

monrog2 commented 2 months ago

@myoshiito apologies as well for the delayed response.

Checking the steps I gave you, I realized there is a typo:

apic1:techsupport> vi gen.conf

should be:

apic1:techsupport> vi gen.cnf

Also, testing again locally, I'm seeing that subsequent script runs fail when a pre-existing gen.cnf is present, so I'm wondering if we should just have the script look for and erase any existing ones in the directory it's being run in (and probably rename the file to something more unique, like upgrade_check.cnf).

Can you re-run the manual commands with above? Or ping me your lab credentials when you get a chance and I can re-run it and dig into it that way.

monrog2 commented 2 months ago

I did further testing on the lab fabric where this was reported and found that the underlying issue appears to be the gen.cnf file created by the script: it should be removed after the run so that it does not cause issues with subsequent runs.

The fix would be the same as for the recently reported #142.
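
One way to keep runs from tripping over each other's files is to create the working files in a fresh private directory and always delete it afterwards. A minimal sketch of that approach, not the actual change made for #142; `scratch_dir` and `write_config` are hypothetical helpers, and the code assumes Python 3:

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_dir(prefix='upgrade_check_'):
    """Yield a fresh private directory and remove it afterwards, even on error."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def write_config(directory, text, name='upgrade_check.cnf'):
    """Write the openssl config into directory and return its full path."""
    cnf_path = os.path.join(directory, name)
    with open(cnf_path, 'w') as handle:
        handle.write(text)
    return cnf_path
```

Inside `with scratch_dir() as workdir:` the check would write its config with `write_config(workdir, ...)` and point the openssl commands at files under `workdir`, so nothing is left behind for a later run by a different user.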

myoshiito commented 2 months ago

Thank you, monrog2.

I really appreciate your support.