aws / aws-iot-device-sdk-cpp

SDK for connecting to AWS IoT from a device using C++
http://aws-iot-device-sdk-cpp-docs.s3-website-us-east-1.amazonaws.com
Apache License 2.0

"Server Certificate Verification failed" with Custom Domain #213

Open yixiangding opened 9 months ago

yixiangding commented 9 months ago

Describe the bug

Hi,

I want to report an issue we have been experiencing while trying to switch IoT Core to a custom domain.

We understand that V2 is the solution and should resolve most issues, and we do plan to migrate to it gradually.

However, we still have a large amount of legacy hardware (Ubuntu 18.04) deployed in the field that we need to support, which makes moving those devices to V2 a major risk and a high cost for us.

Therefore, we would really appreciate it if anyone could provide some insight into why the IoT Core custom domain does not work with V1, so we can keep our legacy hardware running.

Issue

We get "Server Certificate Verification failed." during the SSL handshake when connecting to our custom domain. The failure is also reproducible with the PubSub example.

However, the SDK works perfectly when it connects to the ATS endpoint.

The only thing we swapped is the endpoint URL in the SDK config file, from the ATS endpoint to our custom domain. Everything else is properly configured, including the ACM SSL certs, the VPC endpoint to the iot:data plane, etc.

What we have tried

  1. Tried both IoT CPP SDK V2 and IoT Python SDK to connect to the same custom domain: the PubSub example works without problems for both SDKs, and the MQTT test client can publish and subscribe normally
  2. Connected to the ATS endpoint using CPP SDK V1: works
  3. Reinstalled both OpenSSL 1.1.1 and 1.1.0g and recompiled CPP SDK V1: issue persists
  4. Tested with openssl s_client -connect <ipv4_addr>:443 -CAfile certs/rootCA.crt as well as openssl s_client -connect <custom_domain>:443 -CAfile certs/rootCA.crt: both showed verification OK.

However, we did notice that when SNI is not specified, s_client always receives the amazonaws SSL cert rather than our custom domain cert. The output is:

subject=CN = *.iot.us-west-2.amazonaws.com

issuer=C = US, O = Amazon, CN = Amazon RSA 2048 M01

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 5502 bytes and written 442 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: 5C520F92903515F3208ED38EF9710B67D08C79298A107727B0859CEF2620F9C0
    Session-ID-ctx: 
    Master-Key: 83E247145B24F9654DDE25C4F5088AEBAF771C8FF8083D0FFBB7D7883ED9331480BE2729531008AEC6800E66DAD6DD0B
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1706304653
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
  5. Tried both the oldest Domain security policy (IoTSecurityPolicy_TLS12_1_0_2015_01) on AWS console as well as the latest for the custom domain: issue persists
  6. Following the IoT server auth doc, added other cross-signed CA (G2-RootCA1.pem & SFSRootCAG2.pem) to the trusted store of the Linux environment of the hardware: issue persists
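
One reading of the s_client observation above, offered as a hypothesis rather than a confirmed diagnosis: if a client does not send SNI, the endpoint answers with its default *.iot.us-west-2.amazonaws.com certificate, and hostname verification against the custom domain then fails even though the certificate chain itself verifies. Whether SDK V1 actually omits SNI is not established in this thread. The Python sketch below only illustrates the name check. hostname_matches is a simplified wildcard matcher written for this example (it is not an OpenSSL or SDK function), and both hostnames passed to it are made-up placeholders.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate name check.

    A '*' is accepted only as the entire leftmost label and stands for
    exactly one label, which is how wildcard certificates are commonly
    matched. Real TLS libraries apply more rules than this sketch does.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] != "*" and p_labels[0] != h_labels[0]:
        return False
    return p_labels[1:] == h_labels[1:]


# Subject CN taken from the s_client output above.
default_cert_cn = "*.iot.us-west-2.amazonaws.com"

# Hypothetical hostnames, for illustration only.
ats_style_host = "abc123-ats.iot.us-west-2.amazonaws.com"
custom_domain = "iot.example.com"

print(hostname_matches(default_cert_cn, ats_style_host))  # True
print(hostname_matches(default_cert_cn, custom_domain))   # False
```

With s_client, SNI can be forced with the -servername option, which makes it possible to compare the certificate served with and without it.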

Detailed logs follow. The "Error resolving hostname: -5" and "SSL Error Code: 2" entries appear to be harmless, because they also show up when we connect to the ATS endpoint, which works. "SSL Error Code: 1", however, seems to be the cause; it maps to SSL_ERROR_SSL in our environment:

./pub-sub-sample 
[DEBUG] Thu Jan 25 18:57:38 2024
:450 [Thread Task] [548328968208] Creating Thread Outbound Action Processing!!
[DEBUG] Thu Jan 25 18:57:38 2024
:451 [Thread Task] [548328968208] Creating Thread TLS Read Action Runner!!
[DEBUG] Thu Jan 25 18:57:38 2024
:452 [Thread Task] [548328968208] Creating Thread MQTT Keep alive Action!!
[TRACE] Thu Jan 25 18:57:38 2024
:452 [Network Read] [548313076176] [PerformAction:L128] :  Network Read Thread, TLS Status : 0
[DEBUG] Thu Jan 25 18:57:38 2024
:452 [OpenSSL Wrapper] [548328968208] [LoadCerts:L369] : Root CA : /home/pi/aws-iot-device-sdk-cpp/build/bin/certs/rootCA.pem
[DEBUG] Thu Jan 25 18:57:38 2024
:453 [OpenSSL Wrapper] [548328968208] [LoadCerts:L377] : Device crt : /home/pi/aws-iot-device-sdk-cpp/build/bin/certs/cert.pem
[DEBUG] Thu Jan 25 18:57:38 2024
:453 [OpenSSL Wrapper] [548328968208] [LoadCerts:L382] : Device privkey : /home/pi/aws-iot-device-sdk-cpp/build/bin/certs/privkey.pem
[ERROR] Thu Jan 25 18:57:38 2024
:482 [OpenSSL Wrapper] [548328968208] [ConnectTCPSocket:L244] : Error resolving hostname: -5
[ERROR] Thu Jan 25 18:57:38 2024
:483 [OpenSSL Wrapper] [548328968208] [PerformSSLConnect:L408] : TCP Connection error
[INFO] Thu Jan 25 18:57:38 2024
:518 [OpenSSL Wrapper] [548328968208] [ConnectTCPSocket:L271] : resolved <our_custom_domain> to xxx.xxx.xxx.xxx
[INFO] Thu Jan 25 18:57:38 2024
:546 [OpenSSL Wrapper] [548328968208] [AttemptConnect:L336] : SSL Error Code: 2
[INFO] Thu Jan 25 18:57:38 2024
:601 [OpenSSL Wrapper] [548328968208] [AttemptConnect:L336] : SSL Error Code: 2
[INFO] Thu Jan 25 18:57:38 2024
:606 [OpenSSL Wrapper] [548328968208] [AttemptConnect:L336] : SSL Error Code: 1
[INFO] Thu Jan 25 18:57:38 2024
:607 [OpenSSL Wrapper] [548328968208] [PerformSSLConnect:L424] :  AttemptConnect Code: -405
[ERROR] Thu Jan 25 18:57:38 2024
:607 [OpenSSL Wrapper] [548328968208] [PerformSSLConnect:L426] :  Server Certificate Verification failed.
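
To read the numeric codes in the log above: assuming the "SSL Error Code" values are what OpenSSL's SSL_get_error() returned (the post says code 1 maps to SSL_ERROR_SSL in this environment, which fits), they can be decoded with the SSL_ERROR_* constants from <openssl/ssl.h>. The Python sketch below does that for pasted log text. ssl_error_names is a helper made up for this example, not part of the SDK.

```python
import re

# Values of the SSL_ERROR_* constants defined in OpenSSL's <openssl/ssl.h>.
SSL_ERROR_NAMES = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",
    6: "SSL_ERROR_ZERO_RETURN",
    7: "SSL_ERROR_WANT_CONNECT",
    8: "SSL_ERROR_WANT_ACCEPT",
}

# Matches the "SSL Error Code: N" entries as they appear in the log above.
_CODE_RE = re.compile(r"SSL Error Code: (-?\d+)")


def ssl_error_names(log_text: str) -> list:
    """Return the symbolic name for each 'SSL Error Code' entry, in order."""
    return [
        SSL_ERROR_NAMES.get(int(code), f"unknown ({code})")
        for code in _CODE_RE.findall(log_text)
    ]


# Three entries copied from the log above.
sample = """
:546 [OpenSSL Wrapper] [548328968208] [AttemptConnect:L336] : SSL Error Code: 2
:601 [OpenSSL Wrapper] [548328968208] [AttemptConnect:L336] : SSL Error Code: 2
:606 [OpenSSL Wrapper] [548328968208] [AttemptConnect:L336] : SSL Error Code: 1
"""
print(ssl_error_names(sample))
```

On that reading, the two code 2 entries are SSL_ERROR_WANT_READ, which is the usual signal that a handshake on a non-blocking socket needs more data and should be retried, and only the final code 1 is an actual failure.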

Summary

Sorry if this is not concise enough, but at this point we seem to have run out of things to try and we are hitting a wall. Everything points to an SDK V1 specific issue, possibly a lack of support for custom domains (which would be surprising).

Could you suggest anything else to check or try?

Expected Behavior

After switching the endpoint in the config file from ATS to the custom domain, the SDK should connect to the IoT Core data plane with no issue.

Current Behavior

The SDK reports "Server Certificate Verification failed.", which seems to be related to SSL.

Reproduction Steps

  1. Configure a custom domain on IoT Core settings with SSL cert from ACM
  2. Connect VPC endpoint to iot:data plane
  3. Put a Network Load Balancer across each AZ in the VPC
  4. Use a global accelerator in front of the NLB to get a static IP
  5. Link the custom domain (same as configured in the IoT Core settings) to the global accelerator using A record in the DNS.
  6. (optional) Put the same domain in the PubSub example in the V2 SDK: it connects and publishes/subscribes as expected.
  7. (optional) Put the ATS endpoint in the config file and run the V1 PubSub example: it connects and publishes/subscribes as expected.
  8. Put the custom domain in the config file endpoint property and run the PubSub example: it fails with "Server Certificate Verification failed."

Possible Solution

No response

Additional Information/Context

No response

SDK version used

Compiled from the latest master branch

Environment details (OS name and version, etc.)

Ubuntu 18.04, OpenSSL 1.1.0g

yixiangding commented 9 months ago

Can you please provide any insights on this one?

I'm sure many current customers need the IoT custom domain now or will need it in the future. Making the C++ SDK work with this AWS feature is critical for any scaling business, not just us.

yixiangding commented 8 months ago

@jmklix I noticed last week that you self-assigned this issue, and I would appreciate it if you and the team could provide some insights when you get a chance to take a look. Even just suggestions on what else to try would be great! Thanks!

mfahadm8 commented 1 month ago

Did you figure out the cause, @yixiangding? I am using a custom CA registered with AWS IoT Core in single-account mode. I created the custom domain with an ACM-generated certificate. Like you, I can test fine with the AWS default Data-ATS endpoint; however, as soon as I test with the custom domain, it throws the following error:

mosquitto_pub -h my.custom.domain.com -p 8883 -t "foo/bar" -m "Hello" --cafile fleet-provisioning/claim-certs/certificate-authority/AmazonRootCA1.pem --cert fleet-provisioning/claim-certs/staging/claimCertAndCACert.pem.crt --key fleet-provisioning/claim-certs/staging/claim.private.pem.key -d

Client (null) sending CONNECT
Error: host name verification failed.
OpenSSL Error[0]: error:0A00008