aws / amazon-freertos

DEPRECATED - See README.md
https://aws.amazon.com/freertos/
MIT License
2.54k stars 1.1k forks source link

Problem connecting via HTTPS to server #1868

Closed TheIronNinja closed 4 years ago

TheIronNinja commented 4 years ago

I'm trying to establish an HTTPS connection to an AWS server to download some file, but I haven't been able to establish said connection.

I'm using an ESP32 and I'm running some custom code, but right now I'm trying to run the code that's in the https download sync demo. There's no MQTT connection yet, as I've seen that might be a problem. I plan doing that after this connection is over.

This is what I'm seeing when monitoring:

13 447 [iot_thread] [DEBUG] HTTP request to get the certificates from URL: https://xxxxxx.eu-west-1.amazonaws.com/TEST
14 450 [iot_thread] DNS[0x4F82]: The answer to 'xxxxxxxx.eu-west-1.amazonaws.com' (xxxxxxip) will be stored
15 450 [iot_thread] [DEBUG] Resolved IP is: xx.xx.xx.xx
16 950 [iot_thread] [DEBUG] FreeRTOS_connect status: -116
17 950 [iot_thread] [ERROR][NET][9500] Failed to establish new connection. Socket status: -116.
18 950 [iot_thread] [ERROR][HTTPS Client][9500] Failed to connect to the server at xxxxxxxxxxxxx.eu-west-1.amazonaws.com on port 80 with error: 4
19 950 [iot_thread] [ERROR][HTTPS Client][9500] Error in IotHttpsClient_Connect(). Error code 203.
20 950 [iot_thread] [ERROR][DEMO][9500] Failed to connect to the server, retrying after 3000 ms.
21 1750 [iot_thread] [DEBUG] FreeRTOS_connect status: -116
22 1750 [iot_thread] [ERROR][NET][17500] Failed to establish new connection. Socket status: -116.

I've added some debug lines to see what's going on, and I've found that there's a timeout when waiting for the socket connection. I've tried giving it some extra time but without success, so I'm not sure it's a timing problem. I also checked some of the variables that are used in IotHttpsClient_Connect(), but they all seem right to me.

Any help or direction is appreciated.

Thanks!

lundinc2 commented 4 years ago

Hello @ TheIronNinja ,

Have you checked to see if this could be a firewall issue? Have you been able to reach this HTTP endpoint on a different device on your network? I see the endpoint is an HTTPS connection, have you added it's certificate to your device's list of trusted certificates? It may be easier to debug this using a HTTP endpoint, and then switching to HTTPS when you've confirmed it is working.

It may be worth running one of the default demos we distribute to help narrow down what the issue could be.

Some other useful things to do: Set the TCP library to debug levels by changing this value to 1: https://github.com/aws/amazon-freertos/blob/e984d9463b7cb594434f07e8435869bab8787ba0/vendors/espressif/boards/esp32/aws_demos/config_files/FreeRTOSIPConfig.h#L46.

Sert the HTTPS library to to debug levels here.

Use Wireshark to capture the packets being sent by your device.

TheIronNinja commented 4 years ago

Thanks for the response

I've tried connecting via WiFi hotspot with my phone and I see the same, and I can access it properly with my web browser, so I don't think it's a firewall issue.

I'm now trying to make an HTTP call, but I'm facing a similar problem. It now establishes the socket connection, but when trying to do something with TLS it fails. This is what I'm seeing now, with some extra debug lines and also the ones you told me to activate:

19 356 [iot_thread] [DEBUG] HTTP request to get the certificates from URL: http://xxxxxxxxxxxxxxxx.mock.pstmn.io/esp32_test
20 371 [iot_thread] prvProcessDNSCache: add: 'xxxxxxxxxxxxxxx.mock.pstmn.io' @ xxxxxxxxip
21 371 [iot_thread] DNS[0x7789]: The answer to 'xxxxxxxxxxxxxxxx.mock.pstmn.io' (xxxxxxxxip) will be stored
22 371 [iot_thread] [DEBUG] Resolved IP is: xx.xx.xx.xx
23 371 [iot_thread] FreeRTOS_connect: 64188 to xxxxxxxip:80
24 371 [iot_thread] Socket 64188 -> xxxxxxxip:80 State eCLOSED->eCONNECT_SYN
25 371 [IP-task] prvSocketSetMSS: 1400 bytes for xxxxxxxxip:80
26 371 [IP-task] prvWinScaleFactor: uxRxWinSize 1 MSS 1400 Factor 0
27 371 [IP-task] Connect[xxxxxxxip:80]: next timeout 1: 3000 ms
E (4534) PKCS11: failed nvs get file size 4354 0
28 391 [IP-task] MSS change 1400 -> 1460
29 391 [IP-task] TCP: active 64188 => xxxxxxxxxip:80 set ESTAB (scaling 1)
30 391 [IP-task] Socket 64188 -> xxxxxxxxip:80 State eCONNECT_SYN->eESTABLISHED
31 391 [iot_thread] [DEBUG] FreeRTOS_connect status: 0
32 391 [iot_thread] [DEBUG] TLS_Init status: 0
33 412 [IP-task] vDHCPProcess: offer yyyyyyyip
34 412 [IP-task] vDHCPProcess: acked yyyyyyyip
35 443 [IP-task] Socket 64188 -> xxxxxxxxip:80 State eESTABLISHED->eLAST_ACK
36 443 [iot_thread] ERROR: Handshake failed with error code -29184
37 443 [iot_thread] [DEBUG] TLS_Connect status: -29184
38 443 [iot_thread] [ERROR][NET][4430] Failed to establish new connection. Socket status: -1002.
39 443 [IP-task] FreeRTOS_closesocket[64188 to xxxxxxxxip:80]: buffers 16 socks 0
40 443 [iot_thread] [ERROR][HTTPS Client][4430] Failed to connect to the server at xxxxxxxxxxxxxxx.mock.pstmn.io on port 80 with error: 4
41 443 [iot_thread] [ERROR][HTTPS Client][4430] Error in IotHttpsClient_Connect(). Error code 203.
42 443 [iot_thread] [ERROR][DEMO][4430] Failed to connect to the server, retrying after 3000 ms.
43 463 [IP-task] TCP: No active socket on port 64188 (xxxxxxxip:80)
44 743 [iot_thread] prvProcessDNSCache: look-up: 'xxxxxxxxxxxxxxxxxxxxxxxxx.mock.pstmn.io' @ xxxxxxxxxip
45 743 [iot_thread] FreeRTOS_gethostbyname: found 'xxxxxxxxxxxxxxxxxxxxxx.mock.pstmn.io' in cache: xxxxxxxxxxxip
46 743 [iot_thread] FreeRTOS_connect: 44078 to xxxxxxxxxip:80
47 743 [iot_thread] Socket 44078 -> xxxxxxxxxip:80 State eCLOSED->eCONNECT_SYN

While before the error was at line 31 with FreeRTOS_connect, now it's at 37 with TLS_Connect.

Due to the structure of my project, I think running a demo as provided will create more problems than it will solve, but I literally copy pasted the code in the https download sync demo and applied the same knowledge I've learnt with some other demos.

Is there a way to force FreeRTOS to make an HTTP request instead of an HTTPS, or does it already decide automatically from the URL? How can I enable the debug lines in the mbedTLS source code? I've tried enabling some options but I still haven't found the correct one.

dan4thewin commented 4 years ago

Hi, @TheIronNinja . There is a flag to IotHttpsClient_Connect() to do HTTP instead, according to https://docs.aws.amazon.com/freertos/latest/lib-ref/https/https_client_function_connect.html : IOT_HTTPS_IS_NON_TLS_FLAG . For debug output from mbedTLS, I believe you can define MBEDTLS_DEBUG_C and/or MBEDTLS_SSL_DEBUG_ALL in the config.h under mbedtls. Also, a packet capture from a tool like wireshark couple prove helpful.

TheIronNinja commented 4 years ago

Hello, @dan4thewin . Thanks for the input.

I managed to make the HTTP response using that flag. I can use this now as a temporary solution to check the other parts of my program, but in the end I need to make an API call to AWS, which only allows HTTPS. This also means that I can't use this to debug the endpoint itself, since I won't be giving it the certificates it demands.

The main issue I'm having is that I need the certificates to NOT be a static define in the code, they have to be downloaded at some point after the device has been deployed. This raises the odds that the problem comes from the certificates, but since my approach is Amazon's recommended approach I should be able to get this system to work.

dan4thewin commented 4 years ago

@TheIronNinja If I understand you, your device will not be manufactured with a unique certificate. I believe this use case is what AWS Fleet Provisioning addresses: https://docs.aws.amazon.com/iot/latest/developerguide/provision-wo-cert.html

lundinc2 commented 4 years ago

@TheIronNinja,

I have been working on refactoring the provisioning logic. It's a bit of a work in progress but I should have something to share soon.

I'll implement your use case and share some code that could help you. I'll post an update in a few days.

TheIronNinja commented 4 years ago

@dan4thewin that is correct, but right now the easiest way to implement that is via HTTPS calls from the device itself. If I were to use the phone that does the Wifi provisioning to do that call then I'd have to transmit the certificates via BLE. Is there an easy way to do that already implemented?

@lundinc2 great news! For now I'll keep working on my solution and see if I can get it to work, but it's nice to know there's an official solution being developed.

lundinc2 commented 4 years ago

Hello @TheIronNinja,

In the mean time I'd like to help you connect via HTTPS. I was able to connect to a custom HTTPS endpoint via the sync HTTPS demo with the following changes.

I commented this block https://github.com/aws/amazon-freertos/blob/a15d89db3afd04d6fb0e51d74965352df0a9b6db/demos/https/iot_demo_https_s3_download_sync.c#L384.

I commented this block https://github.com/aws/amazon-freertos/blob/a15d89db3afd04d6fb0e51d74965352df0a9b6db/demos/https/iot_demo_https_s3_download_sync.c#L492.

I set fileSize to the size of "Hello World!" (You will need to change this to fit your certificate.)

Note: I believe this is the step that will fix the TLS error you are seeing.

Then I created a self-signed certificate for my HTTPS server and copied it here: https://github.com/aws/amazon-freertos/blob/a15d89db3afd04d6fb0e51d74965352df0a9b6db/demos/https/iot_demo_https_s3_download_sync.c#L86.

For your use case this is were you can add the certificate used by your HTTPS server. I used a self signed certificate for ease of development.

My recommendation from there is to create a buffer on the heap to store the certificate from the HTTPS request and refactor https://github.com/aws/amazon-freertos/blob/a15d89db3afd04d6fb0e51d74965352df0a9b6db/demos/dev_mode_key_provisioning/src/aws_dev_mode_key_provisioning.c#L1225 to take a buffer instead of using the macros.

I recommend enabling mbedTLS debugging https://github.com/aws/amazon-freertos/blob/a15d89db3afd04d6fb0e51d74965352df0a9b6db/libraries/3rdparty/mbedtls_config/aws_mbedtls_config.h#L2172

Is the certificate used by the HTTPS endpoint in the trusted certificate chain?

TheIronNinja commented 4 years ago

Hi @lundinc2

I managed to make an HTTP request (without S) to a fake endpoint commenting the blocks you said, but when I try to make the HTTPS call it still gets stuck with these messages:

19 344 [iot_thread] [DEBUG] HTTP request to get the certificates from URL: https://xxxxxxxxxxxx.eu-west-1.amazonaws.com/TEST
20 344 [iot_thread] Free heap: 148084
21 344 [iot_thread] Free stack: 6352
22 350 [iot_thread] prvProcessDNSCache: add: 'xxxxxxxxxxxx.eu-west-1.amazonaws.com' @ XXXXXXXXip
23 350 [iot_thread] DNS[0xC118]: The answer to 'xxxxxxxxxxxx.eu-west-1.amazonaws.com' (XXXXXXXXip) will be stored
24 350 [iot_thread] FreeRTOS_connect: 26366 to XXXXXXXXip:80
25 350 [iot_thread] Socket 26366 -> XXXXXXXXip:80 State eCLOSED->eCONNECT_SYN
26 350 [IP-task] prvSocketSetMSS: 1400 bytes for XXXXXXXXip:80
27 350 [IP-task] prvWinScaleFactor: uxRxWinSize 1 MSS 1400 Factor 0
28 350 [IP-task] Connect[XXXXXXXXip:80]: next timeout 1: 3000 ms
29 355 [IP-task] TCP: RST received from XXXXXXXXip:80 for 26366
30 355 [IP-task] Socket 26366 -> XXXXXXXXip:80 State eCONNECT_SYN->eCLOSED
31 850 [iot_thread] [DEBUG] FreeRTOS_connect status: -116
32 850 [iot_thread] [ERROR][NET][8500] Failed to establish new connection. Socket status: -116.
33 850 [IP-task] FreeRTOS_closesocket[26366 to XXXXXXXXip:80]: buffers 16 socks 0
34 850 [iot_thread] [ERROR][HTTPS Client][8500] Failed to connect to the server at xxxxxxxxxxxx.eu-west-1.amazonaws.com on port 80 with error: 4
35 850 [iot_thread] [ERROR][HTTPS Client][8500] Error in IotHttpsClient_Connect(). Error code 203.
36 850 [iot_thread] [ERROR][DEMO][8500] Failed to connect to the server, retrying after 3000 ms.

and from here forward it just reattempts to connect failing every time. As you can see, it fails when calling the following function, which returns a Timeout error:

https://github.com/aws/amazon-freertos/blob/628c0696b9953686615f1ba7570ae086adfdd238/libraries/abstractions/secure_sockets/freertos_plus_tcp/iot_secure_sockets.c#L173

I'm trying to connect to an AWS API, which is set to be open to the public so no special certificates should be needed. But I suspect this might be causing a problem. Should I use one of the default certificates from Amazon as the trusted root CA? Or any certificate might do the trick and the problem is that FreeRTOS always expects a certificate to be required?

Thanks for the help!

lundinc2 commented 4 years ago

Can you clarify which AWS API you are trying to connect to? Those certificates should do the trick.

If no server certificate is specified these certificates are used by default https://github.com/aws/amazon-freertos/blob/628c0696b9953686615f1ba7570ae086adfdd238/libraries/freertos_plus/standard/tls/src/iot_tls.c#L796.

I also see the device is trying to connect to port 80, which is generally used for HTTP endpoints. You may need to change a parameter so the device uses port 443.

TheIronNinja commented 4 years ago

Well, the problem actually was the port... Thanks a lot for that! Unfortunetly, I'm still troubling to get this to work.

The entire response that my custom AWS API gives is ~3.2kB of data, and by default the demo tries to make partial calls to get the data in chunks of 512 bytes each. What I expected to see is that it would make different requests until everything was read, but my endpoint doesn't support partial requests, so this won't work for me. I've tried to increment the response body buffer to 3.5kB so it would all fit in one request, but this causes this semaphore wait to crash:

https://github.com/aws/amazon-freertos/blob/c264e53f41b2bb629f07cfe1fcdda454e68a0af5/libraries/c_sdk/standard/https/src/iot_https_client.c#L2998

Surprisingly though, the error the device prints has little to do with it, but I've traced it manually (using log prints) to that semaphore wait. This is the log I see now:

15 348 [iot_thread] [INFO ][DEMO][3480] Requesting new certificates
16 355 [iot_thread] DNS[0x74AD]: The answer to 'xxxxxxxxxxxx.eu-west-1.amazonaws.com' (XXXXXXXip) will be stored
E (4220) PKCS11: failed nvs get file size 4354 0
17 530 [iot_thread] Free heap before: 100060
18 530 [iot_thread] Free stack before: 2588
19 530 [iot_thread] [INFO ][DEMO][5300] Now requesting Range: bytes=0-3401.
20 530 [iot_thread] [INFO ][HTTPS Client][5300] Wait for the request to finish
21 530 [iot_thread] [INFO ][HTTPS Client][5300] Wait without timeout
Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x4000c325  PS      : 0x00060830  A0      : 0x800da984  A1      : 0x3ffd9b90
A2      : 0x00000000  A3      : 0x3ffd9bc2  A4      : 0x00000003  A5      : 0x00000000
A6      : 0x0000203a  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x00fbefff
A10     : 0x00000003  A11     : 0x3ffdffec  A12     : 0x00000006  A13     : 0x00000000
A14     : 0x3ffd9b60  A15     : 0x00000004  SAR     : 0x00000004  EXCCAUSE: 0x0000001d
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

ELF file SHA256: 0ec4e5a66bd30af0265f9bfb6ebf0cf8ed287bf359262213bdef85a675608e42

Backtrace: 0x4000c325:0x3ffd9b90 0x400da981:0x3ffd9ba0 0x400db005:0x3ffda3f0 0x400db574:0x3ffdb8f0 0x400dc90d:0x3ffdb930
0x400da981: prvParseHttpsResponse at C:\Users\xxxxxxxxx\build/../components/key-provisioning/src/key-provisioning.c:565

0x400db005: requestNewCertificates at C:\Users\xxxxxxxxx\build/../components/key-provisioning/src/key-provisioning.c:565

0x400db574: HTTPKeyProvisioning at C:\Users\xxxxxxxxx\build/../components/key-provisioning/src/key-provisioning.c:1892

0x400dc90d: _threadRoutineWrapper at C:\Users\xxxxxxxxx\build/../FreeRTOS/libraries/abstractions/platform/freertos/iot_threads_freertos.c:318

The two last prints before the crash are mine and I added them before the wait linked before. The stack and heap logs are printing the state before entering in this loop:

https://github.com/aws/amazon-freertos/blob/a15d89db3afd04d6fb0e51d74965352df0a9b6db/demos/https/iot_demo_https_s3_download_sync.c#L411

Note that, in the traceback, there are two different functions which are supposed to be in the same line of code, but none of those two are. This might be a hint of what's going on, but I'm not quite sure.

Another thing I've noticed is the EXCVADDR: 0x00000000. From the Espressif documentation, this might mean that the pointer it's trying to read is NULL. But a couple lines before, where the semaphore is created, there's a check in place already, and I'm not seeing the error. To check if that was really the case, I've put a print to see if the value I am passing is NULL, and it's not. In fact, I added a IotSemaphore_Post followed by a IotSemaphore_Wait right before the wait that's crashing, and those two work perfectly with the same semaphore. But at the second wait it crashes.

Any idea on what might be happening here? Am I using too much memory with 3.5kB for the response body buffer?

TheIronNinja commented 4 years ago

I managed to get a workaround the problem. I lowered the priority of the main task that's doing the HTTPS request and now it works. I tried this because I thought that this might give the other task enough time to post on the semaphore before the main task got to the wait, as the problem apparently was related to waiting when the value on the semaphore is 0. I'm not sure how reliable or accurate this is, but for now this will allow me to continue.