aws / amazon-freertos

DEPRECATED - See README.md
https://aws.amazon.com/freertos/
MIT License

Trying OTA Updates through BLE + MQTT Proxy - OTA Job does not make it to device #2328

Closed: rrivadeneira-rsc closed this issue 4 years ago

rrivadeneira-rsc commented 4 years ago

Describe the bug: I have tried following this guide to perform OTA updates using BLE + MQTT Proxy through the pass-through example app and the FreeRTOS OTA demo. However, the OTA job I created does not seem to reach the device's demo queue as expected (the device stays in the WaitingForJob state).

One thing to note is that I'm not using the provided OTA script run through Python, as I already have my own setup: my code-signing profile, OTA service role and policies, and IAM user and policies. Instead, I am using the Create Job (Create OTA update job) wizard, which creates the job with the firmware file I provide.

Is the issue that the MQTT Proxy is not telling the device there is a job in the queue? Am I missing additional steps? Please advise.

System information

Expected behavior: Following this guide, I would expect the job to reach the device's demo queue and start the OTA update process.

Screenshots or console output: The job is created successfully on AWS and is in the QUEUED state. I also see the corresponding notify and notify-next topics ($aws/things/esp32-ble/jobs/notify and $aws/things/esp32-ble/jobs/notify-next) receive my job information. For instance:

{
  "timestamp" : 1596498559,
  "jobs" : {
    "QUEUED" : [ {
      "jobId" : "AFR_OTA-test-with-esp32-ble-ble-and-mqtt",
      "queuedAt" : 1596498558,
      "lastUpdatedAt" : 1596498558,
      "executionNumber" : 1,
      "versionNumber" : 1
    } ]
  }
}
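The payload above has the shape of an AWS IoT Jobs `notify` message, where pending executions are grouped by status under `jobs`. A minimal Python sketch, assuming only the structure shown above (the helper name is hypothetical), that extracts the queued job IDs and so confirms the job really is QUEUED for the thing on the cloud side:

```python
import json
from typing import List

# Message as published on $aws/things/esp32-ble/jobs/notify (copied from the issue).
NOTIFY_PAYLOAD = """
{
  "timestamp" : 1596498559,
  "jobs" : {
    "QUEUED" : [ {
      "jobId" : "AFR_OTA-test-with-esp32-ble-ble-and-mqtt",
      "queuedAt" : 1596498558,
      "lastUpdatedAt" : 1596498558,
      "executionNumber" : 1,
      "versionNumber" : 1
    } ]
  }
}
"""


def queued_job_ids(payload: str) -> List[str]:
    """Return the IDs of job executions listed as QUEUED in a Jobs notify message."""
    message = json.loads(payload)
    # "jobs" maps an execution status (QUEUED, IN_PROGRESS, ...) to job summaries;
    # a status with no executions may be absent, so default to an empty list.
    queued = message.get("jobs", {}).get("QUEUED", [])
    return [summary["jobId"] for summary in queued]


if __name__ == "__main__":
    print(queued_job_ids(NOTIFY_PAYLOAD))  # ['AFR_OTA-test-with-esp32-ble-ble-and-mqtt']
```

Since the job shows up here while the device's counters stay at zero, the message is being lost somewhere between the broker and the device.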

The device has started the demo and connected to the broker, but it is still in the WaitingForJob state, and the Received, Queued, Processed and Dropped counters never change:

7 8454 [iot_thread] [INFO ][DEMO][84540] Successfully initialized the demo. Network type for the demo: 2
8 8454 [iot_thread] [INFO ][MQTT][84540] MQTT library successfully initialized.
9 8454 [iot_thread] [INFO ][DEMO][84540] OTA demo version 0.9.2

10 8454 [iot_thread] [INFO ][DEMO][84540] Connecting to broker...

11 8454 [iot_thread] [INFO ][DEMO][84540] MQTT demo client identifier is esp32-ble (length 9).
12 8576 [iot_thread] [INFO ][MQTT][85760] Establishing new MQTT connection.
GATT procedure initiated: notify; att_handle=47
13 8577 [iot_thread] [INFO ][MQTT][85770] (MQTT connection 0x3ffba4ac, CONNECT operation 0x3ffba354) Waiting for operation completion.
14 8675 [iot_thread] [INFO ][MQTT][86750] (MQTT connection 0x3ffba4ac, CONNECT operation 0x3ffba354) Wait complete with result SUCCESS.
15 8676 [iot_thread] [INFO ][MQTT][86750] New MQTT connection 0x3ffc27dc established.
16 8677 [iot_thread] [OTA_AgentInit_internal] OTA Task is Ready.
I (86991) ota_pal: prvPAL_GetPlatformImageState
I (87001) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (87001) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
17 8679 [OTA Agent Task] [prvExecuteHandler] Called handler. Current State [Ready] Event [Start] New state [RequestingJob] 

(...)

21 8738 [OTA Agent Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/esp32-ble/jobs/$next/get/accepted

(...)

26 8806 [OTA Agent Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/esp32-ble/jobs/notify-next

(...)

31 8828 [OTA Agent Task] [prvExecuteHandler] Called handler. Current State [RequestingJob] Event [RequestJobDocument] New state [WaitingForJob] 
32 8877 [iot_thread] [INFO ][DEMO][88770] State: RequestingJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0

(...)

1164 122077 [iot_thread] [INFO ][DEMO][1220770] State: WaitingForJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0
1165 122177 [iot_thread] [INFO ][DEMO][1221770] State: WaitingForJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0
1166 122277 [iot_thread] [INFO ][DEMO][1222770] State: WaitingForJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0
1167 122377 [iot_thread] [INFO ][DEMO][1223770] State: WaitingForJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0
1168 122477 [iot_thread] [INFO ][DEMO][1224770] State: WaitingForJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0

To reproduce: Follow this guide, but create the OTA job through the Create Job wizard on AWS instead. In short:

  1. Configure policies, roles and permissions for OTA
  2. Create valid code-signing certificate
  3. Create thing to test on IoT Core
  4. Create Cognito user and identity pools and configure the Android FreeRTOS SDK example mobile app (on latest master commit: v1.1.0-10-ge5a23e5) with this information (as well as with the IoT policy created to connect/subscribe/publish/etc.)
  5. Configure the FreeRTOS firmware demo to enable the OTA demo, disable WiFi and enable the BLE network, add the code-signing certificate, etc.
  6. Connect the mobile device and the ESP32, which starts the MQTT proxy; the device then connects to the broker and starts the OTA task that waits for jobs
  7. Follow the steps to create a FreeRTOS OTA update job for the IoT thing that connects to IoT Core (in my case esp32-ble, as in the example)
  8. The device's OTA demo never changes state or updates its job counters; the loop still prints -> State: WaitingForJob Received: 0 Queued: 0 Processed: 0 Dropped: 0
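Step 5 is reflected in the first line of the device log above ("Network type for the demo: 2"). A small Python sketch for checking which networks a build enabled, assuming the demo prints a bitmask in which WiFi is 0x1 and BLE is 0x2 (the values we believe `AWSIOT_NETWORK_TYPE_WIFI` and `AWSIOT_NETWORK_TYPE_BLE` have; confirm them in your source tree). The helper names are hypothetical:

```python
from typing import List

# Assumed flag values, mirroring AWSIOT_NETWORK_TYPE_WIFI and AWSIOT_NETWORK_TYPE_BLE;
# verify them against the headers in your FreeRTOS checkout.
NETWORK_FLAGS = {
    0x1: "WiFi",
    0x2: "BLE",
}


def parse_demo_network_type(log_line: str) -> int:
    """Extract the integer after 'Network type for the demo:'; raise ValueError if absent."""
    marker = "Network type for the demo:"
    if marker not in log_line:
        raise ValueError("not a demo network-type line")
    return int(log_line.split(marker, 1)[1].strip())


def decode_network_type(value: int) -> List[str]:
    """Return the names of the networks whose bits are set in the bitmask."""
    return [name for bit, name in NETWORK_FLAGS.items() if value & bit]


if __name__ == "__main__":
    line = ("7 8454 [iot_thread] [INFO ][DEMO][84540] Successfully initialized "
            "the demo. Network type for the demo: 2")
    print(decode_network_type(parse_demo_network_type(line)))  # ['BLE']
```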

Code to reproduce the bug: The FreeRTOS OTA demo as is (with the changes described above: enable BLE only, set the broker endpoint to connect to, match the thing name to the one created in IoT Core, and add the code-signing certificate), plus the Android FreeRTOS SDK app configured with the appropriate Cognito and IoT policy information to set up the MQTT Proxy.

Additional context: Again, I am not using the start_ota.py script, because I already have the information I need to go through the FreeRTOS OTA job creation process in AWS. Let me know if I am missing a step needed to test this demo successfully.

Thank you!

dan4thewin commented 4 years ago

Hello, @rrivadeneira-rsc. As a first troubleshooting step, does your OTA update succeed if you configure the ESP32 device to connect via WiFi?

rrivadeneira-rsc commented 4 years ago

Hello @dan4thewin. I tried the OTA demo with my ESP32 configured to connect via WiFi. The few OTA jobs I had created previously to test the proxy were picked up by the ESP32 and ran to completion (they failed because the firmware delivered via OTA still had my old thing name). I then created another OTA job with this firmware and an incremented version, and that OTA job succeeded.

Here are the steps I took to configure my setup:

  1. I created a certificate with the policy I created following the guide, so that I can connect to the broker via WiFi
  2. Attached that certificate to my thing "esp32-ble" (the guide specifically told me not to attach a certificate to my thing)
  3. Enabled WiFi in both aws_demo_config.h and aws_iot_network_config.h
  4. Added my credential information to both aws_clientcredential.h and aws_clientcredential_keys.h
  5. Created OTA Job with this firmware, incremented version to 0.9.3
  6. OTA succeeds

For troubleshooting, I'll put the correct thing name into the BLE-only firmware, deliver it via OTA from the WiFi-enabled firmware, and then connect through the proxy to see whether the update succeeds.

rrivadeneira-rsc commented 4 years ago

I have tried what I mentioned above:

  1. I loaded the firmware configured to run the OTA demo with WiFi
  2. Created an OTA Job for the firmware configured to run BLE + MQTT proxy with the BLE demo
  3. Once the OTA download finished and the device booted the BLE-configured firmware and was waiting for a network connection, I detached the certificate from my esp32-ble thing so that the setup matched what I was trying with my BLE + MQTT proxy example (the guide said not to attach a certificate, since we connect through Cognito)
  4. I connected via the Android example app for the MQTT proxy, but once the device connected, the demo did not see any pending job, and the OTA self-test timer ran out, rebooting the unit into its original image and failing the job.
    
    3 4022 [iot_thread] [INFO ][DEMO][40220] State: RequestingJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0
    34 4122 [iot_thread] [INFO ][DEMO][41220] State: WaitingForJob  Received: 0   Queued: 0   Processed: 0   Dropped: 0

(...)

46 5322 [iot_thread] [INFO ][DEMO][53220] State: WaitingForJob Received: 0 Queued: 0 Processed: 0 Dropped: 0
47 5422 [iot_thread] [INFO ][DEMO][54220] State: WaitingForJob Received: 0 Queued: 0 Processed: 0 Dropped: 0

48 5424 [Tmr Svc] [prvSelfTestTimer_Callback] Self test failed to complete within 16000ms
ets Jun 8 2016 00:22:57
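The timer message above is the OTA self-test giving up. As a simplified Python model (an illustration, not the FreeRTOS OTA agent's code): once a new image boots in self-test mode, the device has to receive its pending job document before the self-test timer expires, 16000 ms in this log; if the document never arrives, the device reboots into the previous image and the job fails. The function name and the single-deadline simplification are assumptions:

```python
from typing import Optional

# Timeout reported by prvSelfTestTimer_Callback in the log above.
SELF_TEST_TIMEOUT_MS = 16000


def self_test_outcome(job_doc_received_at_ms: Optional[int],
                      timeout_ms: int = SELF_TEST_TIMEOUT_MS) -> str:
    """Hypothetical model of the self-test window.

    job_doc_received_at_ms is when (relative to the start of the window) the
    pending job document reached the device, or None if it never arrived.
    """
    if job_doc_received_at_ms is not None and job_doc_received_at_ms < timeout_ms:
        return "accept"    # new image confirmed; the job can complete
    return "rollback"      # reboot into the previous image and fail the job


if __name__ == "__main__":
    print(self_test_outcome(None))   # rollback
    print(self_test_outcome(3000))   # accept
```

In this report the job document never arrived over the proxy, which corresponds to `self_test_outcome(None)`.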


For further reference, these are the client identifier each firmware connects with and the job topics it subscribes to:
Firmware with WiFi enabled:

8 270 [iot_thread] [INFO ][DEMO][2700] MQTT demo client identifier is esp32-ble (length 9).
20 726 [OTA Agent Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/esp32-ble/jobs/$next/get/accepted
24 735 [OTA Agent Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/esp32-ble/jobs/notify-next


Firmware with BLE enabled:

11 3597 [iot_thread] [INFO ][DEMO][35970] MQTT demo client identifier is esp32-ble (length 9).
22 3888 [OTA Agent Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/esp32-ble/jobs/$next/get/accepted
27 3951 [OTA Agent Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/esp32-ble/jobs/notify-next
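Both builds connect as esp32-ble and subscribe to the same two topics, which the AWS IoT Jobs topic scheme derives from the thing name alone. A quick Python sketch (the helper name is hypothetical) that rebuilds those topic strings and compares them with the two logs, ruling out a thing-name or topic mismatch as the cause:

```python
from typing import List


def job_topics(thing_name: str) -> List[str]:
    """Build the two AWS IoT Jobs topics the OTA agent reports subscribing to."""
    base = f"$aws/things/{thing_name}/jobs"
    return [f"{base}/$next/get/accepted", f"{base}/notify-next"]


# Topics copied from the "OK:" lines of the two logs above.
WIFI_BUILD_TOPICS = [
    "$aws/things/esp32-ble/jobs/$next/get/accepted",
    "$aws/things/esp32-ble/jobs/notify-next",
]
BLE_BUILD_TOPICS = [
    "$aws/things/esp32-ble/jobs/$next/get/accepted",
    "$aws/things/esp32-ble/jobs/notify-next",
]

if __name__ == "__main__":
    expected = job_topics("esp32-ble")
    print(expected == WIFI_BUILD_TOPICS == BLE_BUILD_TOPICS)  # True
```

With identical subscriptions on both builds, the difference has to come from the connection path itself (for example, the permissions of the credentials the proxy uses).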

yourslab commented 4 years ago

Hi @rrivadeneira-rsc. Thank you for your patience. This issue has been forwarded to the OTA team, and we are currently looking for a fix.

ravibhagavandas commented 4 years ago

@rrivadeneira-rsc

I tried to reproduce your scenario but could not, using the following steps:

Setup:

  1. Code signer certificate was created for ESP32 and signing profile updated in FreeRTOS OTA console.
  2. Thing name was created and attached to the certificate.
  3. Cognito user pool and identity pool were created with the required policies as mentioned in this guide.
  4. The policy name and Cognito credentials were configured in the app.
  5. Certificate, thing name, broker endpoint and WiFi credentials were configured on the device.

OTA:

  1. Configured the OTA demo to use BLE and created an OTA job with this firmware.
  2. Configured the OTA demo to use WiFi, flashed and ran the firmware.
  3. OTA completed the download of the BLE firmware over WiFi, and the device booted the new image in self-test mode.
  4. Connected from mobile app to device using BLE pairing with numeric comparison. (press 'y' as confirm on both sides)
  5. App connected to the device and self test completed successfully.

Test was done using FreeRTOS release 202007.00-4e8219e and Android SDK 1.2.0-acfb4bc.

Could you check whether the steps you executed were the same? Also, which BLE pairing method is enabled between the mobile app and the device? Please note that you don't need to detach the certificates from the thing for the MQTT proxy to work.

rrivadeneira-rsc commented 4 years ago

@ravibhagavandas thank you for trying this! I appreciate the thorough test.

I see there are differences between our setups. In the guide you provided, there is this step under "To set up AWS IoT" that I also saw in the guide I was using:

  1. If you are connecting your microcontroller to the cloud through a mobile device, choose Create thing without certificate. Because the Mobile SDKs use Amazon Cognito for device authentication, you do not need to create a device certificate for demos that use Bluetooth Low Energy. If you are connecting your microcontroller to the cloud directly over Wi-Fi, choose Create certificate, choose Activate, and then download the thing's certificate, public key, and private key.

Thus, I did not configure a certificate or private key in the device's client credentials, nor did I attach a certificate to the thing created on AWS for the BLE OTA demo. I expected that when I sent the OTA job to my thing on AWS, the connected proxy would pick up the update and pass it to my device, but that is not happening. So the setup and OTA flow I was expecting to test is the following:

Setup:

  1. Code signer certificate was created for ESP32 and signing profile updated in FreeRTOS OTA console.
  2. Thing name was created, NO certificate was attached.
  3. Cognito user pool and identity pool were created with the required policies as mentioned in this guide.
  4. The policy name and cognito credentials were configured in the app.
  5. Thing name and broker endpoint were configured on the device.

OTA:

  1. Configured OTA demo to use BLE, flashed and ran the firmware.
  2. Incremented the version of that same BLE OTA demo and created an OTA job with this firmware.
  3. Connected the mobile app to the device using BLE pairing with numeric comparison (press 'y' to confirm on both sides).
  4. The OTA job was never seen or run on the device.

Additionally, as I explained above, I tried the same steps as in your OTA section, except that I detached the certificate before connecting the device via BLE, to replicate my BLE setup above. My steps to run the OTA demo over WiFi and update it to the BLE OTA demo were the following:

Setup:

  1. A certificate was created and attached to the thing.
  2. Certificate, thing name, broker endpoint and wifi credentials were configured on the device.

OTA:

  1. Configured OTA demo to use WiFi, flashed and ran the firmware.
  2. Configured the OTA demo to use BLE, incremented the version and created an OTA job with this firmware.
  3. OTA completed the download of the BLE firmware over WiFi, and the device booted the new image in self-test mode.
  4. Detached the certificate from the thing on AWS.
  5. Connected from the mobile app to the device using BLE pairing with numeric comparison (press 'y' to confirm on both sides).
  6. The app connected to the device, but the self-test was unsuccessful, as the main loop did not see any OTA jobs to process.
  7. The device rebooted, reverted to the original WiFi OTA demo, connected to the cloud and failed the job (because the self-test sees it is booting the same version).

To test the OTA demo over WiFi, I understand that I do need to provide a certificate and private key in my device configuration, and attach that certificate to my thing on AWS so it can connect. However, I am looking for the case where both firmware images run the OTA demo over BLE and where, as the guide says, no certificate is attached to the thing. In that case, when I connect my mobile app to my device, the OTA job should be picked up, run as expected and complete the download; then, when the new image boots into self-test, the mobile app and device reconnect, the demo finds the pending job, the self-test succeeds, and the new image is accepted.

Could you check whether my steps give you the same result, or whether you are able to run the OTA job successfully?

I was using FreeRTOS version 202007.00-86-g549f7d17f and Android SDK mobile app version v1.1.0-10-ge5a23e5.

Let me know if I should be doing something different. Thank you so much again for your time!

ravibhagavandas commented 4 years ago

@rrivadeneira-rsc

I think this could be the issue: in the blog post you are referring to, step 3 of "AWS Cognito Configuration" creates a policy for the authenticated Cognito identity that is missing the iot:Receive permission, which allows receiving packets via Cognito (for example, the job document in self-test mode). I followed the equivalent step from this doc, which does add that permission, so I did not hit that error. For testing, I removed the permission from my Cognito authenticated identity policy and was able to reproduce the same behavior.

Could you add the permission by following these steps and check again?

  1. Go to the IAM console https://console.aws.amazon.com/iam/home
  2. Choose Roles from the navigation pane
  3. Search for the role you created in step 3 of the "AWS Cognito Configuration" section of the blog
  4. Edit the policy associated with the role and choose JSON
  5. Add the permission "iot:Receive" to the "Action" list
  6. Sign out and log back in from the app so the new changes take effect.
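Step 5 amounts to one change in the role's policy JSON. A Python sketch under stated assumptions: the blog's actual policy is not reproduced in this thread, so `ORIGINAL_POLICY` is a hypothetical stand-in with a placeholder `"Resource": "*"`, and the helpers are illustrative names, not AWS SDK calls. It flags the missing action and produces the edited document:

```python
import copy
import json
from typing import List, Set

# Hypothetical stand-in for the Cognito authenticated-role policy from the blog's
# step 3: it allows connect/publish/subscribe but not iot:Receive.
# Resource "*" is a placeholder; scope it to your client/topic ARNs in practice.
ORIGINAL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["iot:Connect", "iot:Publish", "iot:Subscribe"],
            "Resource": "*",
        }
    ],
}

# Per the diagnosis in this thread, without iot:Receive the subscriptions
# succeed but messages are not delivered to the client.
REQUIRED_MQTT_ACTIONS = ["iot:Connect", "iot:Publish", "iot:Subscribe", "iot:Receive"]


def allowed_actions(policy: dict) -> Set[str]:
    """Collect actions from Allow statements (assumes "Statement" is a list;
    rough check that ignores Deny statements and conditions)."""
    actions: Set[str] = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # IAM accepts "Action" as either a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_mqtt_actions(policy: dict) -> List[str]:
    """Return the required MQTT actions that no Allow statement grants."""
    allowed = allowed_actions(policy)
    if "*" in allowed or "iot:*" in allowed:
        return []
    return [a for a in REQUIRED_MQTT_ACTIONS if a not in allowed]


def add_receive_permission(policy: dict) -> dict:
    """Return a copy of the policy with iot:Receive added to the first Allow statement."""
    updated = copy.deepcopy(policy)
    for statement in updated.get("Statement", []):
        if statement.get("Effect") == "Allow":
            action = statement.get("Action", [])
            actions = [action] if isinstance(action, str) else list(action)
            if "iot:Receive" not in actions:
                actions.append("iot:Receive")
            statement["Action"] = actions
            break
    return updated


if __name__ == "__main__":
    print(missing_mqtt_actions(ORIGINAL_POLICY))  # ['iot:Receive']
    fixed = add_receive_permission(ORIGINAL_POLICY)
    print(missing_mqtt_actions(fixed))            # []
    print(json.dumps(fixed, indent=2))
```

The check is deliberately rough (it ignores Deny statements, conditions and partial wildcards such as `iot:Re*`), so treat an empty result as a hint, not proof, that the policy is complete.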
rrivadeneira-rsc commented 4 years ago

@ravibhagavandas - I see! That makes a lot of sense. I will make sure to note that.

I have added the iot:Receive permission to the policy for my Cognito identity. I flashed the OTA demo configured for BLE and issued an OTA job whose firmware was also the BLE OTA demo (just one version up), and the OTA job succeeded as expected. Thank you so much for your help with this!

yourslab commented 4 years ago

Closing this issue as it has been resolved. Thank you for your patience! Please feel free to reach out again anytime.