SmartThingsCommunity / st-device-sdk-c

SmartThings SDK for Direct Connected Devices for C
Other
118 stars 126 forks source link

Recent SDK update breaks onboarding #74

Closed toddaustin07 closed 3 years ago

toddaustin07 commented 3 years ago

Today I did a git pull to make sure I had the latest core SDK files. I re-built the core SDK and rebuilt the example app. However when trying to onboard a new example device now, an error occurs. I have retried several times and the same error occurs at the same point in the onboarding process (before the mobile app displays the list of APs):

I [IoT]: _es_keyinfo_handler(553) > master secret generation success << last successful log message E [IoT]: _es_time_set(347) > Time set error!! << first error message E [IoT]: iot_easysetup_request_handler(1784) > failed to handle step 1 (-412)


EDIT: In looking at your code, it appears you are trying to set the time of the system clock? I assume you are synching the device clock with the mobile device? In the case of a Raspberry Pi, this wouldn't be needed, but I don't know why it is causing an error. It is also suspicious that this is the exact place where onboarding was hanging up for me as reported in my earlier issue #71. It may be this clock setting is causing both issues, although I don't know why this particular error is happening now. (Perhaps there is a problem with the time value received from the mobile app?) Here is some more debugging info around the clock_settime function call which seems to be the source of the error (errno=1: operation not permitted):

time_in_sec=1611289980 ret = clock_settime(CLOCK_REALTIME, &ts); ret=-1, errno=1


Device: Raspberry Pi 3b with Raspbian Buster & latest core SDK repository Mobile: IOS app version 1.6.57-463

For further information, below is the output from git showing the new files that were pulled: pi@raspberrypi:~/st-device-sdk-c $ git pull origin master remote: Enumerating objects: 42, done. remote: Counting objects: 100% (42/42), done. remote: Compressing objects: 100% (18/18), done. remote: Total 42 (delta 25), reused 39 (delta 24), pack-reused 0 Unpacking objects: 100% (42/42), done. From https://github.com/SmartThingsCommunity/st-device-sdk-c

I [IoT]: http_cleanup_all_connection(40) > close listen socket I [IoT]: http_cleanup_all_connection(47) > close accept socket I [IoT]: es_http_deinit(187) > http tcp deinit complete! I [IoT]: iot_easysetup_deinit(524) > IOT_STATE_PROV_ES_DONE I [IoT]: iot_bsp_wifi_set_mode(836) > [rpi] Requested mode OFF I [IoT]: _SoftAPControl(1266) > [rpi] SoftAP service stop issued I [IoT]: _do_status_report(300) > Call usr status_cb with 1/0 iot_status: 1, lv: 0 I [IoT]: _do_update_timeout(196) > Current timeout : 4294967295 for -1/-1 I [IoT]: _do_status_report(291) > Unsupported state -1 for 0/0 E [IoT]: st_conn_start(1829) > Can't go to PROV_CONFIRM (0x2) I [IoT]: st_conn_start(1831) > st_conn_start done (-7)

stdk-scm commented 3 years ago

Hi @toddaustin07,

iot-core uses TLS connection which requires accurate time for the system. there could be some options which system can have accurate time.

Using (s)ntp protocol is really good option. we also have code which using it. but our experience doesn't say it is 100% reliable. because of various reasons including network performance, firewall configuration and (s)ntp server's temporal problem. that's why most system has manual time set option as secondary method.

Let's think about the IoT devices. Mostly it could be headless devices (without keyboard, mouse, touch screens and screen output) there is noway to set time by manually. That's why ST app send time information to (headless) device as a secondary time set option. I'm also not sure, why your system couldn't set time with clock_settime API - permission problem?. If your system could have accurate time by other methods, I think you can simply make iot_bsp_system_set_time_in_sec returns IOT_ERROR_NONE just skipping call clock_settime.

Thanks.

Kwang-Hui Cho

toddaustin07 commented 3 years ago

Thank you for quick reply, Kwang-Hui! It is strange that the error is occurring and I wondered about privileges as well. But I can't find any documentation of exactly what privileges are required, and I don't understand why it would all of a sudden be a problem when it was working (most of the time) before!

I think for my purposes we can assume the device (Pi) already has accurate time set. So I will create a RPI version of the IOT_BSP_SYSTEM_POSIX.c file that simply returns no error for the time set function.

Kwang-Hui commented 3 years ago

Yes, I would be good enough for your testing. Thanks.

toddaustin07 commented 3 years ago

Hi Kwang-Hui - wanted to let you know that voiding the clock_settime function did indeed eliminate the error and onboarding proceeds for now. I still don't know why I didn't have this issue previously!

How sensitive is the TLS requirement for accurate time? Does it need to be precisely synchronized with the mobile app or is there a margin of difference allowed? I want to make sure I don't cause other issues by eliminating the clock_settime.

Unfortunately even after this change I am still getting occasional hangs immediately following the master secret generation success log message (which I documented in issue #71), which appears to be somewhere following the clock setting. I've not been able to track it down to the precise function yet.

Overall, I'm only having about a 60-70% success rate onboarding device applications, sometimes requiring multiple retries. It seems to be a very error-prone process. The other day I got another legal agreement pop up in the middle of onboarding, which caused it to hang. I will note that again in my original issue on that topic.

Kwang-Hui commented 3 years ago

Hi @toddaustin07 , The time sensitivity could be changed without any notice for security reasons. but as I remember the device should have minutes level time accuracy to have successful cloud connection for now. Thanks.

Kwang-Hui commented 3 years ago

This issue is closing now. If you need to discuss further about this subject. Please open new issue again.

Thanks.