HomeACcessoryKid / life-cycle-manager

Initial install, WiFi settings and over the air firmware upgrades for any esp-open-rtos repository on GitHub
Apache License 2.0

freeze in starting a tcp session #3

Closed HomeACcessoryKid closed 5 years ago

HomeACcessoryKid commented 5 years ago

In the typical situation, the default gateway keeps sending ARP requests to confirm our assigned address, ALL the time.

The code in question is in ota.c, near line 278 of release 0.9.10:

    if (!local_port) {
        do {
            wc_RNG_GenerateBlock(&rng, initial_port, 2);
            local_port=256*initial_port[0]+initial_port[1];
            printf("%04x,",local_port);
        } while (local_port<LOCAL_PORT_START);
    }
    UDPLGP("%04x ",local_port);
    ret = netconn_gethostbyname(host, &target_ip);
    while(ret) {
        printf("%d",ret);
        vTaskDelay(delay);
        delay=delay<500?delay*2:500; //exponential hold-off till 5 seconds
        ret = netconn_gethostbyname(host, &target_ip);
    }
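
To make the two loops in that snippet easier to reason about, here is a minimal standalone sketch of the port selection and the hold-off rule, without the RNG, lwIP or FreeRTOS calls. The value of `LOCAL_PORT_START` is an assumption (0xC000, the start of the IANA dynamic port range); it is not quoted in this issue, only consistent with which ports the logs show as rejected and accepted. The helper names are hypothetical, and `next_delay_clamped` is a variant for comparison, not what ota.c does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: not taken from the source. 0xC000 is consistent with the
 * logs in this issue (b69b is retried, cc3e is accepted). */
#define LOCAL_PORT_START 0xC000
#define MAX_DELAY_TICKS  500

/* Combine two random bytes into a 16-bit port number, high byte first. */
static uint16_t port_from_bytes(uint8_t hi, uint8_t lo) {
    return (uint16_t)(256 * hi + lo);
}

/* The do/while loop keeps drawing until this returns true. */
static bool port_acceptable(uint16_t port) {
    return port >= LOCAL_PORT_START;
}

/* Same update rule as the quoted snippet: double until the value
 * reaches or passes the cap, then fall back to the cap. */
static int next_delay(int delay) {
    return delay < MAX_DELAY_TICKS ? delay * 2 : MAX_DELAY_TICKS;
}

/* Hypothetical variant that never exceeds the cap. */
static int next_delay_clamped(int delay) {
    int doubled = delay * 2;
    return doubled < MAX_DELAY_TICKS ? doubled : MAX_DELAY_TICKS;
}
```

With the quoted rule, a delay of 256 ticks becomes 512 once before settling at 500, while the clamped variant goes straight to 500. The "5 seconds" in the comment corresponds to 500 ticks only if one tick is 10 ms, which is assumed here.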

typical:
--- ota_get_version
in UDPlogger
--- ota_get_version
--- ota_connect LocalPort=e847
in printf
--- ota_get_version

this time:

in UDPlogger
--- ota_get_version
--- ota_connect LocalPort=e847
in printf
--- ota_get_version

<some 5 seconds of nothing>

--- ota_connect LocalPort=9e28,5f6d,a4f8,4fa4,a218,2316,0c3a,605c,10ab,3636,cc3e,cc3e
Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed Timer Start Failed Timer Stop Failed
beacon timeout rm match -6-1-1-1-1-1Timer Stop Failed -1Timer Start Failed
beacon timeout scandone -1Timer Stop Failed Timer Start Failed
beacon timeout scandone Timer Stop Failed -1Timer Start Failed
beacon timeout scandone Timer Stop Failed Timer Start Failed
beacon timeout scandone Timer Stop Failed Timer Start Failed
beacon timeout -1Timer Stop Failed scandone Timer Stop Failed Timer Start Failed
<5 minutes nothing>
-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1

other variation:

UDPlogger:
--- ota_get_version
--- ota_connect LocalPort=f5ca IP:140.82.118.4 local..OK remote..OK SSL..OK set_fd to github.com port 443..

printf:
--- ota_get_version
--- ota_connect LocalPort=ac54,b69b,758d,83a7,f5ca,f5ca IP:140.82.118.4 local..OK remote..OK SSL..OK set_fd to github.com port 4
<1 minute frozen>
43..failed, return [-0x1]
wolfSSL_send error = -308
--- ota_get_file_ex
Fatal exception (28):
epc1=0x402cb12b epc2=0x00000000 epc3=0x4028e684 excvaddr=0x00000000 depc=0x00000000 excsave1=0x402a3401
Registers:
a0 402a3401 a1 3fff82d0 a2 3fff82d0 a3 00000000
a4 00000000 a5 3fff830b a6 00000000 a7 6f6c6e77
a8 402c4238 a9 60000000 a10 00000000 a11 0000000a
a12 402c459c a13 00000000 SAR 0000001f
Stack: SP=0x3fff82d0
0x3fff82d0: 20544547 6d6f482f 63434165 6f737365
0x3fff82e0: 694b7972 696c2f64 632d6566 656c6379
0x3fff82f0: 6e616d2d 72656761 6c65722f 65736165
0x3fff8300: 6f642f73 6f6c6e77 002f6461 3fff2094
0x3fff8310: 00000190 00000001 00000001 3fff2938
0x3fff8320: 00000000 00000000 00000000 0000001b
0x3fff8330: 00000001 ffffffff 3fff295c 00000000
0x3fff8340: 00000000 00000000 3fff2938 40290114
Free Heap: 30412
_heap_start 0x3fff2560 brk 0x3fff9e98 supervisor sp 0x40000000 sp-brk 24936 bytes
arena (total_size) 31032 fordblks (free_size) 5476 uordblocks (used_size) 25556

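
The stack dump above is readable if the 32-bit words are taken as little-endian ASCII. The sketch below decodes the first 15 words of the dump (from 0x3fff82d0 up to the first zero byte). The function name `words_to_ascii` is hypothetical and only serves this illustration. As a side note, exception cause 28 on the Xtensa core is LoadProhibited, and excvaddr is 0, which would be consistent with a read through a NULL pointer; that reading is an inference from the dump, not something confirmed in this issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the bytes of 32-bit words, least significant byte first, into out.
 * Stops at the first zero byte or when out is full; always NUL-terminates
 * (if outlen > 0). Returns the number of characters written. */
static size_t words_to_ascii(const uint32_t *words, size_t nwords,
                             char *out, size_t outlen)
{
    size_t n = 0;
    if (outlen == 0) {
        return 0;
    }
    for (size_t i = 0; i < nwords; i++) {
        for (int shift = 0; shift < 32; shift += 8) {
            char c = (char)((words[i] >> shift) & 0xff);
            if (c == '\0' || n + 1 >= outlen) {
                out[n] = '\0';
                return n;
            }
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}

/* First 15 words of the stack dump quoted in this issue. */
static const uint32_t stack_words[] = {
    0x20544547, 0x6d6f482f, 0x63434165, 0x6f737365,
    0x694b7972, 0x696c2f64, 0x632d6566, 0x656c6379,
    0x6e616d2d, 0x72656761, 0x6c65722f, 0x65736165,
    0x6f642f73, 0x6f6c6e77, 0x002f6461,
};
```

Decoding `stack_words` this way gives `GET /HomeACcessoryKid/life-cycle-manager/releases/download/`, so the crash happened while the HTTP request line for the release download was sitting on the stack.
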
HomeACcessoryKid commented 5 years ago

Starting from 0.9.11, I have introduced a DNS lookup already in ota_init. This triggers the bug at that stage, so it is not related to the ota_connect routine and is a lot easier to debug. If it passes this stage, there is never an issue in ota_connect afterwards.
Introducing a 2-second hold-off before calling DNS seems to suppress error -1, but there is still an issue with error -6.

next step is to use lwip_debug

to be continued

HomeACcessoryKid commented 5 years ago

the -1 issue is solved and the solution is simple

By default the lwIP DNS client has only one entry in its work queue (DNS_TABLE_SIZE), AND that entry was still occupied by the lookup for the NTP server, so the second lookup had no free slot.

solution:

EXTRA_CFLAGS += -DDNS_TABLE_SIZE=2
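
To illustrate why a second table entry helps, here is a toy model of a fixed-size table of in-flight lookups. This is not lwIP code: all `toy_` names are hypothetical, and the NTP host name used in the usage note is a placeholder. The only detail borrowed from lwIP is that its `ERR_MEM` error has the value -1, which matches the -1 printed in the logs.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ERR_OK    0
#define TOY_ERR_MEM  (-1)  /* same numeric value as lwIP's ERR_MEM */
#define TOY_MAX_SLOTS 4

typedef struct {
    const char *pending[TOY_MAX_SLOTS]; /* host names of in-flight lookups */
    size_t size;                        /* plays the role of DNS_TABLE_SIZE */
} toy_dns_table;

static void toy_table_init(toy_dns_table *t, size_t size) {
    memset(t, 0, sizeof *t);
    t->size = size > TOY_MAX_SLOTS ? TOY_MAX_SLOTS : size;
}

/* Claim a free slot for a new lookup, or fail if every slot is busy. */
static int toy_enqueue(toy_dns_table *t, const char *host) {
    for (size_t i = 0; i < t->size; i++) {
        if (t->pending[i] == NULL) {
            t->pending[i] = host;
            return TOY_ERR_OK;
        }
    }
    return TOY_ERR_MEM;
}

/* Release the slot once the lookup for host has been answered. */
static void toy_complete(toy_dns_table *t, const char *host) {
    for (size_t i = 0; i < t->size; i++) {
        if (t->pending[i] != NULL && strcmp(t->pending[i], host) == 0) {
            t->pending[i] = NULL;
        }
    }
}
```

With a table of size 1, enqueueing `"ntp.example"` and then `"github.com"` makes the second call return -1 until the first lookup completes; with size 2 both succeed, which is the effect of the flag above in this simplified picture.
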

I have not been able to reproduce the -6 error yet/anymore but if you see it, please comment!!