espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.33k stars 7.2k forks source link

DPP Auth Failed on dpp-enrollee example (IDFGH-9228) #10615

Open knight-ryu12 opened 1 year ago

knight-ryu12 commented 1 year ago

Answers checklist.

IDF version.

v4.4.3

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

PowerShell

Development Kit.

ESP32-WROVER-DevkitC

Power Supply used.

USB

What is the expected behavior?

I expected it to correctly authenticate with Accesspoint.

What is the actual behavior?

It does not authenticate with accesspoint at all; Seems that it fails with WPA being D (66148) wpa: Mgmt Tx Status - 1, Cookie - 0x400e036c, and returns ESP_ERR_DPP_TX_FAILURE no matter what accesspoint it uses.

Steps to reproduce.

Code It is 1:1 to the example, expect I removed the key part. which should just work without it.

Debug Logs.

D (64538) wpa: DPP: Rx Public Action frame: action - 9
I (64538) wpa: DPP: Authentication Request from ac:3e:b1:02:80:e0
D (64538) event: running post WIFI_EVENT:20 with handler 0x400d6adc and context 0x3ffbdcfc on loop 0x3ffbc9f0
0x400d6adc: event_handler(void*, char const*, int, void*) at C:/Users/Alma/sensor-data-update/main/main.cpp:55

E (64548) wpa: curve: secp256r1

E (64558) wpa: prime len is 32

E (64558) wpa: Q_x: - hexdump(len=32):
E (64568) wpa: 2e cc 68 18 67 c2 7a 7f d2 59 20 1f 89 c5 99 28
E (64568) wpa: d2 59 3b 12 b9 07 13 4d 04 02 9a 80 0e 57 69 5c
E (64578) wpa: Q_y: - hexdump(len=32):
E (64578) wpa: 52 00 62 4e d9 33 98 b6 79 86 f6 ba 1c 20 a0 34
E (64588) wpa: 6f ec 12 e8 8d 28 56 47 7e 6b 09 00 48 04 84 29 
E (64598) wpa: d:      - hexdump(len=32):
E (64598) wpa: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E (64608) wpa: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
D (64558) event: running post WIFI_EVENT:20 with handler 0x400e0420 and context 0x3ffcf050 on loop 0x3ffbc9f0
0x400e0420: offchan_event_handler at C:/Users/Alma/esp443/esp-idf/components/wpa_supplicant/esp_supplicant/src/esp_dpp.c:449

D (64858) wpa: DPP: ECDH shared secret (M.x) - hexdump(len=32):
D (64858) wpa: f0 8c 6a ab fc 7b b8 4e 1b 6f f0 ff b7 2d a7 51
D (64858) wpa: 0a 20 b7 ac ae e7 de 04 58 42 a2 fd 44 22 44 9c 
D (64868) wpa: DPP: PRK = HKDF-Extract(<>, IKM=M.x) - hexdump(len=32):
D (64878) wpa: bc 82 9b f7 0b 3d 4d 33 4a 14 1f 11 9d d0 76 ff
D (64878) wpa: 76 f6 99 be c7 1f 2e 87 b2 8d d7 11 4f e5 ce 4f 
D (64888) wpa: DPP: k1 = HKDF-Expand(PRK, info, L) - hexdump(len=32):
D (64888) wpa: 0a e1 3c 28 f3 44 cf 2d 60 c0 f1 75 63 9d 19 4a
D (64898) wpa: 55 49 3f 69 61 4f 8a f0 68 22 7d ab 9e 0e 11 48 
D (64908) wpa: DDP: AES-SIV AD[0] - hexdump(len=6):
D (64908) wpa: 50 6f 9a 1a 01 00
D (64908) wpa: DDP: AES-SIV AD[1] - hexdump(len=145):
D (64918) wpa: 02 10 20 00 d1 29 db 70 63 65 7c e0 a6 69 1d 0e
D (64928) wpa: 7b b4 2e c4 dc b0 9b f1 70 94 43 b4 45 44 d9 d5
D (64928) wpa: c2 50 90 a7 01 10 20 00 ad 99 e9 8c 5d c3 e0 73 
D (64938) wpa: 5f 15 68 77 32 88 6c 42 64 18 88 a2 eb 22 85 52
D (64938) wpa: f3 15 f7 d4 16 d7 21 f7 03 10 40 00 2e cc 68 18 
D (64948) wpa: 67 c2 7a 7f d2 59 20 1f 89 c5 99 28 d2 59 3b 12
D (64958) wpa: b9 07 13 4d 04 02 9a 80 0e 57 69 5c 52 00 62 4e
D (64958) wpa: d9 33 98 b6 79 86 f6 ba 1c 20 a0 34 6f ec 12 e8 
D (64968) wpa: 8d 28 56 47 7e 6b 09 00 48 04 84 29 19 10 01 00
D (64968) wpa: 02
D (64978) wpa: DPP: AES-SIV ciphertext - hexdump(len=41):
D (64978) wpa: 5e 3c b5 ca a8 35 62 48 31 16 73 3f 86 1b 05 fd
D (64988) wpa: ca ee a0 d1 f2 ec c4 c2 a1 be f9 d3 f1 a5 f8 21
D (64988) wpa: 2d 76 12 f0 68 d6 36 44 b2 
D (64998) wpa: DPP: AES-SIV cleartext - hexdump(len=25):
D (64998) wpa: 05 10 10 00 0c fc 37 2f 56 14 8d 2e 3d e5 34 6b
D (65008) wpa: 74 4e 2e f9 06 10 01 00 02 
D (65008) wpa: DPP: I-nonce - hexdump(len=16):
D (65018) wpa: 0c fc 37 2f 56 14 8d 2e 3d e5 34 6b 74 4e 2e f9
D (65018) wpa: DPP: I-capabilities: 0x02
D (65028) wpa: DPP: Acting as Enrollee
D (65028) wpa: DPP: Build Authentication Response
D (65038) wpa: DPP: R-nonce - hexdump(len=16):
D (65038) wpa: f3 e7 c0 99 c3 ce 25 58 c4 e3 96 ab a0 47 3c f8
D (65288) wpa: DPP: Generating a keypair
E (65288) wpa: curve: secp256r1

E (65288) wpa: prime len is 32

E (65288) wpa: Q_x: - hexdump(len=32):
E (65298) wpa: 27 57 28 28 50 5c 1a a0 32 d2 b2 ba ac c4 24 9b
E (65298) wpa: f1 e5 87 a5 07 5c 3f 90 ce 43 6e 8a 89 eb eb 49 
E (65308) wpa: Q_y: - hexdump(len=32):
E (65308) wpa: 2b c3 e2 49 c8 08 4a 73 d1 ad f4 4d 90 54 40 6c
E (65318) wpa: 2f a1 32 50 a1 9e de 32 0a 11 c4 ed 61 9f 56 25 
E (65328) wpa: d:      - hexdump(len=32):
E (65328) wpa: c1 ec c8 73 ad 41 bf 26 37 2a a9 bf e5 9a d5 06 
E (65338) wpa: c1 f2 d3 81 0e 79 3c 8b dc d6 92 50 bd 1e 27 05
D (65588) wpa: DPP: ECDH shared secret (N.x) - hexdump(len=32):
D (65588) wpa: be 3d 4a 7a eb 32 19 3b 94 6a f7 70 66 d8 0c dc
D (65588) wpa: bc 32 0a 6f de 6f 70 4c 21 7b 6c 0f 42 2e 20 e8
D (65598) wpa: DPP: PRK = HKDF-Extract(<>, IKM=N.x) - hexdump(len=32):
D (65608) wpa: 4d 13 bb a1 f2 0a 4c 8d 51 0c 6c e1 7d cb f0 2c
D (65608) wpa: 65 b8 1e 4f 2c 67 06 46 bf b2 a5 0e e5 40 6f 27
D (65618) wpa: DPP: k2 = HKDF-Expand(PRK, info, L) - hexdump(len=32):
D (65618) wpa: ea 9f 7b 91 4c 0d d2 65 71 a5 29 9f a3 1f 05 ee
D (65628) wpa: 46 97 50 c3 18 32 1c 3d 1d 46 f7 d1 0f dd ef 76
D (65638) wpa: DPP: PRK = HKDF-Extract(<>, IKM) - hexdump(len=32):
D (65638) wpa: bb 7e e1 47 48 0f 02 d3 bc 1d 71 90 b8 cd 9d 73 
D (65648) wpa: b9 b1 3f 2a 34 cb 8a 67 f5 ae 43 ff f0 86 b0 7c
D (65648) wpa: DPP: ke = HKDF-Expand(PRK, info, L) - hexdump(len=32):
D (65658) wpa: ff d7 30 d0 7c 67 c5 a3 c3 bc da c9 f8 2a 6e 72 
D (65668) wpa: 16 76 98 2b d7 c7 1e bd 08 17 08 cb d9 9d 7a d5
D (65668) wpa: DPP: R-auth hash components
D (65678) wpa: DPP: hash component - hexdump(len=16):
D (65678) wpa: 0c fc 37 2f 56 14 8d 2e 3d e5 34 6b 74 4e 2e f9
D (65688) wpa: DPP: hash component - hexdump(len=16):
D (65688) wpa: f3 e7 c0 99 c3 ce 25 58 c4 e3 96 ab a0 47 3c f8 
D (65698) wpa: DPP: hash component - hexdump(len=32):
D (65698) wpa: 2e cc 68 18 67 c2 7a 7f d2 59 20 1f 89 c5 99 28 
D (65708) wpa: d2 59 3b 12 b9 07 13 4d 04 02 9a 80 0e 57 69 5c
D (65718) wpa: DPP: hash component - hexdump(len=32):
D (65718) wpa: 27 57 28 28 50 5c 1a a0 32 d2 b2 ba ac c4 24 9b 
D (65728) wpa: f1 e5 87 a5 07 5c 3f 90 ce 43 6e 8a 89 eb eb 49
D (65728) wpa: DPP: hash component - hexdump(len=32):
D (65738) wpa: c2 3f 6d 38 b9 ec 86 c0 75 d5 78 35 81 61 c4 1e 
D (65738) wpa: 5f 23 d3 3a a0 4d a1 fc 62 64 ce e5 ed a6 dc c3
D (65748) wpa: DPP: hash component - hexdump(len=1):
D (65758) wpa: 00
D (65758) wpa: DPP: R-auth - hexdump(len=32):
D (65758) wpa: 9c 68 7a de 49 3a fe 3d d4 e9 25 80 32 19 3b fe
D (65768) wpa: 43 2c 6c c1 ef c7 ae f2 c4 d6 eb 56 a0 e2 12 c2
D (65768) wpa: DPP: {R-auth}ke - hexdump(len=52):
D (65778) wpa: b6 59 c0 88 77 cc bf f5 4a 17 84 e6 56 44 cc f2
D (65778) wpa: 5c f9 89 67 ef 97 6e 43 94 a4 bb f4 46 2c a4 8b
D (65788) wpa: e2 4c 9c 5d 1c cd 0f 97 ae 80 d8 16 4e 3d ac 09
D (65798) wpa: 59 e9 4b 2e
D (65798) wpa: DPP: Status 0
D (65798) wpa: DPP: R-Bootstrap Key Hash
D (65808) wpa: DDP: AES-SIV AD[0] - hexdump(len=6):
D (65808) wpa: 50 6f 9a 1a 01 01
D (65808) wpa: DDP: AES-SIV AD[1] - hexdump(len=109):
D (65818) wpa: 00 10 01 00 00 02 10 20 00 d1 29 db 70 63 65 7c
D (65828) wpa: e0 a6 69 1d 0e 7b b4 2e c4 dc b0 9b f1 70 94 43
D (65828) wpa: b4 45 44 d9 d5 c2 50 90 a7 09 10 40 00 27 57 28 
D (65838) wpa: 28 50 5c 1a a0 32 d2 b2 ba ac c4 24 9b f1 e5 87
D (65838) wpa: a5 07 5c 3f 90 ce 43 6e 8a 89 eb eb 49 2b c3 e2
D (65848) wpa: 49 c8 08 4a 73 d1 ad f4 4d 90 54 40 6c 2f a1 32 
D (65858) wpa: 50 a1 9e de 32 0a 11 c4 ed 61 9f 56 25
D (65858) wpa: DPP: AES-SIV cleartext - hexdump(len=101):
D (65868) wpa: 07 10 10 00 f3 e7 c0 99 c3 ce 25 58 c4 e3 96 ab
D (65868) wpa: a0 47 3c f8 05 10 10 00 0c fc 37 2f 56 14 8d 2e
D (65878) wpa: 3d e5 34 6b 74 4e 2e f9 08 10 01 00 01 04 10 34 
D (65878) wpa: 00 b6 59 c0 88 77 cc bf f5 4a 17 84 e6 56 44 cc
D (65888) wpa: f2 5c f9 89 67 ef 97 6e 43 94 a4 bb f4 46 2c a4
D (65898) wpa: 8b e2 4c 9c 5d 1c cd 0f 97 ae 80 d8 16 4e 3d ac 
D (65898) wpa: 09 59 e9 4b 2e
D (65908) wpa: DPP: AES-SIV ciphertext - hexdump(len=117):
D (65908) wpa: 67 1e e0 44 0e 3e df b0 13 64 39 c7 17 84 df ff 
D (65918) wpa: 5d 44 24 40 0a 8e e8 55 8b e5 41 33 d2 13 75 19
D (65918) wpa: 53 0a f3 91 1b 4d 06 97 c7 d4 ac 95 29 2d 4a da
D (65928) wpa: f8 e2 1b a7 f5 69 a5 2a 26 5b f9 8f 6a cd a1 18 
D (65938) wpa: 3b 6f 8b 4f ed e4 15 f8 25 b0 c9 35 7f c6 15 bb
D (65938) wpa: d2 77 dc b9 33 9b 5b 73 5b 8c 81 8e 96 45 c9 08 
D (65948) wpa: d6 39 7c 20 90 4c 70 14 06 87 41 e0 96 3d 9e b7
D (65948) wpa: 5a 9e b1 4e 7e
D (65958) wpa: DPP: Authentication Response frame attributes - hexdump(len=238):
D (65958) wpa: 04 09 50 6f 9a 1a 01 01 00 10 01 00 00 02 10 20
D (65968) wpa: 00 d1 29 db 70 63 65 7c e0 a6 69 1d 0e 7b b4 2e 
D (65978) wpa: c4 dc b0 9b f1 70 94 43 b4 45 44 d9 d5 c2 50 90
D (65978) wpa: a7 09 10 40 00 27 57 28 28 50 5c 1a a0 32 d2 b2
D (65988) wpa: ba ac c4 24 9b f1 e5 87 a5 07 5c 3f 90 ce 43 6e
D (65988) wpa: 8a 89 eb eb 49 2b c3 e2 49 c8 08 4a 73 d1 ad f4
D (65998) wpa: 4d 90 54 40 6c 2f a1 32 50 a1 9e de 32 0a 11 c4
D (66008) wpa: ed 61 9f 56 25 04 10 75 00 67 1e e0 44 0e 3e df
D (66008) wpa: b0 13 64 39 c7 17 84 df ff 5d 44 24 40 0a 8e e8
D (66018) wpa: 55 8b e5 41 33 d2 13 75 19 53 0a f3 91 1b 4d 06
D (66018) wpa: 97 c7 d4 ac 95 29 2d 4a da f8 e2 1b a7 f5 69 a5
D (66028) wpa: 2a 26 5b f9 8f 6a cd a1 18 3b 6f 8b 4f ed e4 15
D (66038) wpa: f8 25 b0 c9 35 7f c6 15 bb d2 77 dc b9 33 9b 5b
D (66038) wpa: 73 5b 8c 81 8e 96 45 c9 08 d6 39 7c 20 90 4c 70
D (66048) wpa: 14 06 87 41 e0 96 3d 9e b7 5a 9e b1 4e 7e
D (66048) wpa: DPP: Mgmt Tx - MAC:ac:3e:b1:02:80:e0, Channel-6, WaitT-500        
D (66138) event: running post WIFI_EVENT:19 with handler 0x400d6adc and context 0x3ffbdcfc on loop 0x3ffbc9f0
0x400d6adc: event_handler(void*, char const*, int, void*) at C:/Users/Alma/sensor-data-update/main/main.cpp:55

D (66138) event: running post WIFI_EVENT:19 with handler 0x400e0420 and context 0x3ffcf020 on loop 0x3ffbc9f0
0x400e0420: offchan_event_handler at C:/Users/Alma/esp443/esp-idf/components/wpa_supplicant/esp_supplicant/src/esp_dpp.c:449

D (66148) wpa: Mgmt Tx Status - 1, Cookie - 0x400e036c
0x400e036c: esp_supp_rx_action at C:/Users/Alma/esp443/esp-idf/components/wpa_supplicant/esp_supplicant/src/esp_dpp.c:413

I (66148) wifi dpp-enrollee: DPP Auth failed (Reason: ESP_ERR_DPP_TX_FAILURE), retry...
D (66658) event: running post WIFI_EVENT:20 with handler 0x400d6adc and context 0x3ffbdcfc on loop 0x3ffbc9f0
0x400d6adc: event_handler(void*, char const*, int, void*) at C:/Users/Alma/sensor-data-update/main/main.cpp:55

D (66658) event: running post WIFI_EVENT:20 with handler 0x400e0420 and context 0x3ffcf050 on loop 0x3ffbc9f0
0x400e0420: offchan_event_handler at C:/Users/Alma/esp443/esp-idf/components/wpa_supplicant/esp_supplicant/src/esp_dpp.c:449

More Information.

Seems afterwards it crashes Wi-Fi module.

knight-ryu12 commented 1 year ago

Update on this: if, LWIP Enable reassembly incoming fragmented IP4 packets are enabled, above example will not work. if disabled, it suddenly work now.

Hmm.

jasta commented 1 year ago

I'm having the same issue (dpp-enrollee yields ESP_ERR_DPP_TX_FAILURE). For me it is strangely persistent but unpredictable. That is, I've had it work correctly in the past, but somehow once a given device gets into this state it will yield this error over and over no matter what I do, even if I reset the flash. I believe @knight-ryu12 is describing something similar where a seemingly unrelated change "fixed it". Looks like others have also experienced this: https://esp32.com/viewtopic.php?t=28573.

Needless to say this unpredictability makes it impossible to actually ship something that uses DPP, so we need to find some predictable way of detecting and working around this condition.

For reference, my ESP32-C3-DevKit-M1U is working, my ESP32-C3-DevKit-M1 isn't. Both are using v4.4.4.

kapilkedawat commented 1 year ago

Hi @jasta , we are trying to reproduce it locally. Are you using the example?

Please share your sdkconfig as well as IDF version commit.

jasta commented 1 year ago

Yes, I'm able to reproduce consistently with the unmodified example on the current v4.4.4 tag of esp-idf. I am using the default settings (channel 6, no provided key/device info). Here's the sdkconfig: https://gist.github.com/42ef0f07990ca812bba8b541685ef798

jasta commented 1 year ago

The bug really smells like a race condition IMO. I had it all working perfectly and changed nothing of significance about my app and it just started giving me ESP_ERR_DPP_TX_FAILURE over and over. Then no amount of reverting my code could fix it, including reverting all the way back to the unmodified example which is where I started. It of course used to work with the sample and even my considerably more robust full app. I suspect that the "condition" that changes to cause it to become persistently broken is literally in the air -- something about my Wi-Fi setup must be able to consistently reproduce an "unlikely" race condition outcome.

No hardware has been deliberately modified or replaced since it was once working, so the only possibilities in my view are environmental (wireless signals themselves changing) or through automated software updates of either my router (Google WiFi infrastructure) or my phone (Android Pixel 6)

jasta commented 1 year ago

Well that's interesting, even though I get ESP_ERR_DPP_TX_FAILURE pretty much every time, I now just got ESP_ERR_DPP_INVALID_ATTR. Adds quite a bit of evidence to my theory that this is a race :)

I swear it is seeming like the difference between the two errors is whether I have the esp32 on my desk (DPP_INVALID_ATTR) or in my component drawer (DPP_TX_FAILURE). Bizarre :)

jasta commented 1 year ago

As an aside, I am the one working on adding DPP support to Rust and ran into this issue maturing the implementation even though I didn't change anything functionally interesting with respect to esp_* calls: https://github.com/esp-rs/esp-idf-svc/pull/228 .

jasta commented 1 year ago

I have logs with the full debugging turned up but I'm not posting here as I believe they will contain my Wi-Fi creds. Lmk and I can share them privately or reproduce with a dummy network (though I suspect changing my network around will change the results)

kapilkedawat commented 1 year ago

I have logs with the full debugging turned up but I'm not posting here as I believe they will contain my Wi-Fi creds. Lmk and I can share them privately or reproduce with a dummy network (though I suspect changing my network around will change the results)

Please black out wifi cred from the logs and share the rest.

jasta commented 1 year ago

I can't actually tell what is and isn't sensitive about this, but I'm interested enough in getting this solved that I'll risk it hehe: https://gist.github.com/09fe320e7b549967b37088170e59c5cb. Here is the updated sdkconfig after I enabled logging: https://gist.github.com/8ab02d9c9ca064861e9e1cdf22261545

I confirmed again this morning it still repros. I seemingly have a 100% reliable repro (it's happened at least the last 20 or 30 times I've tried) on a devkit module that once worked just fine. One maybe important detail though is that in the example I'm unable to scan the QR code in the console (my phone never recognizes it at as a QR code), so I'm copying the QR text into qtqr and generating one there. I can confirm that if I fudge the QR text I get a different error ("No matching own bootstrapping key found as responder - ignoring message"), and then it yields ESP_ERR_DPP_INVALID_ATTR (different than the INVALID_ATTR case I got when I was randomly moving the device around physically).

knight-ryu12 commented 1 year ago

I don't know for sure, but it crashes Wi-Fi event handler hardly afterwards. I cannot receive any events after DPP had failed. I assume that might have to do with how LWIP is handling packets? (Or is it just my condition being bad...)

jasta commented 1 year ago

I don't know for sure, but it crashes Wi-Fi event handler hardly afterwards. I cannot receive any events after DPP had failed. I assume that might have to do with how LWIP is handling packets? (Or is it just my condition being bad...)

I see the same thing re the hard crash, but I don't believe it is related to LWIP packet fragmentation. These aren't even IP packets AFAICT, they're action frame packets in 802.11. Further, I'm seeing essentially identical behavior to you with the unmodified example, which I'm guessing is the same thing our friends at Espressif are testing with but seeing different results. Highly likely an environmental condition causing the difference.

gayafhannah commented 1 year ago

From my testing today, I've found that if the channel the esp is currently listening on happens to match whichever channel my phone was connected to AP on, the chance of ESP_ERR_DPP_TX_FAILURE dramatically drops and would happen rarely. This would explain the issue being intermittent and also not fixing itself for a while, since AP's dont really change their channels too often, if at all. However that doesn't get us very far since it means we need to somehow know the channel before we know what AP or anything. Which also doesn't make sense since the phone already sends the first frame to the esp on the correct channel to begin with (channel 6 in the example), and then attempts to send it back to the phone on that same channel. From what I can tell, the issue comes from the call esp_wifi_action_tx_req on line 96 inside esp_send_action_frame Although the error isn't immediately sent to the callback in the lines below and instead that callback is sent the error only when the WIFI_EVENT_ACTION_TX_STATUS wifi event is triggered on line 447 in offchan_event_handler

So from what I can tell, the issue comes down to just being purely a channel mismatch somehow and the esp not wanting to send to a different channel than its listening to(just a wild guess on that last part though).


And, it seems that there's a fix that works perfectly(from my testing atleast), all it takes is enabling Multi Band Support in your sdkconfig. So essentially all you have to do is:

CONFIG_WPA_11KV_SUPPORT=y
CONFIG_WPA_SCAN_CACHE=y
CONFIG_WPA_MBO_SUPPORT=y
CONFIG_WPA_DPP_SUPPORT=y

Instead of only doing CONFIG_WPA_DPP_SUPPORT=y (11KV_SUPPORT and SCAN_CACHE are both required by MBO_SUPPORT)

This fix seems to work well enough that I can even have the esp listening only on channel 10, my phone can be sitting on channel 149, and I can get the esp to connect to a network on channel 1 with no issue I have no idea if this is meant to fix it, or if it's going to break somewhere down the like, but nothing I can do can cause the ESP_ERR_DPP_TX_FAILURE as long as I have MBO Support enabled

gayafhannah commented 1 year ago

I don't know for sure, but it crashes Wi-Fi event handler hardly afterwards. I cannot receive any events after DPP had failed. I assume that might have to do with how LWIP is handling packets? (Or is it just my condition being bad...)

I see the same thing re the hard crash, but I don't believe it is related to LWIP packet fragmentation. These aren't even IP packets AFAICT, they're action frame packets in 802.11. Further, I'm seeing essentially identical behavior to you with the unmodified example, which I'm guessing is the same thing our friends at Espressif are testing with but seeing different results. Highly likely an environmental condition causing the difference.

Afaik it doesnt actually crash anything, it just sets s_dpp_stop_listening to true after getting the first packet, which stops it from sending more ROC events recursively, with the intention of setting s_dpp_stop_listening to false upon a successful dpp configure, and then sending a ROC event to start the recursive loop of them using esp_supp_dpp_start_listen()

From my testing, the wifi driver and event loop are all still fully functioning even when that error happens, its just there's no events actually going on to show up in the log but if you set s_dpp_stop_listening to false and call the start_listen function again, or if you manually post events, they'll show up normally

Also I doubt lwip actually has anything to do with the issue since i dont remember any mention of lwip in esp_dpp.c, and iirc lwip is a tcp/ip stack, and during dpp configuration it wouldnt make sense for there to be a tcp/ip stack considering only a limited amount of raw frames are sent between the two devices anyways. Changing lwip settings likely only fixes things due to how inconsistent the ESP_ERR_DPP_TX_FAILURE error is without CONFIG_WPA_MBO_SUPPORT=y since i know atleast for me changing things like that would fix it for exactly the first time i ran it after flashing and no more(even if i reset the nvs once the error reappears)

jasta commented 1 year ago

Afaik it doesnt actually crash anything, it just sets s_dpp_stop_listening to true after getting the first packet, which stops it from sending more ROC events recursively, with the intention of setting s_dpp_stop_listening to false upon a successful dpp configure, and then sending a ROC event to start the recursive loop of them using esp_supp_dpp_start_listen()

Ahh good catch, however what I'm seeing is that esp_supp_dpp_start_listen() does not actually once again start listening. That is, if you hit Retry in the Android Wi-Fi UI it will fail without printing any more debug output on the esp32. Looking deeper from your analysis, it looks like the reason is because esp_supp_dpp_start_listen() doesn't clear the s_dpp_stop_listening flag so even if you call esp_supp_dpp_start_listen() again the call to offchan_event_handler trying to trigger SIG_DPP_LISTEN_NEXT_CHANNEL will be silently ignored (because s_dpp_stop_listening is still true). The code is subtle in that no matter what it'll get one SIG_DPP_LISTEN_NEXT_CHANNEL event handled after esp_supp_dpp_start_listen(), but the timeout is 500ms and it will not try again even on the same channel.

@kapilkedawat, so at the very least we've identified one clear bug that needs fixing: esp_supp_dpp_start_listen() needs to call s_dpp_stop_listening = false. I've tested this locally and confirmed that it indeed lets us try again any number of times using the example code.

@gayafhannah, I'm going to go try your MBO workaround now and report back, awesome sleuthing BTW!

gayafhannah commented 1 year ago

@jasta I've also just noticed that since i'm not testing on the master branch and am instead on, in the master branch it seems to be called ESP_WIFI_MBO_SUPPORT in Kconfig, but WPA_MBO_SUPPORT in 5.0.0 I'll have to try grab myself the master branch and see how that goes

jasta commented 1 year ago

@gayafhannah, MBO support with v4.4.4 at least didn't seem to affect the results, though I still think you're onto something with your analysis. My network set-up is that my phone is connected to a Google Wi-Fi network on 5GHz ch 149, but the 2.4GHz network is on channel 1. I've tried a bunch of configurations of different channels for DPP to use though and still can't get it to work. I'm a bit stumped still.

Note: I used idf.py menuconfig to enable MBO support so the settings should match my local checkout. I'm using the v4.4.4 tag in esp-idf.

gayafhannah commented 1 year ago

@gayafhannah, MBO support with v4.4.4 at least didn't seem to affect the results, though I still think you're onto something with your analysis. My network set-up is that my phone is connected to a Google Wi-Fi network on 5GHz ch 149, but the 2.4GHz network is on channel 1. I've tried a bunch of configurations of different channels for DPP to use though and still can't get it to work. I'm a bit stumped still.

Note: I used idf.py menuconfig to enable MBO support so the settings should match my local checkout. I'm using the v4.4.4 tag in esp-idf.

I know that without MBO support, i was able to get it to be more consistent at not having the error(although not 100%) if the only channel dpp was configured to listen on was the channel of the wifi network, and with my phone also connected to that network, and therefore on that same channel, which is what led me to mbo support when snooping around menuconfig.

jasta commented 1 year ago

Disabling MBO and setting channels to 1,6 worked once, but then I rebooted the device and tried one more time and it failed. Then rebooted and tried again several times until it finally worked (which I can only do without rebooting because of my patch in esp_supp_dpp_start_listen(). How bizarre.

I've attached an abridged log comparing the failed attempt vs the successful attempt (no reboot occurred, I just tried again on the phone): https://gist.github.com/ca10e95e786b79fe847c3a13b297e732

The interesting bit from the logs is that seemingly nothing is different about the failure case and successful one. Both using Channel-6, exchanging seemingly the same data with the same timings, etc.

gayafhannah commented 1 year ago

Have thrown together a pull request #10812 that should work for the latest version of espidf with the changes i made to get fully functional dpp I should note though that i haven't tried using the example in the repo and instead have been testing with my own heavily modified version, although there shouldn't be much difference in how they work in the end. Also @jasta there may be a chance that on v4.4.4 MBO support may not work quite the same, especially considering its an entire major release difference. 99% of my testing has been done on v5.0.0

jasta commented 1 year ago

Have thrown together a pull request #10812 that should work for the latest version of espidf with the changes i made to get fully functional dpp

Looks good, you might want to also include my fix to esp_supp_dsp_start_listen which resets the s_dpp_stop_listening flag. This is what enables retries to work as was intended by the example (i.e. not needing deinit/init before it can be retried).

I should note though that i haven't tried using the example in the repo and instead have been testing with my own heavily modified version, although there shouldn't be much difference in how they work in the end. Also @jasta there may be a chance that on v4.4.4 MBO support may not work quite the same, especially considering its an entire major release difference. 99% of my testing has been done on v5.0.0

Planning to retest all this on v5.0.1 but was sticking to v4.4.4 because that's where it once worked then stopped so I figured we could learn the most about what exact environmental difference actually matters.

gayafhannah commented 1 year ago

Looks good, you might want to also include my fix to esp_supp_dsp_start_listen which resets the s_dpp_stop_listening flag. This is what enables retries to work as was intended by the example (i.e. not needing deinit/init before it can be retried).

I'll have to give your fix a shot in the morning That fix was actually the first thing I tried, and while it did allow me to retry even after significant errors, it had some... strange side effects. I can't explain what caused them exactly but that could just be because I was really fucking with esp_dpp.c to figure out just where was it all going wrong and somehow because of that little fix I was getting too many ROC events every second to even bother counting. Hopefully that change works better with a cleaner and working esp_dpp.c than it did this morning.

Also my guess for environmental differences causing the issues is just the phone/AP switching channels. It's the only thing I can come up with since like I said in earlier comments, setting the listen channel to be the same as the ap and phone channel, maybe 80-90% success rate atleast. Instead of the not even 1% chance if it wasn't the same. A fun way to test it was to in the channel listen list that you put into one of the functions, just up all channels 1 through to 13, I also added a little thing in esp_dpp to show the current channel as it was cycling through. If I managed to get the timing pressing the button on my phone just right to land on channel 1 on the esp(channel 1 also what the ap and my phone were on) then it'd almost always connect fine, but if I missed that channel(difficult to get it right when the timeout is 500ms) then it'd fail to connect except for maybe exactly one time.

Also if I remember correctly, setting the channel to be near the same as the ap, if the channel width for the ap is 40MHz, I noticed even being +-2 channels from the centre channel, it had moderate success rates. Definitely not the same as setting the listen channel directly ontop of the centre channel though. Also 40MHz wide ones were slightly less successful on the centre channel than 20MHz wide(probably because 20MHz is literally just 1 channel wide). I have no explanation for this stuff to do with channel width, it's just something I noticed during testing since the extender here is 40MHz wide and the main ap is 20MHz wide and I was constantly switching networks. I feel like I'm going crazy because I have no solid evidence and it could all just be a coincidence and there was some other environmental stuff going on, but I spent alot of time testing things and that pattern of width playing a small part(and being directly ontop of the right channel being an even bigger part) have been constants the entire time. This would also likely explain why it sounds like others have been having much more inconsistent errors, since if the target ap has a wide channel and being close to a wide one increases chances but only a bit, whereas for me having a network where if I'm not right on it, it doesn't work, until I use the extender but then it works only sometimes

jasta commented 1 year ago

I'll have to give your fix a shot in the morning That fix was actually the first thing I tried, and while it did allow me to retry even after significant errors, it had some... strange side effects. I can't explain what caused them exactly but that could just be because I was really fucking with esp_dpp.c to figure out just where was it all going wrong and somehow because of that little fix I was getting too many ROC events every second to even bother counting. Hopefully that change works better with a cleaner and working esp_dpp.c than it did this morning.

I'm not observing any strange side effects from the fix FWIW. And more importantly I'm noticing that my phone will actually automatically retry now with this setting on, indicating that the DPP standard actually expects some intermittent failures (at least due to RF interference would be my guess, but maybe even this channel weirdness). That suggests to me that the lack of working retry is a big part of the root cause here. Imagine what would happen to our global TCP infrastructure if you disabled packet retransmits, for example...

Also my guess for environmental differences causing the issues is just the phone/AP switching channels. It's the only thing I can come up with since like I said in earlier comments, setting the listen channel to be the same as the ap and phone channel, maybe 80-90% success rate atleast. Instead of the not even 1% chance if it wasn't the same. A fun way to test it was to in the channel listen list that you put into one of the functions, just up all channels 1 through to 13, I also added a little thing in esp_dpp to show the current channel as it was cycling through. If I managed to get the timing pressing the button on my phone just right to land on channel 1 on the esp(channel 1 also what the ap and my phone were on) then it'd almost always connect fine, but if I missed that channel(difficult to get it right when the timeout is 500ms) then it'd fail to connect except for maybe exactly one time.

Definitely plausible and consistent with my experience. If I just sit there and hit Retry on my Android phone over and over with my fix it will eventually work. Not ideal UX by any stretch, though, and I'd be quite surprised if we (or maybe just Espressif) couldn't come up with a more reliable way to fix this.

Also if I remember correctly, setting the channel to be near the same as the ap, if the channel width for the ap is 40MHz, I noticed even being +-2 channels from the centre channel, it had moderate success rates. Definitely not the same as setting the listen channel directly ontop of the centre channel though. Also 40MHz wide ones were slightly less successful on the centre channel than 20MHz wide(probably because 20MHz is literally just 1 channel wide). I have no explanation for this stuff to do with channel width, it's just something I noticed during testing since the extender here is 40MHz wide and the main ap is 20MHz wide and I was constantly switching networks. I feel like I'm going crazy because I have no solid evidence and it could all just be a coincidence and there was some other environmental stuff going on, but I spent alot of time testing things and that pattern of width playing a small part(and being directly ontop of the right channel being an even bigger part) have been constants the entire time. This would also likely explain why it sounds like others have been having much more inconsistent errors, since if the target ap has a wide channel and being close to a wide one increases chances but only a bit, whereas for me having a network where if I'm not right on it, it doesn't work, until I use the extender but then it works only sometimes

I can't really say for sure, I'd probably need to dive deep into how this stuff is even supposed to function at the standard level before I could speculate why that would play a role. One curious thing I did notice however was that the QR code does in fact include the channel that the enrollee (the esp32) is listening on, so it doesn't make any sense why the configurator (the phone) would have any troubles sending/receiving messages on the correct channel.

gayafhannah commented 1 year ago

I can't really say for sure, I'd probably need to dive deep into how this stuff is even supposed to function at the standard level before I could speculate why that would play a role. One curious thing I did notice however was that the QR code does in fact include the channel that the enrollee (the esp32) is listening on, so it doesn't make any sense why the configurator (the phone) would have any troubles sending/receiving messages on the correct channel.

I noticed the channels being listed in the qr code aswell, and your exact reasons are why I'm hesitant that my guess is actually correct. It's just that pattern being there made it really seem like it has to have something to do with the channel being wrong. Especially because the error afaik is caused by the esp failing to send the packet "offchannel" and not in receiving any packets, as even with the error happening, it still successful receives the first packet/frame from the phone just phone. I'd love to give it a try with my phone not connected to any network while doing dpp so it's not bound to a channel(although when my phone can handle 2 wifi connections like it's nothing that's hard to believe it's an issue) except thanks to Samsung's Oneui being borked, I can barely access the dpp provisioning to begin with, and I can't select any network other than the current one for provisioning, so I can't test that.

jasta commented 1 year ago

@kapilkedawat The plot thickens trying to work around these bugs. It looks like if you try to work around this issue by calling esp_supp_dpp_deinit/init again and just starting all over, you will quickly run into a race condition in esp_dpp.c in which this assert fails:

https://gist.github.com/a067c88e511dfff52e7f704d469e157f

I am using unmodified ESP-IDF 4.4.4 to cause this behaviour.

The reason appears to be because of an inherent race condition in which esp_supp_dpp_init makes no effort to check if the previous s_dpp_evt_queue and associated task is drained and cleaned up before proceeding to start up esp_dpp_task again. Illustrated as such:

Caller Task dppT Task #1 dppT Task #2
dpp_init()
xQueueReceive (blocked)
dpp_deinit()
Handling SIG_DPP_DEL_TASK
dpp_init()
xQueueReceive (blocked)
s_dpp_evt_queue = NULL;
xQueueReceive crashes: s_dpp_evt_queue is NULL)

It seems to me like the fix would be to have a condition variable and associated state that deinit sets true, SIG_DPP_DEL_TASK sets false, and init checks for / waits until it is false. A workaround that seems reliable enough is to sleep for 1s between deinit/init though of course this isn't really a fix.

This also suggests to me the proposed fix of clearing s_dpp_stop_listening in esp_supp_dpp_start_listen is even more important.

EDIT: Filed a separate issue for this, #10879

jasta commented 1 year ago

@gayafhannah Did you ever test on the master branch? I started prepping my separate s_dpp_stop_listening fix and noticed that while v5.0.1 still has issues, the master branch is actually working correctly! I checked in wireshark on a promiscuous third party device and discovered that in fact the OTA packets are distinct between v4.4.4/v5.0.1 and master. For the former, there are several action frame transmits in response to the initial auth request from my phone but they are all ignored by my phone. The phone eventually times out and gives up, despite in theory the packets being made available. For master, there is exactly one transmit of auth response from the esp32 and the phone picks it up right away and the whole process completes successfully really fast.

The only difference I can think of here is if the wifi libs got updated in master and fixed some subtle but important bug in the action frame tx path.

Next step I was going to try setting up wpa_supplicant on my PC so I can increase the debug logs and understand why the packets I can see sent from the esp32 are not being accepted/received by the peer, but this is slow going :)

shreyassheth1 commented 1 year ago

Hey @gayafhannah @knight-ryu12 @jasta we are actively looking into this issue. For us, it is little bit hard to stimulate one consistent behavior . Also @gayafhannah regarding your PR we are trying to make more changes and make the state machine cleaner. Meanwhile, you can try out these changes

jasta commented 1 year ago

Note that I was able to make things work with https://github.com/espressif/esp-idf/pull/10865. I also dug into Wireshark a bit and saw some fairly surprising results (packets going out but not picked up by the configurator phone) which made me think this is why they changed the way the initial listening phase works in DPP V2. I think the right path to go down now is to upgrade wpa_supplicant.

ragnarmargus commented 1 year ago

Hey, I am experiencing the same issue. DPP fails to connect. Is there a workaround without problems?

jasta commented 1 year ago

Hey, I am experiencing the same issue. DPP fails to connect. Is there a workaround without problems?

Can you try with a version that has my patches applied (looks like espressif backported this to v4.3, v4.4 and v5.0 branches, but I don't think any versioned releases were made with the fix)? What you should see is that there are many automatic retries if you test with the dpp-enrollee sample and it will eventually work. In either case, can you report your findings here?

ragnarmargus commented 1 year ago

Hey, I am experiencing the same issue. DPP fails to connect. Is there a workaround without problems?

Can you try with a version that has my patches applied (looks like espressif backported this to v4.3, v4.4 and v5.0 branches, but I don't think any versioned releases were made with the fix)? What you should see is that there are many automatic retries if you test with the dpp-enrollee sample and it will eventually work. In either case, can you report your findings here?

Thanks for pointing out that you patched this. We decided to go with smartConfig instead. Maybe I will return to this in the future.

Stencht commented 1 year ago

Hmm I'm also experiencing the same issue when running the dpp-enrollee sample, which has the fix, using:

I got it properly working on the master branch (V5.2), but I had to return to V5.0 because of incompatibility of other dependencies. Now I get inconsistent results. I got it to work once, and other times I get errors like:

I (888) phy_init: phy_version 610,2bff4c8,Jul 27 2023,20:22:14
I (928) wifi:mode : sta (f4:12:fa:db:91:d8)
I (928) wifi:enable tsf
I (938) wifi dpp-enrollee: Started listening for DPP Authentication
I (16848) wifi dpp-enrollee: DPP Auth failed (Reason: ESP_ERR_DPP_FAILURE), retry...

Shouldn't it auto-retry here (at least 5 times in the example?)? If I keep on trying in the UI on my phone, it eventually works, but this isn't workable.

I also tried to set the config options:

CONFIG_WPA_11KV_SUPPORT=y
CONFIG_WPA_SCAN_CACHE=y
CONFIG_WPA_MBO_SUPPORT=y
CONFIG_WPA_DPP_SUPPORT=y

But that didn't resolve it either.

Any ideas?

Shreyas0-7 commented 1 month ago

Hey @Stencht @jasta @gayafhannah We have prepared a patch which we think will fix this issue and need your help to test it. This patch is based on v5.0.3 (e004907bd53b68e6db2e29ae335d8b38cbb39472)

michael-betz commented 1 month ago

@Shreyas0-7 I tried your patch with an ESP32 + Android 14 (Lineage OS) phone.

Half the time it works. The other half, nothing is printed on the serial console, Android times out with Couldn't add device and offers me to Retry. Clicking the Retry button always leads to another timeout. The only way to get a success in this case is to reboot the ESP32 and scan the QR code again.

Shreyas0-7 commented 1 month ago

@michael-betz Can you please set ESP into debug mode and attach ESP serial logs if possible

michael-betz commented 1 month ago

Hi, sorry for the delay.

Here's the complete debug log of dpp failing 5 times (I push the retry button on my phone 5 times)

failed_after_5_retries.txt

kapilkedawat commented 1 month ago

Hi @michael-betz, thanks for the logs. Was the mobile connected to the AP during this time, and was the AP operating on another channel? From the logs, it looks like the mobile didn't receive the frames sent by the device (it didn't wait for sufficient time). T

his issue was later realized, and DPP R2 has provisions to mitigate this type of scenario.

Could you also try setting the ESP listen channel to be the same as the AP's channel and test it again?

AxelLin commented 1 month ago

@michael-betz Any update regarding the comment in https://github.com/espressif/esp-idf/issues/10615#issuecomment-2271641093 ?

michael-betz commented 1 month ago

Okay, I gave it another shot, this time I'm testing against a FRITZ!Box 7590 on channel 6.

Previously I was testing against an Android Hotspot (running on another phone, not the one I use for scanning the QR code).

I've set EXAMPLE_DPP_LISTEN_CHANNEL_LIST "6" to match the channel of the target wifi.

The first 3 attempts failed as before.

Then I enabled WPA_DEBUG_PRINT in menuconfig and re-compiled.

Surprisingly, the next 15 attempts all succeeded. Even after changing EXAMPLE_DPP_LISTEN_CHANNEL_LIST to 12 and letting the phone connect to the wifi.

Then I disabled WPA_DEBUG_PRINT, re-compiled and only 4 out of 8 attempts succeeded.

What a strange result.

So I've just repeated this test one more time.

With WPA_DEBUG_PRINT enabled: 7 / 8 attempts succeeded

Same setup but WPA_DEBUG_PRINT disabled: 2 / 8 attempts succeeded.

Could there be a race condition somewhere?

Zmmfly commented 2 weeks ago

2024.09.06 My problem is solved. The actual problem is that the previous URI is used as the QRCode data source, which causes a mismatch.

IDF version: 5.3

Same problem, but when i read code, i found that caused by hash compare of keys: https://github.com/espressif/esp-idf/blob/0bbd72819602b4b8cf786a8833d0ee9c922268bf/components/wpa_supplicant/esp_supplicant/src/esp_dpp.c#L165-L172

The log start with DPP key pair generation until share AP secret failed from Android with QR code scan.

D (8736) wpa: DPP: Generating a keypair
E (8736) wpa: curve: secp256r1
E (8736) wpa: prime len is 32
E (8736) wpa: Q_x: - hexdump(len=32):
E (8737) wpa: da 13 d2 0d 14 a4 bc c4 4c 60 52 1b bb e2 99 d4
E (8738) wpa: 33 4f 6f 05 fa c5 84 0d a7 78 83 f7 01 dc 8e 5b
E (8738) wpa: Q_y: - hexdump(len=32):
E (8739) wpa: 23 59 c0 83 8a 32 57 a5 f1 e2 cb ca 4f 97 09 c5
E (8740) wpa: 2e 40 0b 22 24 1b b2 f1 87 80 a4 c8 37 21 d1 32
E (8740) wpa: d:      - hexdump(len=32):
E (8741) wpa: 6b e5 79 52 ce bc d4 83 2f ba 29 39 43 3c d8 94
E (8742) wpa: 59 e7 ad 8e d2 a7 ab 17 cd e9 37 57 a1 41 7a 01
D (8743) wpa: DPP: Compressed public key (DER) - hexdump(len=59):
D (8744) wpa: 30 39 30 13 06 07 2a 86 48 ce 3d 02 01 06 08 2a
D (8744) wpa: 86 48 ce 3d 03 01 07 03 22 00 02 da 13 d2 0d 14
D (8745) wpa: a4 bc c4 4c 60 52 1b bb e2 99 d4 33 4f 6f 05 fa
D (8746) wpa: c5 84 0d a7 78 83 f7 01 dc 8e 5b
D (8746) wpa: DPP: Public key hash - hexdump(len=32):
D (8747) wpa: 77 e7 21 a5 b5 a4 e4 ab 94 cc d9 cb 29 29 99 24
D (8748) wpa: a0 60 20 7a 37 6d 64 77 72 b7 ac 52 41 71 18 eb
D (8748) wpa: DPP: URI mac: 30:30:f9:5a:8a:54
D (9251) wpa: DPP: Sent event 3 to DPP task
~~ Repeat ~~
D (11258) wpa: DPP: Sent event 3 to DPP task
I (01/01/70 00:00:10.598) [  scr:wifi_sc  ] update qrcode done
D (11760) wpa: DPP: Sent event 3 to DPP task
~~ Repeat ~~
D (15772) wpa: DPP: Sent event 3 to DPP task
D (15788) wpa: DPP: Sent event 2 to DPP task
D (15788) wpa: DPP: Rx Public Action frame: action - 9
I (15789) wpa: DPP: Authentication Request from b6:c0:af:07:30:b6
V (15790) wpa: DPP: Responder Bootstrapping Key Hash - hexdump(len=32):
V (15790) wpa: 09 bb 6f cc 65 d0 90 e7 61 e5 00 d5 a7 91 8e f3
V (15791) wpa: 38 19 16 07 d0 5b a4 28 b5 a7 1b ff 1d 48 94 4a
V (15791) wpa: DPP: Initiator Bootstrapping Key Hash - hexdump(len=32):
V (15792) wpa: a9 18 43 12 b6 8b fa 58 b9 f8 3c 00 b7 6f 99 b2
V (15793) wpa: 42 a1 4c d2 da 56 d8 55 46 68 fb 6c 38 4f 92 07
I (15794) wpa: DPP: No matching own bootstrapping key found as responder - ignore message
D (15795) wpa: DPP: Sent event 3 to DPP task
E (01/01/70 00:00:14.657) [ antx:idf:wifi ] DPP auth fail, reason: ESP_ERR_DPP_INVALID_ATTR, data: 0x3099
Kaden-mpc commented 1 week ago

I have experienced issues with DPP running on IDF v5.2.1, out of 47 attempts I experienced a failure rate of 51%. 40 attempts were made with verbose logging enabled, and WPA logs disabled, and 7 attempts were made with both verbose logging and WPA logs enabled.

Logs of all failures can be found here


Then I enabled WPA_DEBUG_PRINT in menuconfig and re-compiled.

Surprisingly, the next 15 attempts all succeeded. Even after changing EXAMPLE_DPP_LISTEN_CHANNEL_LIST to 12 and letting the phone connect to the wifi.

Then I disabled WPA_DEBUG_PRINT, re-compiled and only 4 out of 8 attempts succeeded.

What a strange result.

With WPA_DEBUG_PRINT enabled: 7 / 8 attempts succeeded

Same setup but WPA_DEBUG_PRINT disabled: 2 / 8 attempts succeeded.

As @michael-betz has also experienced, changing the value of WPA_DEBUG_PRINT changes the results a lot, without any other code changed.

Aditi-Lonkar commented 1 day ago

Hi @Kaden-mpc , I am sharing a debug build for dpp_enrollee example on v5.2.1 here with some extra logs and few fixes. Will you please test with this build and share the logs and sniffer capture?

dpp_test_build_s3.zip

The file flash_cmd contains command for flashing. Please use that to flash the device. Thanks.

Kaden-mpc commented 1 day ago

@Aditi-Lonkar I have collected 20 logs of the build you shared, 8/20 failed. I have not yet collected any network packets. I will get back to you with those, and additional logs when I get them. The logs can be found here. Thank you for jumping on this issue!

Aditi-Lonkar commented 19 hours ago

@Kaden-mpc Thanks for trying the earlier build and sharing the logs. We are actively working on the issue, however we are not able to reproduce exact same results in our setup.

I am attaching one more build here with some more fixes added. Will you please test and share the logs and sniffer capture with this too?

The fixes includes 1) retrying sending auth response for 5 times on same channel in 1second interval if we didn't get auth confirm. 2) starting timer of 1second for getting auth confirm after getting ack for auth_response 3) staying on current channel until the whole auth process finishes 4) stopping listen when dpp returns failure and some reason code related changes. 5) keep rf enable until the offchannel transmission is done 6) starting a timer to fail if config not received in 120 secs

dpp_test_build_521_2.zip The file flash_cmd contains command for flashing. Please use that to flash the device.

Thanks.

ryanrampage1 commented 12 hours ago

@Aditi-Lonkar thank you for the responsiveness, @Kaden-mpc and I have the ESP logs from the new version but do not have the sniffer capture we are able to share yet due to our company data security policy. We are working on an alternative way to get the packet capture and will get that over to you early next week.

Thank you again for the help!