esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
16.05k stars 13.33k forks source link

ClientContext: _delaying flag has created connect timeout bug #6479

Closed johnm545 closed 3 years ago

johnm545 commented 5 years ago

Basic Infos

Platform

Settings in IDE

Problem Description

This one has been introduced by #6454 and had me scratching my head for a couple of hours

Connections are timing out after about 250 ms rather than the default 5000 ms. You won't notice it on nearby servers that response fast. For some reason test.mosquitto.org is slow for me, and this bug pops up. To reproduce you might need to hunt around for a server with slow ping. But it's clear what is happening from the logs below.

It happens because the ClientContext _write_some_from_cb() method seems to be called on a timer every 500ms (even when there's nothing to send). It then clears the delaying flag which causes the loop in ClientContext connect to timeout prematurely. Before #6454 we had seperate flags for _connect_pending and _send_waiting, and _write_some_from_cb() would only clear the _send_waiting flag.

Looks like the seperate flags have to come back.

I have added some extra debug prints to track it down, as per the snippets below

    void _write_some_from_cb()
    {
        Serial.println(" hello from _write_some_from_cb");
        if (_delaying) {
            _delaying = false;
            esp_schedule(); // break current delay()
        }
    }

loop in connect method, line 134-138

        for (decltype(_timeout_ms) i = 0; _delaying && i < _timeout_ms; i++) {
               // Give scheduled functions a chance to run (e.g. Ethernet uses recurrent)
               Serial.println(i);  // extra debug print 
               delay(1);
        }

MCVE Sketch


#include <ESP8266WiFi.h>

WiFiClient wifiClient;

void setup() {

  Serial.begin(115200);
  WiFi.begin();

  while (WiFi.status() != WL_CONNECTED) {
    Serial.print(".");
    delay(1000);
  }

  Serial.println("\nattempting socket connection...");

  if (wifiClient.connect("test.mosquitto.org", 1883)){
      Serial.println("connected to server");
  } else {
    Serial.println("connection failed");
  }
}

void loop() {
  yield();
}

Debug Messages

23:33:34.901 -> ⸮g'd`g{⸮⸮⸮gc⸮d;⸮⸮'c⸮$⸮r⸮⸮⸮⸮x⸮d⸮l⸮⸮n⸮
23:33:34.995 -> SDK:2.2.2-dev(38a443e)/Core:2.5.2-134-gc5b96b6e=20502134/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-8-g7958710/BearSSL:89454af
23:33:34.995 -> scandone
23:33:34.995 -> .wifi evt: 2
23:33:35.089 -> scandone
23:33:35.089 -> state: 0 -> 2 (b0)
23:33:35.089 -> state: 2 -> 3 (0)
23:33:35.089 -> state: 3 -> 5 (10)
23:33:35.089 -> add 0
23:33:35.089 -> aid 4
23:33:35.089 -> cnt 
23:33:35.182 -> 
23:33:35.182 -> connected with (..)
23:33:35.182 -> dhcp client start...
23:33:35.182 -> wifi evt: 0
23:33:35.979 -> ..ip:192.168.20.15,mask:255.255.255.0,gw:192.168.20.1
23:33:37.292 -> wifi evt: 3
23:33:37.995 -> 
23:33:37.995 -> attempting socket connection...
23:33:37.995 -> [hostByName] request IP for: test.mosquitto.org
23:33:38.276 -> [hostByName] Host: test.mosquitto.org IP: 5.196.95.208
23:33:38.276 -> :ref 1
23:33:38.276 -> 0
23:33:38.276 -> 1
23:33:38.276 -> 2
23:33:38.276 -> 3
23:33:38.276 -> 4
23:33:38.276 -> 5
23:33:38.276 -> 6
23:33:38.276 -> 7
23:33:38.276 -> 8
23:33:38.276 -> 9
23:33:38.276 -> 10
23:33:38.276 -> 11
23:33:38.276 -> 12
23:33:38.276 -> 13
23:33:38.276 -> 14
23:33:38.276 -> 15
23:33:38.276 -> 16
23:33:38.276 -> 17
23:33:38.276 -> 18
23:33:38.276 -> 19
23:33:38.276 -> 20
23:33:38.276 -> 21
23:33:38.276 -> 22
23:33:38.276 -> 23
23:33:38.276 -> 24
23:33:38.276 -> 25
23:33:38.276 -> 26
23:33:38.276 -> 27
23:33:38.276 -> 28
23:33:38.276 -> 29
23:33:38.276 -> 30
23:33:38.276 -> 31
23:33:38.276 -> 32
23:33:38.276 -> 33
23:33:38.276 -> 34
23:33:38.354 -> 35
23:33:38.354 -> 36
23:33:38.354 -> 37
23:33:38.354 -> 38
23:33:38.354 -> 39
23:33:38.354 -> 40
23:33:38.354 -> 41
23:33:38.354 -> 42
23:33:38.354 -> 43
23:33:38.354 -> 44
23:33:38.354 -> 45
23:33:38.354 -> 46
23:33:38.354 -> 47
23:33:38.354 -> 48
23:33:38.354 -> 49
23:33:38.354 -> 50
23:33:38.354 -> 51
23:33:38.354 -> 52
23:33:38.354 -> 53
23:33:38.354 -> 54
23:33:38.354 -> 55
23:33:38.354 -> 56
23:33:38.354 -> 57
23:33:38.354 -> 58
23:33:38.354 -> 59
23:33:38.354 -> 60
23:33:38.354 -> 61
23:33:38.354 -> 62
23:33:38.354 -> 63
23:33:38.354 -> 64
23:33:38.354 -> 65
23:33:38.354 -> 66
23:33:38.354 -> 67
23:33:38.354 -> 68
23:33:38.354 -> 69
23:33:38.354 -> 70
23:33:38.354 -> 71
23:33:38.354 -> 72
23:33:38.354 -> 73
23:33:38.354 -> 74
23:33:38.354 -> 75
23:33:38.354 -> 76
23:33:38.354 -> 77
23:33:38.354 -> 78
23:33:38.354 -> 79
23:33:38.354 -> 80
23:33:38.354 -> 81
23:33:38.354 -> 82
23:33:38.354 -> 83
23:33:38.354 -> 84
23:33:38.354 -> 85
23:33:38.354 -> 86
23:33:38.354 -> 87
23:33:38.354 -> 88
23:33:38.354 -> 89
23:33:38.354 -> 90
23:33:38.354 -> 91
23:33:38.354 -> 92
23:33:38.354 -> 93
23:33:38.354 -> 94
23:33:38.354 -> 95
23:33:38.354 -> 96
23:33:38.354 -> 97
23:33:38.354 -> 98
23:33:38.354 -> 99
23:33:38.354 -> 100
23:33:38.354 -> 101
23:33:38.354 -> 102
23:33:38.354 -> 103
23:33:38.354 -> 104
23:33:38.354 -> 105
23:33:38.354 -> 106
23:33:38.354 -> 107
23:33:38.354 -> 108
23:33:38.354 -> 109
23:33:38.354 -> 110
23:33:38.354 -> 111
23:33:38.354 -> 112
23:33:38.354 -> 113
23:33:38.354 -> 114
23:33:38.354 -> 115
23:33:38.354 -> 116
23:33:38.354 -> 117
23:33:38.354 -> 118
23:33:38.385 -> 119
23:33:38.385 -> 120
23:33:38.385 -> 121
23:33:38.385 -> 122
23:33:38.385 -> 123
23:33:38.385 -> 124
23:33:38.385 -> 125
23:33:38.385 -> 126
23:33:38.385 -> 127
23:33:38.385 -> 128
23:33:38.385 -> 129
23:33:38.385 -> 130
23:33:38.385 -> 131
23:33:38.385 -> 132
23:33:38.385 -> 133
23:33:38.385 -> 134
23:33:38.385 -> 135
23:33:38.385 -> 136
23:33:38.385 -> 137
23:33:38.385 -> 138
23:33:38.385 -> 139
23:33:38.385 -> 140
23:33:38.385 -> 141
23:33:38.385 -> 142
23:33:38.385 -> 143
23:33:38.385 -> 144
23:33:38.385 -> 145
23:33:38.385 -> 146
23:33:38.385 -> 147
23:33:38.385 -> 148
23:33:38.385 -> 149
23:33:38.385 -> 150
23:33:38.417 -> 151
23:33:38.417 -> 152
23:33:38.417 -> 153
23:33:38.417 -> 154
23:33:38.417 -> 155
23:33:38.417 -> 156
23:33:38.417 -> 157
23:33:38.417 -> 158
23:33:38.417 -> 159
23:33:38.417 -> 160
23:33:38.417 -> 161
23:33:38.417 -> 162
23:33:38.417 -> 163
23:33:38.417 -> 164
23:33:38.417 -> 165
23:33:38.417 -> 166
23:33:38.417 -> 167
23:33:38.417 -> 168
23:33:38.417 -> 169
23:33:38.463 -> 170
23:33:38.463 -> 171
23:33:38.463 -> 172
23:33:38.463 -> 173
23:33:38.463 -> 174
23:33:38.463 -> 175
23:33:38.463 -> 176
23:33:38.463 -> 177
23:33:38.463 -> 178
23:33:38.463 -> 179
23:33:38.463 -> 180
23:33:38.463 -> 181
23:33:38.463 -> 182
23:33:38.463 -> 183
23:33:38.463 -> 184
23:33:38.463 -> 185
23:33:38.463 -> 186
23:33:38.463 -> 187
23:33:38.463 -> 188
23:33:38.463 -> 189
23:33:38.463 -> 190
23:33:38.463 -> 191
23:33:38.463 -> 192
23:33:38.463 -> 193
23:33:38.463 -> 194
23:33:38.463 -> 195
23:33:38.463 -> 196
23:33:38.463 -> 197
23:33:38.463 -> 198
23:33:38.463 -> 199
23:33:38.463 -> 200
23:33:38.463 -> 201
23:33:38.463 -> 202
23:33:38.463 -> 203
23:33:38.463 -> 204
23:33:38.463 -> 205
23:33:38.463 -> 206
23:33:38.463 -> 207
23:33:38.463 -> 208
23:33:38.463 -> 209
23:33:38.463 -> 210
23:33:38.463 -> 211
23:33:38.463 -> 212
23:33:38.463 -> 213
23:33:38.463 -> 214
23:33:38.510 -> 215
23:33:38.510 -> 216
23:33:38.510 -> 217
23:33:38.510 -> 218
23:33:38.510 -> 219
23:33:38.510 -> 220
23:33:38.510 -> 221
23:33:38.510 -> 222
23:33:38.510 -> 223
23:33:38.510 -> 224
23:33:38.510 -> 225
23:33:38.510 -> 226
23:33:38.510 -> 227
23:33:38.510 -> 228
23:33:38.510 -> 229
23:33:38.510 -> 230
23:33:38.510 -> 231
23:33:38.510 -> 232
23:33:38.510 -> 233
23:33:38.510 -> 234
23:33:38.510 -> 235
23:33:38.510 -> 236
23:33:38.510 -> 237
23:33:38.510 -> 238
23:33:38.510 -> 239
23:33:38.510 -> 240
23:33:38.510 -> 241
23:33:38.510 -> 242
23:33:38.510 -> 243
23:33:38.510 -> 244
23:33:38.510 -> 245
23:33:38.510 ->  hello from _write_some_from_cb
23:33:38.510 -> :ctmo
23:33:38.510 -> :abort
23:33:38.510 -> :ur 1
23:33:38.510 -> :del
23:33:38.510 -> connection failed
23:33:45.120 -> pm open,type:2 0
d-a-v commented 5 years ago

Thanks for MCVE.

Looks like the seperate flags have to come back.

You may be right. I'm really curious about how this can happen, but it is okay to restore two bools instead of one.

johnm545 commented 5 years ago

It is curious, i didn't suspect the _write_some_from_cb at first. Can you try adding a debug print to your _write_some_from_cb, if that is firing off every 500ms like mine does then it's clear from the code that the connection will timeout prematurely. I haven't messed around with lwIP but there's always the off chance that my build is somehow borked.

Edit: _write_some_from_cb gets called from _poll, which explains why it runs regularly. This means that the for loop in _write_from_source will also get broken when _write_some_from_cb clears the _delaying flag, but there it doesn't matter because there is a do...while(true) loop around the whole thing.

d-a-v commented 5 years ago

That is an interesting test to do. #6483 will restore previous code. Pushing this issue to v3.0.0 so we don't forget about it.

johnm545 commented 5 years ago

I've tested #6483 and can confirm it fixes the issue, thanks. Feel free to close.

d-a-v commented 5 years ago

Not closing this issue so one/I can try to understand why a single bool causes issues.

d-a-v commented 3 years ago

Closing. Better is the enemy of free time. Bug was already cleared.