esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
15.91k stars 13.34k forks source link

Request Event Driven (non-blocking) WiFi socket API #922

Open drmpf opened 8 years ago

drmpf commented 8 years ago

The request is for a method call which returns true iff a call to WiFiClient.write(buffer) will NOT block. as in

if (client.canWriteBufferWithoutBlocking()) {
   client.write(buffer);  // this will NOT block
}

The low level API has a callback framework, so this could be implemented by a boolean which is cleared on a call to client.write() and set by the callback that is executed when the buffer has been sent and ACKed.

(Also as another request, can client.flush() be a noop, so incoming data not lost, but kept until read or client stopped.)

Background to this Request ========== (see https://github.com/esp8266/Arduino/issues/917)

ESP can only handle one outgoing TCP packet at a time. It needs to keep the last packet for retransmission until it is ACKed from the other side. For Windows clients the ACK delay is 0.2 secs.

While ESP is waiting for ACK, it cannot accept another client.write(buffer) So currently "client.write() blocks until either data is sent and ACKed, or timeout occurs (currently hard-coded to 5 seconds). Also client.flush() doesn't "force" sending data after client.write(), because there is no output buffer which can be flushed. It does discard incoming data though."

If the client is connected then you could expect that after a maximum of 0.2 sec blocking, the next data buffer will be written. However if the connection is lost, (i.e. no return ACK) then the sketch will block for 5 sec.

In many cases both the 0.2 sec and 5 sec blocking is too long.

In my particular case of a UART to WiFi bridge sketch, 0.2 sec at 115200 baud is about 2K of incoming serial data that is being ignored / lost while the sketch is blocked in client.write();

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

igrr commented 8 years ago

I think what you need is a non-blocking write function, similar to POSIX send, which would return EWOULDBLOCK if previous packet has not been acked yet.

drmpf commented 8 years ago

That would work, but would not be familiar to novice Arduino users. I suggest something more like how Serial is used.
Even something as simple as WiFiClient.writeAvailable() returning 0 when would block would work and return 1460 when packet was acked Actually this would give a lot more information to the average user. EDIT: Serial uses Serial.availableForWrite() for this, so I suggest WiFiClient use the same function name but will return only 0 or 1460 depending.

drmpf commented 8 years ago

Here is my rough outline of what I think is needed.
Not a working example and not up to you standard igrr

bool tcpSendBlocked = false;
static void ICACHE_FLASH_ATTR sentCb(void *arg) {
  tcpSendBlocked = false;
}

void setup() {
  // put your setup code here, to run once:
  espconn_regist_sentcb(conn, sentCb);
}

int clientAvailableForWrite() {
  if (tcpSendBlocked) {
    return 0;
  } else {
    return 1460;
  }
}

void clientSendBuffer(byte *buf) {
  tcpSendBlocked = true;
  client.write(buf);
}

void loop() {
  // put your main code here, to run repeatedly:

  if (clientAvailableForWrite()>0) {
    clientSendBuffer(buf); // will not block
  }
......
}
drmpf commented 8 years ago

I checked the Arduino WiFiClient.flush() and it also just consumes any buffered incoming data :-( So for consistency I suppose we need to leave flush() as it is. But I still find this method's action very un-expected. I think of flush() as something that happens on output not input.

me-no-dev commented 8 years ago

the WiFi library is not using espconn underneath, so the callbacks will not work.

drmpf commented 8 years ago

igrr mentioned LwIP library which has a call back called tcp_sent( ) that seems to do the job.

me-no-dev commented 8 years ago

sure, I just read the code you posted above and saw espconn_regist_sentcb(conn, sentCb);

drmpf commented 8 years ago

I stole that from elsewhere, but

        err_t   _sent(tcp_pcb* pcb, uint16_t len) {
            DEBUGV(":sent %d\r\n", len);
            _size_sent -= len;
            if(_size_sent == 0 && _send_waiting) esp_schedule();
            return ERR_OK;
        }

in ClientContext.h almost seems to do the job. Except i) _send_waiting is private ii) _sent() does not seem to set _send_waiting back to false? I assume I am missing something here.
But looks very close, if we added an accessor method for _send_waiting

igrr commented 8 years ago

You seem to be missing the point that no user code gets executed until _sent() callback is called.

Sketch execution is suspended in write method (link), and resumed in _sent method which you have quoted.

So adding an accessor for _send_waiting is not sufficient. write method has to be made non-blocking. This is a trivial change, but it would break compatibility with WiFi Shield library. I think it would be better to create a separate library with an event-driven API.

drmpf commented 8 years ago

OK, my mistake. That suggestion will not work.
You clearly stated that the write() call blocks for at least the time it takes to send the data and get the ACK.
An event-driven API is the way to go for compatibility.

drmpf commented 8 years ago

I have come up a few small mods for WiFiClient which allow the user to keep the sketch alive while sending WiFi data. As noted previously sending a packet to Windows blocks the entire sketch for 200mS

The mods keep the spirit of the original blocking api while adding a isSendWaiting() method to allow uses to check if WiFiClient.write() will block

Ignoring this new method the user's code is unchanged.

The operation is almost the same except that the first call to WiFiClient.write() return immediately. Subsequent calls will block for upto 5 sec if the client is waiting an ACK to the previous packet.

If the 5 sec timeout occures the current connection is considered broken and is aborted. The 5 sec timeout could be make larger to allow for retries if necessary.

With these changes the user's code will only block if the WiFiClient is busy sending a packet and cannot accept more data. This is similar to HardwareSerial which does not block until it cannot accept more data. WiFiClient::isSendWaiting() is similar to HardwareSerial::availableForWrite()

Now the user can optionally write code like

if (!client.isSendWaiting()) {
   client.write(buff,size); // will not block
} else {
   // don't call write, keep looping and accumulating bytes in buff ready for next write
}

This solves the problem in a Uart to WiFi bridge of lost incoming Serial data while the previous packet is waiting for ACK. In general sketches it ensures the loop() keeps running to do its other work.

Code changes in ClientContext.h

    size_t write(const char* data, size_t size) {
      if (!_pcb) {
        DEBUGV(":wr !_pcb\r\n");
        return 0;
      }

      if (size == 0) {
        return 0;
      }
      size_t counter = 0;
      while (isSendWaiting() && (counter < 5000) ) {
        delay(1);
        counter++;
      }
      if (counter >= 5000) {
        // timed out waiting for last send to complete
        _send_waiting = false;
        return (size_t)-1; // abort connection
      }
      size_t room = tcp_sndbuf(_pcb);
      size_t will_send = (room < size) ? room : size;
      err_t err = tcp_write(_pcb, data, will_send, 0);
      if (err != ERR_OK) {
        DEBUGV(":wr !ERR_OK\r\n");
        return 0;
      }

      _size_sent = will_send;
      DEBUGV(":wr\r\n");
      tcp_output( _pcb );
      _send_waiting = true;
      delay(0);
      //delay(5000); // max send timeout
      //_send_waiting = false;
      DEBUGV(":ww\r\n");
      return will_send - _size_sent;
    }

    bool isSendWaiting() {
      return _send_waiting;
    }
private:
    err_t _sent(tcp_pcb* pcb, uint16_t len) {
      DEBUGV(":sent %d\r\n", len);
      _size_sent -= len;
      if (_size_sent == 0 && _send_waiting) {
        _send_waiting = false;
        esp_schedule();
      }
      return ERR_OK;
    }

in WiFiClient.cpp

bool WiFiClient::isSendWaiting() {
  if (!_client) {
    return false;
  }
  return _client->isSendWaiting();
}

size_t WiFiClient::write(const uint8_t *buf, size_t size) {
  if (!_client || !size) {
    return 0;
  }
  //return _client->write(reinterpret_cast<const char*>(buf), size);
  size_t bytesWritten =  _client->write(reinterpret_cast<const char*>(buf), size);
  if (bytesWritten == ((size_t) - 1)) {
    // error abort connection
    if (!_client) {
      return 0;
    }
    _client->abort();
    _client->unref();
    _client = 0;
  }
  return bytesWritten;
}

in WiFiClient.h

  bool isSendWaiting();
drewfish commented 8 years ago

If WiFiClient::isSendWaiting() is similar to HardwareSerial::availableForWrite(), why not call it WiFiClient::availableForWrite() ?

drmpf commented 8 years ago

availableForWrite() is not available for WiFiClient. Whoever designed the Arduino stream libraries and sub-classes did not make them orthogonal.

"orthogonal" means that if an operation is available in one form of the instruction is it available in all forms. Originally applied to memory access for microprocessor instructions.

In this case its means if one stream subclass has availableForWrite() they all should.

(Don't get me started on flush() implementations for stream subclasses. No consistency at ALL. General advice don't ever call flush() on a Stream arg. You have no idea what will actually happen)

Edit: I am pushing this orthogonal idea a little, but availableForWrite() seems to be a sensible method for ALL stream subclasses to have.

drewfish commented 8 years ago

OK, fair enough.

grahamehorner commented 8 years ago

@drmpf @drewfish this is an issue with inconstant :-1: design and implementations :-1: and as a community we should think about best practice :+1: and maybe refactor libraries to follow best practice for consistency and ease of development :+1: but of course try to keep backwards compatibility :+1:

drmpf commented 8 years ago

Found another small mod needed, in ClientContext.write( ) need to return will_send; instead of return will_send - _size_sent;

drmpf commented 8 years ago

A Non-blocking ESP8266WiFi library is available at http://www.forward.com.au/pfod/pfodParserLibraries/index.html

This library consists of ESP8266WiFi and ESP2866WebServer libraries combined and with non-blocking code mods added as outlined above. Class names are the same as original libraries but file names and include names have been changed so that you can choose which library to use by changing your sketch includes. WebServer sends are still blocking only basic WiFi Client sends are non-blocking.

The library is based on the isSendWaiting() method. If you just change the includes of an existing sketch, without any code changes, then resulting operation is almost the same. The exception is that the first call to WiFiClient.write does not block but subsequent calls will block is the ESP is still busy with the previous packet.

Checking isSendWaiting() allows you to avoid making blocking calls to WiFiClient.write and buffer the outgoing data instead.

An pfodESP8266BufferedClient class has been added to allow you to automatically buffer outgoing data if the ESP is busy. This class also so reduces packet fragmentation and improves through put

Two example sketches included in the library illustrate the library's use.

While the library works will for me, I am not clear on the operation of esp_schedule(), and expert review of the code changes in ClientContext.h would be appreciated.

NickWaterton commented 8 years ago

Thank goodness for this!

I have been banging my head against this 200ms barrier for two days now. I'm reading 640x480 snapshots from a mini cam (which sends jpg images via ttl serial), at 115200 baud. Takes several seconds for one frame connected on serial. On WiFi, the receiving app times out all the time. Same problem sending jpg over MQTT. The jpg is about 50k in size.

I only really need the pictures for testing, as I'm using the camera as a motion sensor. Still need it to fine tune the motion sensor settings (ie get a snap of what triggered motion), and it will be outside.

I can get it to work with ridiculously large serial buffers, so I have been looking for a way round this bottleneck without reducing the baud rate to a crawl. This looks like it might just do it!

I'm going to test it out tomorrow. Thanks for this work.

devyte commented 6 years ago

@me-no-dev how is this different from your async code (besides obvious api differences)?

devyte commented 6 years ago

@drmpf could you please put your non-blocking libs in github?

vtomanov commented 6 years ago

Post is old BUT VERY USEFUL - got to the same problem - fixed it as described above !

adding whole new library is not practical in my case as I just need a simple async TCP =

but at least who has designed the original lib - needed to made properly designed and been changeable with friend classes and fully virtual methods ... which will eliminate the need of change the actual library code.. currently the only way to fix the stupid issue is to change the code of the library .. which is not good

drmpf commented 6 years ago

Hi @devyte , missed your comment.

Because the original isSendWaiting() solution added extra interface methods I had to modify and rename most of the the WiFi classes to do the mod. As much as I thought my code was a simple effective solution, @igrr did not take it up and later versions of the ESP8266 changed the Client code and I gave up trying to keep up. I still have the code for ESP8266 V2.2.0 You can pick it up from https://www.forward.com.au/pfod/pfodParserLibraries/pfodESP8266WiFi_2.2.0.zip (I have added the original webpage docs to the zip as a pdf) Edit: updated zip to contain the library with classes renamed so you can use it as a library with board install. see the pdf included and the examples)

Instead I opted for buffering the sends to limit the problem https://www.forward.com.au/pfod/pfodParserLibraries/pfodESP8266BufferedClient.zip is the library for the my ESP8266 buffered client.

The code buffers sends until either a) the WIFICLIENT_MAX_PACKET_SIZE is reached OR b) nothing is added to the buffer for 10mS

I find this much simpler to understand and use then the AsyncTCP

devyte commented 6 years ago

@drmpf thanks. Will those links survive, or are they time limited?

vtomanov commented 6 years ago

I read the original code in version 2,.3.0 and your code seems to do a fine job to make it usable :)

in my case I'm working on LoRA Gateway and waiting for 200millis (or worse 5 sec in worst case scenario) is completely unacceptable as I will have a lot of overwritten LoRa messages during this timeout..

just buffering will no solve my problem as I will still have big timeouts...

using your idea/code I was able to simulate a buffered ( my messages are 10-12 bytes) async TCP output/input and all is fine at the moment :)

Thanks !

drmpf commented 6 years ago

@drmpf thanks. Will those links survive, or are they time limited?

The links will work as long as my website survives, which will probably be longer then GitHub, but feel free to re-host the files elsewhere if it makes you feel more secure.

drmpf commented 6 years ago

Hi @vtomanov

I read the original code in version 2,.3.0 and your code seems to do a fine job to make it usable :) Glad to hear someone found it useful.

drmpf commented 6 years ago

@vtomanov , I am looking at LoRa remote control at the moment using Feather. Can you drop me an email, via www.forward.com.au?

vtomanov commented 6 years ago

hmmm the email click on the site do not do anything... my private email is vtomanov@gmail.com - drop me a line there

I'm working with 868MHz LoRa32u4 LoRa Board with IPEX Antenna for endpoint and LoRaGO DOCK for gateway and all works quite good as a hardware platform altogether, but I needed to build some specific software to make thing actually work. From software point of view the standard LoRA TTL protocol is a bit childish , all this web based/json thing is not for low level technology and adds a lot of problems and stability issues - hence - needed to design and implement my own simple protocol ( max message size 12bytes as bigger messages has issues ....

generally the setup is - endpoint with sensors, sending data to gateway using LoRa and gateway communicating with custom java hi-perf server software over TCP - the server synchronises the timer of all gateways and they synchronize the timers of all endpoints.. etc.