fhessel / esp32_https_server

Alternative ESP32 Webserver implementation for the ESP32 Arduino Core, supporting HTTPS and HTTP.
MIT License

Support forms in POST #29

Open me21 opened 5 years ago

me21 commented 5 years ago

I tried sending POST requests with forms, using both x-www-form-urlencoded and multipart/form-data, but the request params array is empty in every case. Does the server support handling HTML forms? Am I doing something wrong?

fhessel commented 5 years ago

No, you're not. Request params will contain only the values passed after the question mark in the request URL, like https://esp32.local/?foo=bar which will lead to bar being assigned to parameter foo.

Both x-www-form-urlencoded and multipart/form-data send the form data in the body of the POST request (see these MDN examples for the details on the differences in data structure).

At the moment, the server library does not support any special form body parsing, as I expected it to be used mostly for REST-like services, which can use third-party libraries like ArduinoJSON. So for now, you would have to do it on your own by reading the request body. If you use urlencoded forms, you could start by reusing my implementation for parsing URL parameters. Apart from the initial ? character, it should be very similar (be aware of #19 in case you're sending anything other than letters or numbers).

I'll tag this as a feature request, but as you might've seen from my reaction to other issues, I'm a bit short on time at the moment.
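As a rough illustration of that approach, here is a minimal sketch of parsing an x-www-form-urlencoded body. It assumes the whole body has already been read into a std::string; the names urlDecode and parseUrlencodedBody are made up for this example and are not part of the library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of esp32_https_server.
// Decodes one urlencoded token: '+' becomes a space, %XX becomes the byte 0xXX.
static std::string urlDecode(const std::string& in) {
  auto hexVal = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '+') {
      out.push_back(' ');
    } else if (in[i] == '%' && i + 2 < in.size() &&
               hexVal(in[i + 1]) >= 0 && hexVal(in[i + 2]) >= 0) {
      out.push_back(static_cast<char>((hexVal(in[i + 1]) << 4) | hexVal(in[i + 2])));
      i += 2;  // skip the two hex digits
    } else {
      out.push_back(in[i]);  // ordinary character or malformed escape: keep as-is
    }
  }
  return out;
}

// Hypothetical helper: splits "a=1&b=2" into decoded (name, value) pairs.
std::vector<std::pair<std::string, std::string>> parseUrlencodedBody(const std::string& body) {
  std::vector<std::pair<std::string, std::string>> fields;
  size_t pos = 0;
  while (pos < body.size()) {
    size_t amp = body.find('&', pos);
    if (amp == std::string::npos) amp = body.size();
    std::string item = body.substr(pos, amp - pos);
    pos = amp + 1;
    if (item.empty()) continue;  // tolerate "a=1&&b=2"
    size_t eq = item.find('=');
    std::string name = urlDecode(item.substr(0, eq));
    std::string value = (eq == std::string::npos) ? "" : urlDecode(item.substr(eq + 1));
    fields.emplace_back(name, value);
  }
  return fields;
}
```

Keeping the result as an ordered list of pairs rather than a map preserves repeated field names, which forms are allowed to send.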

me21 commented 5 years ago

I totally understand. Thanks for replying. I need to upload files to ESP32, so I guess I need multipart/form-data here. I'll try to implement request body handling myself, thanks!

fhessel commented 5 years ago

For uploading files, multipart/form-data is the only way to go anyway, otherwise you would only get the name of the uploaded file, not its contents.

If you need to stick to plain HTML forms, that's also the only option I know for uploading files. So you'll need to parse the Content-Type request header for the boundary (or – more easily – read the first line of body, if you really trust your client) and then split the body parts.
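Extracting the boundary from the header could look roughly like the sketch below. The function name extractBoundary is hypothetical, and the sketch assumes the parameter name is written in lowercase ("boundary="), which is what browsers typically send, and that the header value is already available as a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of esp32_https_server.
// Returns the boundary parameter of a multipart Content-Type header value,
// or an empty string if the header carries no boundary.
std::string extractBoundary(const std::string& contentType) {
  const std::string key = "boundary=";
  size_t start = contentType.find(key);
  if (start == std::string::npos) return "";
  start += key.size();
  // The value ends at the next parameter separator or at the end of the header.
  size_t end = contentType.find(';', start);
  std::string boundary = contentType.substr(
      start, end == std::string::npos ? std::string::npos : end - start);
  // The value may be wrapped in double quotes; strip them if present.
  if (boundary.size() >= 2 && boundary.front() == '"' && boundary.back() == '"') {
    boundary = boundary.substr(1, boundary.size() - 2);
  }
  return boundary;
}
```

In the body, each delimiter line is this value prefixed with "--", and the final one additionally ends with "--".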

The only other option that comes to mind is to read the file in the browser with client-side JavaScript and send a custom POST XHR to the server that contains only the file's content as its body. Then you could just read the whole request body on the server. But that might not be feasible in every use case, and it's only an easier solution if you've got some experience with JS and the characteristics of various browsers' JS engines.

me21 commented 5 years ago

I've implemented multipart/form-data parsing. I see no obvious bugs with a simple form that has one file input and one text input.

I'm attaching my source code here for future reference as a starting point. Of course, it is far from perfect. For now, it keeps the entire field value in memory, which makes handling big files impossible. Instead, it should call the handler for every received chunk of the file. Oh well.

void handleField(const std::string& fieldName, const std::string& fieldValue) {
  // called once per every field of the form
}

// #define DEBUG_MULTIPART_PARSER
void handleNewConfig(HTTPRequest* req, HTTPResponse* res) {
  auto contentType = req->getHeader("Content-Type");
#ifdef DEBUG_MULTIPART_PARSER      
  Serial.print("Content type: ");
  Serial.println(contentType.c_str());
#endif
  auto boundaryIndex = contentType.find("boundary=");
  if(boundaryIndex != std::string::npos) {
    //TODO: remove all magic constants
    auto boundary = contentType.substr(boundaryIndex + 9);
    auto semicolonIndex = boundary.find(';');
    boundary = "--" + boundary.substr(0, semicolonIndex);
    if(boundary.size() > 72) {
      return; //TODO: respond with an error (e.g. 400); RFC 2046 limits the boundary to 70 chars
    }
    auto lastBoundary = boundary + "--";
    // Read the request body in chunks and split it into form fields.
    // Note: each field value is still collected in memory before handleField is called.
    std::string buffer;
    // What we're searching for
    enum {BOUNDARY, CONTENT_DISPOSITION, DATA_START, DATA_END} searching = BOUNDARY;
    std::string fieldName;
    std::string fieldValue;
    bool isFile = false;
    // HTTPRequest::requestComplete can be used to check whether the
    // body has been parsed completely.
    while(!(req->requestComplete())) {
      const auto oldBufferSize = buffer.size();
#ifdef DEBUG_MULTIPART_PARSER      
      Serial.print("Old buffer size: ");
      Serial.println(oldBufferSize);
#endif
      buffer.resize(256);
      // HTTPRequest::readChars provides access to the request body.
      // It requires a buffer and the max buffer length, and it returns
      // the number of bytes that have been written to the buffer.
      auto s = req->readChars(&buffer[oldBufferSize], buffer.size() - oldBufferSize);
      if(!s)
        break;
#ifdef DEBUG_MULTIPART_PARSER
      Serial.print("Reading buffer (");
      Serial.print(s);
      Serial.print(" bytes): ");
      Serial.println(buffer.c_str());
#endif
      buffer.resize(s + oldBufferSize);      
      // reading line by line
      std::istringstream is(buffer);
      for(std::string line; is.good(); ) {
        std::getline(is, line);
#ifdef DEBUG_MULTIPART_PARSER      
        Serial.print("Next line is ");
        Serial.print(line.size());
        Serial.print(" bytes: ");
        Serial.println(line.c_str());
#endif
        // remove trailing \r
        bool crRemoved = false;
        if(!line.empty() && (line.back() == '\r')) {
          crRemoved = true;
          line.pop_back();
        }
        if(is.good()) {
          switch(searching) {
            case BOUNDARY:
              if((line == boundary) || (line == lastBoundary)) {
#ifdef DEBUG_MULTIPART_PARSER      
                Serial.println("Boundary found");
#endif
                searching = CONTENT_DISPOSITION;
                // save the previous field value
                if(!fieldValue.empty()) {
#ifdef DEBUG_MULTIPART_PARSER      
                  Serial.print("Field: ");
                  Serial.println(fieldName.c_str());
                  Serial.print("Value: ");
                  Serial.println(fieldValue.c_str());
#endif
                  handleField(fieldName, fieldValue);
                  fieldValue.clear();
                }
              }
              break;
            case CONTENT_DISPOSITION:
              if(line.substr(0, 21) == "Content-Disposition: ") {
                auto i = line.find(" name=");
                if(i != std::string::npos) {
#ifdef DEBUG_MULTIPART_PARSER      
                  Serial.println("Correct content disposition found");
#endif
                  isFile = line.find(" filename=") != std::string::npos;
                  fieldName = line.substr(i + 7);
                  fieldName.erase(std::find(fieldName.begin(), fieldName.end(), ';'), fieldName.end());
                  fieldName.pop_back();
                  searching = DATA_START;
                } else
                  searching = BOUNDARY;
              }
              break;
            case DATA_START:
              if(line.empty())
                searching = DATA_END;
              break;
            case DATA_END:
              if((line == boundary) || (line == lastBoundary)) {
                searching = CONTENT_DISPOSITION;
                // remove the last CRLF pair before the boundary
                if(isFile && (fieldValue.size() > 1)) {
                  fieldValue.pop_back();
                  fieldValue.pop_back();
                }
                // save the previous field value
#ifdef DEBUG_MULTIPART_PARSER      
                Serial.print("Field: ");
                Serial.println(fieldName.c_str());
                Serial.print("Value (");
                Serial.print(fieldValue.size());
                Serial.print(" bytes): ");
                Serial.println(fieldValue.c_str());
#endif
                handleField(fieldName, fieldValue);
                fieldValue.clear();
                break;
              }
              // else not boundary
              if(isFile) {
                // if binary file contents, add CRLF ignored by getline
                if(crRemoved)
                  line.push_back('\r');
                line.push_back('\n');
              } else {
                // else text only, removing preceding \r (ordinary values are single-line)
                //if(!line.empty())
                //  line.pop_back();
                searching = BOUNDARY;
              }
#ifdef DEBUG_MULTIPART_PARSER      
              Serial.print("Adding (");
              Serial.print(line.size());
              Serial.print(" bytes): ");
              Serial.print(line.c_str());
#endif
              fieldValue += line;
              break;
          }
        } else {
          // no LF found, store the remaining chars in the buffer and read further
#ifdef DEBUG_MULTIPART_PARSER      
          Serial.println("No more LF found in buffer");
          Serial.print("Remaining (");
          Serial.print(line.size() + crRemoved);
          Serial.print(" bytes): ");
          Serial.print(line.c_str());
          if(crRemoved)
            Serial.print('\r');
#endif
          // if last char is \r, it's possible the next one will be \n, but we haven't read it to buffer yet
          // if the buffer contains the boundary, it must be in its beginning, because boundary starts right after CRLF pair.
          // but then, the remainder cannot be longer than 72 bytes ("--" + boundary max 70 chars)
          if(line.size() && (line.size() < 73) && (boundary.find(line) == 0)) {
            // boundary found, at least partially
            // do nothing, the remainder will be read on the next loop iteration
#ifdef DEBUG_MULTIPART_PARSER      
            Serial.print("Boundary found without LF: ");
            Serial.println(line.c_str());
#endif
          } else {
            // no boundary
            // to avoid the case when the buffer is full (256 bytes), we add it to field value here
            //TODO: it is possible to cause hangups by passing long strings during (searching != DATA_END),
            // but for now we assume the client is well-behaving
            if(searching == DATA_END) {
              if(isFile && crRemoved)
                line.push_back('\r');
#ifdef DEBUG_MULTIPART_PARSER      
              Serial.println("Adding it.");
#endif
              fieldValue += line;
              line.clear();
            }
          }
          buffer = line;
        }
      }
    }
  }
}

fhessel commented 5 years ago

Thank you for providing the code, that will be a really good starting point!

I also thought about how one could deal with large uploads, and I came up with an API that allows iterating over the body's fields and then reading each field step by step using a buffer, so that you could e.g. write it directly to an SD card. That should allow for arbitrary body sizes, but comes with the downside of not being able to access random fields of the body. To address both ways of body encoding (as urlencoded has far less overhead for short text values), there could be a parser for each content type based on the same body parser API, making the content encoding easily interchangeable. Usage should then be something like this:

void handleRequest(HTTPRequest * req, HTTPResponse * res) {
  HTTPMultipartBodyParser * bParser = new HTTPMultipartBodyParser(req);
  while(bParser->nextField()) {
    std::string fieldName = bParser->getFieldName();
    byte buf[128];
    size_t len;
    while(bParser->getRemainingLength()>0) {
      len = bParser->read(buf,128);
      // Do something with buf[0..len-1] and fieldName
    }
  }
  delete bParser; // release the parser allocated above
}

It's just a draft of the API and most likely won't even compile at the moment, but I committed it for reference to the bodyparser branch (7861a98f1cebdebe623ceee85e6545c3f0757f50).

jackjansen commented 4 years ago

I'm making good progress implementing this, based on the bodyparser branch, but I'm running into an issue with the suggested API. getRemainingLength() and actually getLength() as well aren't really implementable without first reading the whole content of the multipart part. The part header doesn't contain a field that defines the size of that part, so that means you basically have to read all of it before you can know the size. (Not a great design of the MIME multipart: it also means you have to search for CRLF followed by the separator string in binary data, brrrr). I think I'll replace the two size calls with a bool endOfField(), unless you have a better suggestion.
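To illustrate the search problem, here is a minimal sketch of one way to stream field data while watching for CRLF followed by the separator. It is not the code from the bodyparser branch; the class name FieldScanner and its methods are hypothetical. The idea is to hold back the last few bytes of every chunk, since they could be the start of a delimiter that continues in the next chunk.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sketch, not part of esp32_https_server.
// Passes field data to a sink chunk by chunk and stops at "\r\n--<boundary>".
class FieldScanner {
public:
  explicit FieldScanner(const std::string& boundary)
      : delimiter_("\r\n--" + boundary) {}

  // Feeds raw body bytes that follow a part's headers. Bytes that are known
  // to be field data are handed to sink. Returns true once the delimiter
  // has been seen, i.e. the field is complete.
  bool feed(const std::string& chunk,
            const std::function<void(const std::string&)>& sink) {
    if (done_) return true;
    pending_ += chunk;
    size_t hit = pending_.find(delimiter_);
    if (hit != std::string::npos) {
      if (hit > 0) sink(pending_.substr(0, hit));
      rest_ = pending_.substr(hit + delimiter_.size());  // bytes after the delimiter
      pending_.clear();
      done_ = true;
      return true;
    }
    // No full delimiter yet: the last delimiter_.size() - 1 bytes might be
    // the beginning of one, so only the bytes before them are safe to emit.
    if (pending_.size() >= delimiter_.size()) {
      size_t safe = pending_.size() - (delimiter_.size() - 1);
      sink(pending_.substr(0, safe));
      pending_.erase(0, safe);
    }
    return false;
  }

  bool endOfField() const { return done_; }
  const std::string& rest() const { return rest_; }

private:
  std::string delimiter_;
  std::string pending_;  // held-back bytes not yet classified
  std::string rest_;     // unconsumed bytes after the delimiter
  bool done_ = false;
};
```

With this approach the memory needed per field is bounded by the chunk size plus the delimiter length, so a sink could write straight to an SD card without knowing the field length in advance.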

fhessel commented 4 years ago

That API was really only a suggestion on how one could implement this, and I see the problem. I'd clearly put the ability of reading parts of arbitrary length over the ability to get the length in advance, so your endOfField() is the better choice here.

it also means you have to search for CRLF followed by the separator string in binary data, brrrr

I think binary data may be the easy case. If the client decides to make use of a content-transfer-encoding (see section 4.3 in RFC2388 in combination with section 6 in RFC2045), an additional conversion step might be necessary. I don't know how current clients behave, so I'd expect binary for files and 8bit or maybe even quoted-printable for text fields to be the default. If clients do switch between different variants, I'd suggest implementing the most common one first and then adding converter functions with the same signature later on, maybe something like:

/**
 * Removes quoted printable content transfer encoding
 * data_in buffer to read from
 * size_in size of data_in buffer
 * data_out buffer to write the decoded data to
 * size_out number of bytes written to data_out
 * returns: Number of bytes read from data_in
 */
size_t transferDecodeQuotedPrintable(uint8_t *data_in, size_t size_in, uint8_t *data_out, size_t &size_out);

That should allow handling all encodings while the caller can remain agnostic of the symbol length in the encoded data. The caller only has to check for <CR><LF><boundary> in the incoming data. If size_in > return_value, the caller knows that part of the data couldn't be processed, most likely because only a partial input symbol is present in data_in.
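A minimal sketch of such a converter for quoted-printable might look like this. It follows the signature suggested above, except that data_in is const, and it assumes data_out has room for at least size_in bytes. The helper hexValue and the choice to copy malformed escapes unchanged are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Helper for this sketch: value of a hex digit, or -1 if c is not one.
static int hexValue(uint8_t c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;  // lenient: RFC 2045 uses uppercase
  return -1;
}

// Sketch only: decodes quoted-printable data.
// Assumes data_out can hold at least size_in bytes.
// Returns the number of bytes consumed from data_in; size_out receives the
// number of bytes written to data_out.
size_t transferDecodeQuotedPrintable(const uint8_t* data_in, size_t size_in,
                                     uint8_t* data_out, size_t& size_out) {
  size_t in = 0;
  size_out = 0;
  while (in < size_in) {
    if (data_in[in] != '=') {
      data_out[size_out++] = data_in[in++];  // literal byte
      continue;
    }
    if (size_in - in < 3) break;  // partial symbol: leave it for the next call
    uint8_t a = data_in[in + 1];
    uint8_t b = data_in[in + 2];
    if (a == '\r' && b == '\n') {
      in += 3;  // soft line break, produces no output
      continue;
    }
    int hi = hexValue(a);
    int lo = hexValue(b);
    if (hi >= 0 && lo >= 0) {
      data_out[size_out++] = static_cast<uint8_t>((hi << 4) | lo);
      in += 3;
    } else {
      data_out[size_out++] = data_in[in++];  // malformed escape: copy '=' as-is
    }
  }
  return in;
}
```

If the return value is smaller than size_in, the caller would keep the unconsumed tail and prepend it to the next chunk before calling the function again.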

jackjansen commented 4 years ago

I've created a pull request. I'm not 100% happy with the code, especially because I've had to implement yet another level of buffering (I saw the existing buffered reading code, but it's in private methods, and I decided to implement my own because testing for boundaries while reading binary data is tricky). Performance seems sort-of decent despite the extra copying.