Open me21 opened 5 years ago
No, you're not. Request params will contain only the values passed after the question mark in the request URL, like https://esp32.local/?foo=bar, which will lead to bar being assigned to the parameter foo.
Both x-www-form-urlencoded and multipart/form-data will send the form data in the body of the POST request (see these MDN examples for the details on the differences in data structure).
At the moment, the server library does not support any special form body parsing, as I expected it to be used mostly for REST-like services, which can use third-party libraries like ArduinoJSON. So for now, you would have to do it on your own by reading the request body. If you use urlencoded forms, you could start by reusing my implementation for parsing URL parameters. Apart from the initial ? character, it should be very similar (be aware of #19 in case you're sending anything other than letters or numbers).
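As a rough illustration of what parsing a urlencoded body involves, here is a minimal sketch in standard C++. The names urlDecode and parseUrlEncoded are hypothetical and not part of the server library, and the sketch assumes the whole body has already been read into a std::string.

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper: decodes %XX escapes and '+' (space) in a form value.
std::string urlDecode(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '+') {
      out.push_back(' ');
    } else if (in[i] == '%' && i + 2 < in.size() &&
               std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
               std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
      // Convert the two hex digits after '%' into one byte.
      out.push_back(static_cast<char>(
          std::strtol(in.substr(i + 1, 2).c_str(), nullptr, 16)));
      i += 2;
    } else {
      // Anything else (including a malformed escape) is copied unchanged.
      out.push_back(in[i]);
    }
  }
  return out;
}

// Hypothetical helper: splits "a=1&b=two" into decoded name/value pairs.
std::map<std::string, std::string> parseUrlEncoded(const std::string& body) {
  std::map<std::string, std::string> fields;
  size_t start = 0;
  while (start <= body.size()) {
    size_t end = body.find('&', start);
    if (end == std::string::npos) end = body.size();
    const std::string pair = body.substr(start, end - start);
    if (!pair.empty()) {
      const size_t eq = pair.find('=');
      const std::string name = urlDecode(pair.substr(0, eq));
      const std::string value =
          (eq == std::string::npos) ? std::string() : urlDecode(pair.substr(eq + 1));
      fields[name] = value;
    }
    start = end + 1;
  }
  return fields;
}
```

Keeping all fields in a map is only reasonable for short text values; repeated field names overwrite each other in this sketch.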
I'll tag this as a feature request, but as you might've seen from my reactions on other issues, I'm a bit short on time at the moment.
I totally understand. Thanks for replying. I need to upload files to ESP32, so I guess I need multipart/form-data here. I'll try to implement request body handling myself, thanks!
For uploading files, multipart/form-data is the only way to go anyway; otherwise you would only get the name of the uploaded file, not its contents.
If you need to stick to plain HTML forms, that's also the only option I know for uploading files. So you'll need to parse the Content-Type request header for the boundary (or, more easily, read the first line of the body, if you really trust your client) and then split the body parts.
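To make the header-parsing step concrete, here is a small sketch in standard C++. The helper names extractBoundary and partDelimiter are hypothetical; the sketch assumes the parameter name is written in lower case as "boundary=" and does not trim trailing whitespace.

```cpp
#include <string>

// Hypothetical helper: returns the multipart boundary from a Content-Type
// header value, or an empty string if no boundary parameter is present.
std::string extractBoundary(const std::string& contentType) {
  const std::string key = "boundary=";
  const size_t pos = contentType.find(key);
  if (pos == std::string::npos) return std::string();
  std::string boundary = contentType.substr(pos + key.size());
  // The parameter ends at the next ';' if further parameters follow.
  boundary = boundary.substr(0, boundary.find(';'));
  // Strip optional surrounding double quotes.
  if (boundary.size() >= 2 && boundary.front() == '"' && boundary.back() == '"') {
    boundary = boundary.substr(1, boundary.size() - 2);
  }
  return boundary;
}

// In the body, each part is introduced by "--" followed by the boundary.
std::string partDelimiter(const std::string& boundary) {
  return "--" + boundary;
}
```

The last delimiter in the body carries an additional trailing "--", so a parser has to compare against both forms.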
The only other option that comes to my mind is to read the file in the browser with client-side JavaScript and send a custom POST XHR to the server that contains only the file's content as the body. Then you could just read the whole request body on the server. But that might not be feasible in every use case, and it's only an easier solution if you've got some experience with JS and the characteristics of various browsers' JS engines.
I've implemented multipart/form-data parsing. I have found no obvious bugs on a simple form with one file input and one text input.
I'm attaching my source code here for future reference as a starting point. Of course, it is far from perfect. For now, it keeps the entire field value in memory, which makes handling big files impossible. Instead, it should call the handler for every received chunk of the file. Oh well.
#include <algorithm> // std::find
#include <sstream>   // std::istringstream
#include <string>

void handleField(const std::string& fieldName, const std::string& fieldValue) {
  // called once per every field of the form
}

// #define DEBUG_MULTIPART_PARSER
void handleNewConfig(HTTPRequest* req, HTTPResponse* res) {
  auto contentType = req->getHeader("Content-Type");
#ifdef DEBUG_MULTIPART_PARSER
  Serial.print("Content type: ");
  Serial.println(contentType.c_str());
#endif
  auto boundaryIndex = contentType.find("boundary=");
  if(boundaryIndex != std::string::npos) {
    //TODO: remove all magic constants
    auto boundary = contentType.substr(boundaryIndex + 9);
    auto commaIndex = boundary.find(';');
    boundary = "--" + boundary.substr(0, commaIndex);
    if(boundary.size() > 72) {
      return; //TODO: error 500, RFC violation
    }
    auto lastBoundary = boundary + "--";
    // Holds body data that has not been parsed yet; an incomplete
    // trailing line is carried over to the next read.
    std::string buffer;
    // What we're searching for
    enum {BOUNDARY, CONTENT_DISPOSITION, DATA_START, DATA_END} searching = BOUNDARY;
    std::string fieldName;
    std::string fieldValue;
    bool isFile = false;
    // HTTPRequest::requestComplete can be used to check whether the
    // body has been parsed completely.
    while(!(req->requestComplete())) {
      const auto oldBufferSize = buffer.size();
#ifdef DEBUG_MULTIPART_PARSER
      Serial.print("Old buffer size: ");
      Serial.println(oldBufferSize);
#endif
      buffer.resize(256);
      // HTTPRequest::readChars provides access to the request body.
      // It requires a buffer, the max buffer length and it will return
      // the amount of bytes that have been written to the buffer.
      auto s = req->readChars(&buffer[oldBufferSize], buffer.size() - oldBufferSize);
      if(!s)
        break;
#ifdef DEBUG_MULTIPART_PARSER
      Serial.print("Reading buffer (");
      Serial.print(s);
      Serial.print(" bytes): ");
      Serial.println(buffer.c_str());
#endif
      buffer.resize(s + oldBufferSize);
      // reading line by line
      std::istringstream is(buffer);
      for(std::string line; is.good(); ) {
        std::getline(is, line);
#ifdef DEBUG_MULTIPART_PARSER
        Serial.print("Next line is ");
        Serial.print(line.size());
        Serial.print(" bytes: ");
        Serial.println(line.c_str());
#endif
        // remove trailing \r
        bool crRemoved = false;
        if(!line.empty() && (line.back() == '\r')) {
          crRemoved = true;
          line.pop_back();
        }
        if(is.good()) {
          switch(searching) {
            case BOUNDARY:
              if((line == boundary) || (line == lastBoundary)) {
#ifdef DEBUG_MULTIPART_PARSER
                Serial.println("Boundary found");
#endif
                searching = CONTENT_DISPOSITION;
                // save the previous field value
                if(!fieldValue.empty()) {
#ifdef DEBUG_MULTIPART_PARSER
                  Serial.print("Field: ");
                  Serial.println(fieldName.c_str());
                  Serial.print("Value: ");
                  Serial.println(fieldValue.c_str());
#endif
                  handleField(fieldName, fieldValue);
                  fieldValue.clear();
                }
              }
              break;
            case CONTENT_DISPOSITION:
              if(line.substr(0, 21) == "Content-Disposition: ") {
                auto i = line.find(" name=");
                if(i != std::string::npos) {
#ifdef DEBUG_MULTIPART_PARSER
                  Serial.println("Correct content disposition found");
#endif
                  isFile = line.find(" filename=") != std::string::npos;
                  // skip ' name="' (7 chars), cut at the next ';', drop the closing quote
                  fieldName = line.substr(i + 7);
                  fieldName.erase(std::find(fieldName.begin(), fieldName.end(), ';'), fieldName.end());
                  fieldName.pop_back();
                  searching = DATA_START;
                } else
                  searching = BOUNDARY;
              }
              break;
            case DATA_START:
              // an empty line separates the part headers from the part data
              if(line.empty())
                searching = DATA_END;
              break;
            case DATA_END:
              if((line == boundary) || (line == lastBoundary)) {
                searching = CONTENT_DISPOSITION;
                // remove the last CRLF pair before the boundary
                if(isFile && (fieldValue.size() > 1)) {
                  fieldValue.pop_back();
                  fieldValue.pop_back();
                }
                // save the previous field value
#ifdef DEBUG_MULTIPART_PARSER
                Serial.print("Field: ");
                Serial.println(fieldName.c_str());
                Serial.print("Value (");
                Serial.print(fieldValue.size());
                Serial.print(" bytes): ");
                Serial.println(fieldValue.c_str());
#endif
                handleField(fieldName, fieldValue);
                fieldValue.clear();
                break;
              }
              // else not boundary
              if(isFile) {
                // if binary file contents, add CRLF ignored by getline
                if(crRemoved)
                  line.push_back('\r');
                line.push_back('\n');
              } else {
                // else text only, the trailing \r is already removed (ordinary values are single-line)
                searching = BOUNDARY;
              }
#ifdef DEBUG_MULTIPART_PARSER
              Serial.print("Adding (");
              Serial.print(line.size());
              Serial.print(" bytes): ");
              Serial.print(line.c_str());
#endif
              fieldValue += line;
              break;
          }
        } else {
          // no LF found, store the remaining chars in the buffer and read further
#ifdef DEBUG_MULTIPART_PARSER
          Serial.println("No more LF found in buffer");
          Serial.print("Remaining (");
          Serial.print(line.size() + crRemoved);
          Serial.print(" bytes): ");
          Serial.print(line.c_str());
          if(crRemoved)
            Serial.print('\r');
#endif
          // if last char is \r, it's possible the next one will be \n, but we haven't read it to buffer yet
          // if the buffer contains the boundary, it must be in its beginning, because boundary starts right after CRLF pair.
          // but then, the remainder cannot be longer than 72 bytes ("--" + boundary max 70 chars)
          if(line.size() && (line.size() < 73) && (boundary.find(line) == 0)) {
            // boundary found, at least partially
            // do nothing, the remainder will be read on the next loop iteration
#ifdef DEBUG_MULTIPART_PARSER
            Serial.print("Boundary found without LF: ");
            Serial.println(line.c_str());
#endif
          } else {
            // no boundary
            // to avoid the case when the buffer is full (256 bytes), we add it to field value here
            //TODO: it is possible to cause hangups by passing long strings during (searching != DATA_END),
            // but for now we assume the client is well-behaving
            if(searching == DATA_END) {
              if(isFile && crRemoved)
                line.push_back('\r');
#ifdef DEBUG_MULTIPART_PARSER
              Serial.println("Adding it.");
#endif
              fieldValue += line;
              line.clear();
            }
          }
          buffer = line;
        }
      }
    }
  }
}
Thank you for providing the code, that will be a really good starting point!
I also thought about how one could deal with large uploads, and I came up with an API that allows iterating over the body's fields and then reading each field step by step using a buffer, so that you could e.g. write it directly to an SD card. That should allow for arbitrary body sizes, but comes with the downside of not being able to access random fields of the body. To address both ways of body encoding (as urlencoded has far less overhead for short text values), there could be a parser for each content type based on the same body parser API, making the content encoding easily interchangeable. Usage should then be something like this:
void handleRequest(HTTPRequest * req, HTTPResponse * res) {
  HTTPMultipartBodyParser * bParser = new HTTPMultipartBodyParser(req);
  while(bParser->nextField()) {
    std::string fieldName = bParser->getFieldName();
    byte buf[128];
    size_t len;
    while(bParser->getRemainingLength()>0) {
      len = bParser->read(buf,128);
      // Do something with buf[0..len-1] and fieldName
    }
  }
  // Release the parser once all fields have been consumed
  delete bParser;
}
It's just a draft of the API and most likely won't even compile at the time, but I committed it for reference to the bodyparser branch (7861a98f1cebdebe623ceee85e6545c3f0757f50).
I'm making good progress implementing this, based on the bodyparser branch, but I'm running into an issue with the suggested API. getRemainingLength(), and actually getLength() as well, aren't really implementable without first reading the whole content of the multipart part. The part header doesn't contain a field that defines the size of that part, so you basically have to read all of it before you can know the size. (Not a great design of the MIME multipart: it also means you have to search for CRLF followed by the separator string in binary data, brrrr.) I think I'll replace the two size calls with a bool endOfField(), unless you have a better suggestion.
That API was really only a suggestion on how one could implement this, and I see the problem. I'd clearly put the ability to read parts of arbitrary length over the ability to get the length in advance, so your endOfField() is the better choice here.
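To illustrate how an endOfField()-style interface can work without knowing the part length in advance, here is a minimal sketch in standard C++. The class FieldScanner and its methods are hypothetical and are not the code from the bodyparser branch. It holds back just enough bytes to recognize a delimiter ("\r\n--" plus the boundary) that is split across two chunks.

```cpp
#include <string>

// Hypothetical sketch: streams one field's data out of a multipart body.
class FieldScanner {
 public:
  explicit FieldScanner(const std::string& boundary)
      : delimiter_("\r\n--" + boundary) {}

  // Feeds a chunk of raw body data and returns the field bytes that are
  // now known not to be part of the delimiter.
  std::string feed(const std::string& chunk) {
    pending_ += chunk;
    if (done_) return std::string();  // data after the delimiter stays in rest()
    std::string out;
    const size_t pos = pending_.find(delimiter_);
    if (pos != std::string::npos) {
      // Everything before the delimiter belongs to the field.
      out = pending_.substr(0, pos);
      pending_.erase(0, pos + delimiter_.size());
      done_ = true;
    } else if (pending_.size() >= delimiter_.size()) {
      // Keep the last (delimiter size - 1) bytes: they could be the start
      // of a delimiter that continues in the next chunk.
      const size_t safe = pending_.size() - (delimiter_.size() - 1);
      out = pending_.substr(0, safe);
      pending_.erase(0, safe);
    }
    return out;
  }

  // True once the delimiter that terminates the field has been seen.
  bool endOfField() const { return done_; }

  // Bytes received after the delimiter (start of the next part or the end marker).
  const std::string& rest() const { return pending_; }

 private:
  std::string delimiter_;
  std::string pending_;
  bool done_ = false;
};
```

The extra buffering costs one copy of at most a delimiter's length per chunk, which is the price for never emitting bytes that later turn out to belong to a boundary.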
> it also means you have to search for CRLF followed by the separator string in binary data, brrrr
I think binary data may be the easy case. If the client decides to make use of a content-transfer-encoding (see section 4.3 in RFC2388 in combination with section 6 in RFC2045), an additional conversion step might be necessary. I don't know how current clients behave, so I'd expect binary for files and 8bit or maybe even quoted-printable for text fields to be the default. If clients do switch between different variants, I'd suggest implementing the most common one first and then adding converter functions with the same signature later on, maybe something like:
/**
 * Removes quoted printable content transfer encoding
 * data_in   buffer to read from
 * size_in   size of data_in buffer
 * data_out  buffer to write the decoded data to
 * size_out  number of bytes written to data_out
 * returns: Number of bytes read from data_in
 */
size_t transferDecodeQuotedPrintable(uint8_t *data_in, size_t size_in, uint8_t *data_out, size_t &size_out);
That should allow handling all encodings while the caller can remain agnostic of the symbol length in the encoded data. The caller only has to check for <CR><LF><boundary> in the incoming data. If size_in > return_value, the caller knows that part of the data couldn't be processed, most likely because only a partial input symbol is present in data_in.
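One possible body for the proposed function, as a sketch only: the signature is the one suggested above, while hexValue is a hypothetical helper. It assumes data_out can hold at least size_in bytes (decoded data is never longer than the input) and, more leniently than RFC2045, also accepts lower-case hex digits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: value of a hex digit, or -1 if c is not one.
int hexValue(uint8_t c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

size_t transferDecodeQuotedPrintable(uint8_t *data_in, size_t size_in,
                                     uint8_t *data_out, size_t &size_out) {
  size_t in = 0;
  size_out = 0;
  while (in < size_in) {
    if (data_in[in] != '=') {
      // Literal byte: copy as is.
      data_out[size_out++] = data_in[in++];
      continue;
    }
    // An escape needs two more bytes; stop if they have not arrived yet,
    // so the caller can pass the remainder again with more data.
    if (size_in - in < 3) break;
    const uint8_t a = data_in[in + 1];
    const uint8_t b = data_in[in + 2];
    if (a == '\r' && b == '\n') {
      // Soft line break "=\r\n": produces no output.
    } else if (hexValue(a) >= 0 && hexValue(b) >= 0) {
      data_out[size_out++] = static_cast<uint8_t>(hexValue(a) * 16 + hexValue(b));
    } else {
      // Malformed escape: pass the '=' through and re-scan what follows.
      data_out[size_out++] = '=';
      in += 1;
      continue;
    }
    in += 3;
  }
  return in;
}
```

When the return value is smaller than size_in, the unread tail is an incomplete escape sequence that should be prepended to the next chunk.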
I've created a pull request. I'm not 100% happy with the code, especially because I've had to implement yet another level of buffering (I saw the existing buffered reading code, but it's in private methods, and I decided to implement my own because of the tricky way to test for boundaries while reading binary data). Performance seems sort-of decent despite the extra copying.
I tried to send POST requests with forms. Tried both x-www-form-urlencoded and multipart/form-data, but request params array is empty in all cases. Does the server support handling HTML forms? Am I doing something wrong?