PerMalmberg / Smooth

C++ framework for embedded programming on top of Espressif's ESP-IDF.
Apache License 2.0

modified MIMEParser class for upload of larger files and multiple files #132

Closed squonk11 closed 4 years ago

squonk11 commented 4 years ago

modified MIMEParser class for uploading larger files and multiple files

squonk11 commented 4 years ago

I will take care of the remaining issues over the next few days. Unfortunately I have a problem running the shell scripts for the tests in my Docker; I think this is related to the fact that the shell scripts have CRLF line endings. I will try to find a fix for this as well.

squonk11 commented 4 years ago

Now I have changed the line endings of the shell scripts to LF; the error about the missing interpreter "bin/bash^M" is gone and the script starts executing in Docker. But now I get the error ./CI/build_test.sh: line 14: docker: command not found. Do you have an idea what I am missing?

PerMalmberg commented 4 years ago

The CI-scripts are meant to be run outside of docker; they will start an instance of the docker-image to run the tests.

squonk11 commented 4 years ago

But running the scripts outside of Docker is something I can only do under Linux, right? Or is there a trick to do it under Windows as well, maybe WSL?

PerMalmberg commented 4 years ago

Yes, I'd expect WSL to work.

I'd guess any bash terminal would work, such as Git-bash that comes with Git for Windows.

squonk11 commented 4 years ago

Unfortunately, neither works. With Git Bash I get the error message: C:\Program Files\Docker\Docker\resources\bin\docker.exe: Error response from daemon: the working directory 'C:/Program Files/Git/src' is invalid, it needs to be an absolute path. I guess this is somehow related to my Git installation, because the directory C:/Program Files/Git/src really does not exist. With WSL I get the error message docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?. This is probably due to the fact that WSL assumes that Docker is installed under WSL (but it is not). So I think I have to continue the trial-and-error method via several commit/push sequences.

PerMalmberg commented 4 years ago

Aww, sorry to hear that.

You can of course simply run the commands that the CI scripts themselves run from within Docker.

PerMalmberg commented 4 years ago

Last! :) @squonk11 :)

PerMalmberg commented 4 years ago

@squonk11 Are you happy with the PR in its current state? I've not test run it myself, but I presume you have.

squonk11 commented 4 years ago

I did quite a lot of testing with the previous version using lambda functions; this version I have not tested as much. I do not expect issues with it, but before accepting the pull request it might be better to do some more testing. I will try to do that over the next weekend and finally add this PR to the file CONTRIBUTORS.md. Do you have any more comments on the code?

PerMalmberg commented 4 years ago

I didn't see any obvious issues with the code.

I was thinking that we could perhaps do some automatic testing using the Linux version and curl inside Docker, i.e. make actual calls to the HTTP server. That way we get reproducibility of the tests and can test a large range of cases.

squonk11 commented 4 years ago

Automatic testing would be a good idea. Unfortunately I am having problems starting bash scripts from outside Docker. Is it possible to use these scripts from inside Docker as well? How would I start the HTTP server inside Docker? Additionally, I have not used curl before, but I think I can find out how to use it.

PerMalmberg commented 4 years ago

The scripts that currently start the CI tests and builds just start a Docker instance to run other scripts. The "other" scripts can all be run manually from within Docker.

Python might actually be a better idea than curl, as it is easier to build tests in Python than in bash.

Starting the HTTP-server inside docker would require a test application built for testing the HTTP server specifically. The tests we currently have can serve the example folder with all the images, but that's not really useful in automated testing.

I'm thinking that an echo server, i.e. one that replies back with what it receives, could be a useful thing. Then we can automate sending and receiving data from a single byte up to several megabytes with little effort.

squonk11 commented 4 years ago

So, the Python script should launch an HTTP client which sends the files to the HTTP test server. The HTTP test server will then echo the files back to the Python script. In order to receive the echoed files, the Python script also needs to launch an HTTP server which receives the files and compares their contents against the files sent. Is that what you have in mind? EDIT: to be more precise: it will not "launch" the HTTP client and server but "act" as an HTTP client and server.

PerMalmberg commented 4 years ago

No, there is no need for an HTTP server on the Python side; that can be done over the same client socket that Python uses to send the data. So: POST some data, then receive the answer.

squonk11 commented 4 years ago

Will the upload from the Python script be done via requests; e.g. like so:

import requests

url = "http://localhost:5000/"
fin = open('simple_table.pdf', 'rb')
files = {'file': fin}
try:
    r = requests.post(url, files=files)
    print(r.text)
finally:
    fin.close()

? How must the HTTP server send the file back to the Python script? I guess it must also send it via a multipart/MIME protocol. But how would I do that? There is no such functionality in Smooth, is there?
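Since this PR is about uploading multiple files, it is perhaps worth noting that requests also accepts a list of (field, file) tuples, so the same kind of call as in the snippet above can send several parts in a single multipart request. A minimal sketch (the file names are illustrative):

import requests

url = "http://localhost:5000/"
# One multipart part is produced per (field name, file object) tuple.
files = [
    ('file', open('File1.txt', 'rb')),
    ('file', open('File2.bin', 'rb')),
]
try:
    r = requests.post(url, files=files)
    print(r.text)
finally:
    # Close every file handle opened above.
    for _, f in files:
        f.close()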

PerMalmberg commented 4 years ago

Yes, requests will most likely do the job.

No there isn't. How about this:

  1. Pre-generate a set of random binary data so that each test can be recreated and re-run.
  2. Create a hash of the data.
  3. Send the data to Smooth.
  4. Calculate the hash on the received data.
  5. Reply with the hash.
  6. Compare the hashes. If they don't match, the test fails.

This would allow us to test everything from 1 byte to megabytes.
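A rough sketch of that flow on the Python side (the URL is a placeholder, and SHA-256 is used only as a stand-in; the actual algorithm is settled further down in this thread):

import hashlib
import os

import requests

url = "http://localhost:8080/upload"      # placeholder address of the test server

# 1. Pre-generate random binary data; storing it to a file would make each test reproducible.
payload = os.urandom(1024)

# 2. Create a hash of the data.
expected = hashlib.sha256(payload).hexdigest()

# 3. Send the data to Smooth; the server hashes what it receives (4) and replies with its hash (5).
r = requests.post(url, files={'file': ('random.bin', payload)})

# 6. Compare the hashes; if they don't match, the test fails.
assert r.text.strip() == expected, "hash mismatch"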

squonk11 commented 4 years ago

The hash will be one hash value calculated over all files to be sent (in the case of multiple files)? The hash value will then be returned via the HTTP response?

PerMalmberg commented 4 years ago

One per file, and yes it will be the response.

squonk11 commented 4 years ago

But there is only one response, even in the case of a multi-file upload. So the reply must contain the hashes for all files, e.g. like this: "File1.txt:hash1 File2.bin:hash2 File3.xyz:hash3 ...". Python then needs to parse this response. Is that correct?

PerMalmberg commented 4 years ago

Oh, right. To keep it as simple as possible I think it's best to return a JSON object with file/hash pairs for each file. We can then use the same code regardless of whether one or a hundred files were received/sent.

squonk11 commented 4 years ago

So, the JSON object will look e.g. like this:

[
  {
    "filename": "File1.txt",
    "hash": "hash1"
  },
  {
    "filename": "File2.bin",
    "hash": "hash2"
  },
  ...
]

? I guess this can be generated using the nlohmann library.

PerMalmberg commented 4 years ago

Yes, although there is no need for a separate JSON object per file, I think. Just {"x": "y", "z": "w", ... }

Nlohmann should solve this no problem.
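On the Python side, a flat object like that maps directly onto a dict, so the comparison code is the same whether one or a hundred files were sent. A sketch along those lines (URL and file names are illustrative, and the hash algorithm here is only a placeholder):

import hashlib

import requests

files_to_send = ['File1.txt', 'File2.bin']

# Hashes computed locally before the upload, keyed by file name.
local_hashes = {name: hashlib.sha256(open(name, 'rb').read()).hexdigest()
                for name in files_to_send}

r = requests.post('http://localhost:8080/upload',
                  files=[('file', open(name, 'rb')) for name in files_to_send])

# The server replies with {"File1.txt": "<hash>", "File2.bin": "<hash>", ...}.
server_hashes = r.json()

for name, expected in local_hashes.items():
    print('OK  ' if server_hashes.get(name) == expected else 'FAIL', name)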

squonk11 commented 4 years ago

o.k. I can try to set up such a test. But I will probably have questions, because until now I have little to no experience with Python, JSON, Docker, hash calculation, ... . Additionally, I do not have much time to work on this, so it will take some time. If you can wait for that, then it is o.k. for me.

PerMalmberg commented 4 years ago

We're both busy people 🙂

There is no rush getting this done. It's better that it gets done and that we know the HTTP implementation is good. This is something I've wanted to do for a while now for the original implementation but had to put off for other things.

I'll see what I can do on my end in the near-ish future.

squonk11 commented 4 years ago

A few more questions before I start to implement the new test function:

  1. You mentioned above that this should be a test function using the "Linux version". If I use the http_server_test as a basis, shouldn't it work directly for both Linux and ESP32? I have no Linux server available, so I can only work on the ESP32.
  2. Can I simply implement the non-TLS server just for simplicity?
  3. I see there are many algorithms available for calculating hashes. MD5 seems to be used quite frequently. There are libraries for C++ and also for Python (e.g. in hashlib). Is this a good choice in your view, or do you have a better suggestion?
  4. Could it make sense to accept the PR for the modifications to MIMEParser made so far, in order to separate them, and our subsequent discussions, from one another?
  5. I will call the new test function http_files_upload_test. Is that o.k.?

PerMalmberg commented 4 years ago

I have no Linux server available, so I can only work on the ESP32

That's fine. As long as no hardware for which there is no mock-up is used, a Smooth app can be compiled for both platforms. It's just so much easier and quicker to develop for Linux, and you have the additional benefit of being able to use tools like AddressSanitizer and Valgrind.

Can I simply implement the non-TLS server just for simplicity?

Thankfully, using TLS or not is just a matter of selecting the socket type for the template argument, so the HTTP server is not aware of whether it is running over TLS or not. So, yes, but there is really no difference.

I see there are many algorithms available for calculating hashes...

ESP-IDF comes with libsodium, which has support for BLAKE2b hashes, and so does Python. As long as we use the same key on both sides they will produce the same hash. I'd recommend using this, unless you have a better suggestion.
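For reference, Python's hashlib exposes keyed BLAKE2b directly, and with the same key and a 32-byte digest it should produce the same value as libsodium's crypto_generichash(out, 32, in, inlen, key, keylen). One easy way to end up with differing hashes is to feed the key into update() as if it were data instead of passing it as the key parameter. A small sketch (the key is just an example value):

import hashlib

key = b'my16characterKey'        # example key; must be byte-identical on both sides
data = b'some test data'

# Keyed BLAKE2b with a 32-byte digest.
digest = hashlib.blake2b(data, digest_size=32, key=key).hexdigest()

# Streaming variant for larger inputs, mirroring crypto_generichash_init/update/final.
h = hashlib.blake2b(digest_size=32, key=key)
for i in range(0, len(data), 4):
    h.update(data[i:i + 4])
assert h.hexdigest() == digest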

Could it make sense to accept the PR for the modifications to MIMEParser made so far, in order to separate them, and our subsequent discussions, from one another?

It does, though I want to at least test-run it myself a bit first.

I will call the new test function http_files_upload_test. Is that o.k.?

I have no objections to that.

squonk11 commented 4 years ago

While testing the file upload using a Python script, I found a remaining problem: the reply is not being sent, because is_last() never returns true in UploadResponder.cpp, since HTTPServer.h does not pass last_part to the handler for the last part. So don't accept the current state of my PR; I will try to fix this issue first.

PerMalmberg commented 4 years ago

Glad to hear the testing is having a positive effect.

squonk11 commented 4 years ago

I found this problem recently and thought I had fixed it, but somehow the fix did not find its way into the PR. In order to overcome this problem, I simply used an override for end_of_request() in order to send the reply. Since the end_of_request() override is now available in HTTPRequestHandler, I think the previous solution of using is_last() to check whether the reply can be sent is obsolete. Maybe that function can be removed from HTTPRequestHandler, also because it does not work: request_params.last_part is not updated in HTTPServer::handle. BTW, the http_server_test currently in the repo probably does not work either (I did not check).

PerMalmberg commented 4 years ago

In order to overcome this problem, I simply used an override for end_of_request() in order to send the reply.

That's a way to do it, but to fix the actual problem I think this line should be copied to before this line so that is_last is updated.

squonk11 commented 4 years ago

That's a way to do it, but to fix the actual problem I think this line should be copied to before this line so that is_last is updated.

I already tested this, and it does not work. The reason is that the update of request_params.last_part must be done before the call to handler->request(timeout_modifier, requested_url, data); because is_last() is called inside handler->request(). But doing the update_call_params() call always before handler->request() could be quite costly(?), because it would be done for each chunk of data received.

PerMalmberg commented 4 years ago

Oh, right. So then we do this instead.

// Call order is important - must update call params before calling the rest of the methods in the
// inheriting class.
if (first_part || last_part)
{
    handler->update_call_params(...)
}

if (first_part)
{
    handler->prepare_mime();
    handler->start_of_request();
}

handler->request(...);

if (last_part)
{
    handler->end_of_request();
}

squonk11 commented 4 years ago

That looks good. I will check it tomorrow.

squonk11 commented 4 years ago

Now I am trying to calculate the hash values, but I am stuck at the very beginning: as soon as I include sodium.h and define crypto_generichash_state state; I get the linker error memalign_function_was_linked_but_unsupported_in_esp_idf. Apparently libsodium requires some feature which ESP-IDF does not support? Do you have an idea?

EDIT: I have already found a solution for this: using sodium_malloc() helps.

squonk11 commented 4 years ago

I encountered another problem: the hash calculated in Python using hashlib is different from the value that libsodium calculates on the ESP32 side. Is there anything special to take into account? Currently I am generating a hash of length 32 (which seems to be the only size libsodium supports) and I am using the same key string "my16characterKey". Do you have any hints?

PerMalmberg commented 4 years ago

I encountered another problem: the hash calculated in Python using hashlib is different from the value that libsodium calculates on the ESP32 side. Is there anything special to take into account? Currently I am generating a hash of length 32 (which seems to be the only size libsodium supports) and I am using the same key string "my16characterKey". Do you have any hints?

No... I would expect the hashes to be the same for the same input, otherwise what use is it? Are you using the same IV on both sides?

Wikipedia has some example hashes for empty string data; do you get the same hash for an empty string?

squonk11 commented 4 years ago

On Wikipedia I do not find a BLAKE2b hash (with a digest size of 32 bytes) for an empty string. But if I calculate a BLAKE2b hash (with a digest size of 32 bytes) of an empty string in Python and with libsodium, I get the same result. However, I don't get the same hashes for the transmitted files, so there is probably some other issue in my code. I need to check in more detail during the next days.

PerMalmberg commented 4 years ago

Ok, that's a step in the right direction :)

squonk11 commented 4 years ago

After modifying my Python code a bit and using an empty string as key, the hashes on both sides seem to be the same. Unfortunately it is not quite clear to me why the previous version did not work ... Currently I am comparing the hashes on both sides manually. The next step is to send them back to the Python script as a JSON object. Do you have a hint on how I can send a JSON object as the reply? Do I have to send it as a responses::StringResponse?

squonk11 commented 4 years ago

Now I have the first version of the test ready. I used the StringResponse to reply with the JSON string. In Python this string gets converted to a Python dict. The hashes of the files are then compared to the hashes returned from the HTTP server and a success or error message is printed. I tested several times with the 1000 icon files and no error occurred. The only message I see is, again, that the watchdog on one CPU is triggered due to not being serviced. But the transmission still completes. Shall I upload this as a first version for discussion in this PR, since it is still open?

PerMalmberg commented 4 years ago

Yes please do that. Did you try with larger files too?

squonk11 commented 4 years ago

Yes, I also tried with larger files, e.g. three files of 8 MB, 10 MB and 1 kB. I will provide the update soon.

PerMalmberg commented 4 years ago

I tried to run this as a Linux version, but that unfortunately results in a crash.

[1] 31201 abort (core dumped) ./http_files_upload_test.elf

I'll have to debug this and see what goes wrong.

squonk11 commented 4 years ago

Does it crash during or after the file(s) transmission?

PerMalmberg commented 4 years ago

As soon as transmission starts. First file.

squonk11 commented 4 years ago

I checked the sources again. Maybe there could be two weak points:

  1. in UploadResponder::start_of_request() I am using sodium_malloc(). Maybe a stub is needed for this?
  2. In MIMEParser::parse() I define status as static. Maybe it must be a private member variable of MIMEParser?

Are you using the secure or the insecure server for the test?

PerMalmberg commented 4 years ago

I used the insecure server. It's most likely related to the sodium stuff, not sure why though. Adding gdb to the container right now to debug it.

A static local in parse() is no good, since it is then shared between instances.

PerMalmberg commented 4 years ago

First crash solved - sodium_init() must be called prior to using the library.

Another problem then immediately showed itself: in end_of_request(), the call to free(pState) must be changed to sodium_free().

This clearly shows why it is beneficial to perform most of the work on a Linux system - the debugging facilities are so much better.

I'll be testing this a bit and will push the changes to your branch (if I can).

PerMalmberg commented 4 years ago

I'm done testing now, just waiting for the CI scripts to run locally. I see that you changed it to a local variable. I'll merge that into my local copy and push my changes to this PR.