P2PSP / simulator

A complete stand-alone simulator for the P2PSP protocol
GNU General Public License v3.0
18 stars 15 forks source link

Reed solomon coder-decoder layer #23

Closed vishwasmittal closed 5 years ago

vishwasmittal commented 6 years ago

[Work in progress] Adding an error correction layer to recover any data that is lost during transmission through an erasure channel.

cristobalmedinalopez commented 6 years ago

Hi @vishwas78 What is the purpose of this PR? Is there any issue related? Please, if possible, explain us further why we need a reed solomon encoder-decoder layer. Thanks!

vishwasmittal commented 6 years ago

@cristobalmedinalopez Reed-Solomon layer is an error correction algorithm which is used to correct any noise or missing data problem during transmission through an erasure channel.

This uses error correction codes to encode the k sized data to n (>k) sized expression. If some of the data is missing or changed (due to noise) the original data can be recovered from any combination of k characters from the expression.

This was proposed in the GSoC ideas list so I thought I can start working on it.

cristobalmedinalopez commented 6 years ago

Ok, great! We just want to make sure that our contributors have a reason for working on a specific task, and they understand why and how to proceed. Please, if possible, create an issue for the proposal. We will check out your PR, but we are waiting to merge it till a fully working version is available. Thanks a lot!

vishwasmittal commented 6 years ago

yeah sure :)

vishwasmittal commented 6 years ago

@cristobalmedinalopez @vicente-gonzalez-ruiz I have completed the integration of reed-solomon codec in the DBS. Can you review the changes?

cristobalmedinalopez commented 6 years ago

Ok, let us check it out. @vicente-gonzalez-ruiz

vishwasmittal commented 6 years ago

@cristobalmedinalopez can you please halt the review, I found some bugs in the code while establishing the connections. In the next commit, I will try to remove all the bugs and then will ping you again. :sweat_smile:

vicente-gonzalez-ruiz commented 6 years ago

Hi @vishwas78! Thank you a lot for your PR. I've reviewed your changes in the code and I must say that they are crystal clear. Congratulations!

I have only one question. If I'm not wrong, you have improved all the messages that the noded interchanges among them, even including the TCP ones. My question is. Is the R-S code in this case really necessary? I mean, suppousely, the error correction mechanism implemented by the TCP should be enough for most of the situations where a moderate amount of data is lost. In which sense the use of a R-S code can be justified?

And a (I hope) final request. Could you make that users can select between using or not using the R-S code? For us, the ideal situation should have a flag that when provided at the command line level, would enable the use of the data loss protection mechanism that you have implemented.

Thanks again!

vishwasmittal commented 6 years ago

Thank you very much @vicente-gonzalez-ruiz for the review. Yeah, you are right, I didn't consider NOT encoding the TCP messages, its error correction is enough. And yes I can certainly include a flag/switch for the R-S encoding.

vicente-gonzalez-ruiz commented 6 years ago

Hi, you are online (I'm almost asleep :-) )!

Please, a last question. If a chunk is lost, we can not recover that chunk using a R-S code. However, if we split the chunks into small pieces and create chunks (those that we finally transmit) with the data of several original chunks, interleaving the original pieces, do you think that in this case a lost of a transmitted chunk (that would affect to small parts of several original chunks) could not imply a lost of information? In other words, could the FEC mechanism that you are implementing correct all the original chunks?

vishwasmittal commented 6 years ago

Yeah, I generally sleep late :sweat_smile:

That's a great hypothesis and I think this is possible, as according to the explanation of the algorithm, the original content can be recovered from a part of the data. The length of that part has (at least) to be in a specified ratio with the data which can be decided during encoding.

The main task here is designing a system where we can interleave the chunks and partition them in proper sizes so that every chunk can be regenerated even if some of them is lost. This I can certainly do :)

vicente-gonzalez-ruiz commented 6 years ago

Right.

Forget temporally the splitting and interleaving of the chunk data, and make now only a PR for the use (optional) of the RS codes in simulator.

I would like also that the code of the DBS layer will not depend on this improvement. In other words, the code of DBS would address only (strictly speaking) the problem of transporting data, and the RS processing would be done in a separate class. Do you think that this could be possible? I know, that this can be tricky (at least in Python) :-/ and that you have done very minimal modifications to DBS (I'm happy with this), but this is the maximum level of modification that we could accept.

Thank you, @vishwas78 !

vishwasmittal commented 6 years ago

Sorry for the late response, actually I am caught up in an assignment work right now.

Sure, as you say, I will only add optional RS encoding-decoding feature in this PR.

Yes, separation of RS and DBS is completely possible, this is what I was going for. I know that making internal modifications to such a large code can be tricky to debug that's why I was making a separate module for that that can also be reused like you have done with splitter_stuff.py and yes, no more modification is required in core code :smile:

vicente-gonzalez-ruiz commented 6 years ago

@vishwas78, I have thought a little more in your code and, please, consider that: Is it possible to change the name of your method sendto_encoded() by only sendto() (as originally it was, the same for revc_encoded() and recv(), and sendto_encoded_counter) and that, depending on the imported package/module, the definition of these functions would be different?

vishwasmittal commented 6 years ago

@vicente-gonzalez-ruiz I have added the switch and removed the redundant methods. There is an error for --encode True flag that I will explain in probably 1-2 hours as I have a test right now :sweat_smile:

vicente-gonzalez-ruiz commented 6 years ago

Hi @vishwas78. Thank you for the PR. Unfortunately, we couldn't run your code successfully. This is our output (in a OSX and a Linux):

MacBook-Pro:src vruiz$ python3 simulator.py run --set-of-rules dbs --number-of-monitors 1 --number-of-peers 10 --number-of-rounds 100 --drawing-log ejemplo.draw --encode
2018-04-16 13:52:51,026 - __main__ - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,027 - __main__ - ERROR - Error messages enabled.
2018-04-16 13:52:51,489 - core.splitter_dbs - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,492 - core.splitter_dbs - ERROR - Error messages enabled.
2018-04-16 13:52:51,495 - core.simulator_stuff - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,496 - core.simulator_stuff - ERROR - Error messages enabled.
2018-04-16 13:52:51,508 - core.peer_dbs - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,511 - core.peer_dbs - ERROR - Error messages enabled.
2018-04-16 13:52:51,513 - core.simulator_stuff - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,513 - core.peer_dbs - WARNING - Warning message enabled.
2018-04-16 13:52:51,514 - core.simulator_stuff - ERROR - Error messages enabled.
2018-04-16 13:52:51,515 - core.peer_dbs - INFO - Informative message enabled.
2018-04-16 13:52:51,516 - core.peer_dbs - DEBUG - Low-level debug message enabled.
2018-04-16 13:52:51,519 - core.peer_dbs - INFO - M0: DBS initialized
.
2018-04-16 13:52:51,521 - core.simulator_stuff - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,522 - core.simulator_stuff - ERROR - Error messages enabled.
2018-04-16 13:52:51,525 - core.peer_dbs - INFO - M0: connected to the splitter
2018-04-16 13:52:51,526 - core.simulator_stuff - CRITICAL - Critical messages enabled.
2018-04-16 13:52:51,529 - core.simulator_stuff - ERROR - Error messages enabled.
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/Users/vruiz/.pyenv/versions/3.6.1/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/vruiz/.pyenv/versions/3.6.1/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/vruiz/tmp/simulator/src/core/splitter_dbs.py", line 103, in handle_a_peer_arrival
    self.send_buffer_size(serve_socket)
  File "/Users/vruiz/tmp/simulator/src/core/splitter_dbs.py", line 129, in send_buffer_size
    peer_serve_socket.sendall(msg)
  File "/Users/vruiz/tmp/simulator/src/core/simulator_stuff.py", line 90, in sendall
    return self.sock.sendall(msg)
TypeError: a bytes-like object is required, not 'str'

In Python3, sockets require bytes, not str. Could this be related to this runtime error?

vishwasmittal commented 6 years ago

@vicente-gonzalez-ruiz I have solved the problem described above but I am still facing some problems while encoding-decoding \x00 characters with the existing library. I have tried many things, that you can see from the code that is commented out.

For now, I have commented the code that was responsible for encoding the messages.

In next commit, will try to rectify the problems related with \x00

vishwasmittal commented 6 years ago

On a separate note, in the code, I can see that you have defined a fixed order of messages to be passed, like whenever a peer is connected, buffer size, info about peers, etc. are passed in a fixed order.

I don't know why, but I never trust such (fixed) arrangements of message passing to work correctly whenever parallelism is introduced.

What I propose is using a specific format for messages that can inform the receiver about the type of message that is being passed. For example, you can consider HTTP, it uses various methods (get/post/delete/put/patch) and response codes (1xx/2xx...) to specify different types of requests and responses. We can device our own format for simulation purposes. This will give us much more control over messages that are being passed. We can, then, use callbacks to call the handler functions. This type of mechanism is also used in many frameworks.

What do you think @vicente-gonzalez-ruiz @cristobalmedinalopez?

vicente-gonzalez-ruiz commented 6 years ago

@vishwas78, I agree that the way we use now for transmitting the "meta-data" is not the most flexible. However, it works perfectly, and it more efficient (from a bandwidth point of view) than the procedure you commented.

This does not mean that we can also do what you say (use HTTP as a transport protocol, though, modifying minimally ... if possible, changing only one import ... the current code). Thank you!

vicente-gonzalez-ruiz commented 6 years ago

Hi @vishwas78. These are only some things that should be solved before we accept you PR (thank you for that!). First, only the chunk should be protected by the FEC, and second, we master branch has diverged a few in this time. A final remark ... which is the overhead (in bits/chunk) of the protection?