BaritoLog / barito-flow

Handling log transportation within a cluster
MIT License

Move away from http #4

Status: Open · imantung opened this issue 6 years ago

imantung commented 6 years ago

The producer uses the HTTP protocol, which is slower than gRPC or UDP and does not scale well. We need to refactor it to use gRPC or UDP. cc: @tagnotfound @ajeygore

ajeygore commented 6 years ago

Yes, but are you planning to move to this right now? We should do it after the MVP is out.

lynxluna commented 6 years ago

I'll own this issue.

I would like to ask which side we need to modify to deliver this: the producer only, or also the router (written by @pushm0v)? I would like to create a PoC, benchmark it, and simulate the performance.

Any pointers, @imantung?

lynxluna commented 6 years ago

I designed the protocol for UDP as follows: a 48-byte header followed by a variable-length log payload. It's just simple binary encoding.

The main difference from HTTP is that this protocol is fire-and-forget. If the token bucket is full, the log data is simply discarded, with no 'response' at all. If the network drops the packet, it gets lost too. KCP will mitigate some of these scenarios. The upside is that we may get more throughput.

0                       4           8
+-----------------------+-----------+
| 'BLRQ'                | ulen      |
+-----------------------------------+ 8
| TIMESTAMP (64-bit)                |
+-----------------------------------+ 16
|                                   |
+                                   + 24
|  SIGNATURE                        |
+                                   + 32 
|                                   |
+                                   + 40
|                                   |
+-----------------------------------+ 48
|                                   |
|     LOG DATA (variable length)    |
|                                   |

The length sent (ulen) is the byte length, not the string length, since the two can differ (e.g. for multi-byte UTF-8 characters).
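
For illustration, a minimal Go sketch of encoding this header. A few things the layout above doesn't pin down are assumed here: big-endian byte order, ulen as a uint32, nanosecond Unix time for the timestamp, and HMAC-SHA256 (which happens to produce exactly 32 bytes) for the signature field; encodePacket is a hypothetical helper, not code from the repo.

package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"time"
)

const magic = "BLRQ"

// encodePacket builds: 4-byte magic | 4-byte ulen | 8-byte timestamp |
// 32-byte signature | log payload. ulen is the payload's byte length.
func encodePacket(secret, payload []byte) []byte {
	buf := &bytes.Buffer{}
	buf.WriteString(magic)
	binary.Write(buf, binary.BigEndian, uint32(len(payload)))
	binary.Write(buf, binary.BigEndian, uint64(time.Now().UnixNano()))
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	buf.Write(mac.Sum(nil)) // 32-byte signature over the payload
	buf.Write(payload)
	return buf.Bytes()
}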

Compression

As logs usually include a lot of text, we'll need some form of compression to reduce network bandwidth. Compression is done at the protocol level using Google Snappy.
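
A minimal sketch of wrapping the payload with the github.com/golang/snappy package (block format here; whether to use the block or stream format is still an open choice):

package main

import "github.com/golang/snappy"

// compress shrinks the log payload before it goes on the wire.
func compress(logData []byte) []byte {
	return snappy.Encode(nil, logData)
}

// decompress restores the payload on the receiving side.
func decompress(wire []byte) ([]byte, error) {
	return snappy.Decode(nil, wire)
}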

Optimisation

The protocol/connection is multiplexed using smux. This includes a token bucket for throttling so we don't DDoS ourselves.
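
A sketch of what the sender side could look like: one connection multiplexed with smux (github.com/xtaci/smux), plus a token bucket (golang.org/x/time/rate stands in here) that silently drops logs when full, matching the fire-and-forget behaviour described earlier. sendLogs and the rate numbers are made up for illustration.

package main

import (
	"net"

	"github.com/xtaci/smux"
	"golang.org/x/time/rate"
)

// Allow roughly 5000 logs/s with a burst of 10000 (illustrative numbers).
var limiter = rate.NewLimiter(rate.Limit(5000), 10000)

func sendLogs(conn net.Conn, logs <-chan []byte) error {
	session, err := smux.Client(conn, smux.DefaultConfig())
	if err != nil {
		return err
	}
	defer session.Close()

	for log := range logs {
		if !limiter.Allow() {
			continue // token bucket full: discard, no response
		}
		stream, err := session.OpenStream()
		if err != nil {
			return err
		}
		stream.Write(log) // fire and forget
		stream.Close()
	}
	return nil
}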

giosakti commented 6 years ago

Hi @lynxluna, thanks for this.

I would like to ask which side we need to modify to be able to deliver this? Is it the producer only or also the router

It's supposed to be client -> router -> producer, but we can start with router -> producer.

By the way, seeing that we're going to go live soon, can we provide an option when starting the server so HTTP is still supported?

lynxluna commented 6 years ago

@giosakti HTTP will always be supported. @imantung wrote the service as interfaces, so what I am doing is just implementing (and refactoring/segregating) the service interface(s). It won't change any current implementation.

giosakti commented 6 years ago

very nice!

lynxluna commented 6 years ago

Latest status: problem with log stream packet size

Discovered Problem

The plain UDP stream works, but because UDP has an MTU we'd need to manage fragmentation ourselves. Logs tend to be very big, so a single log will be chopped into more than one packet. The typical MTU for UDP is 576 bytes, and a 508-byte payload is the safe bet for sending a log in one packet. KCP mitigates this by rearranging the UDP packets and uses a 1400-byte MTU (adjustable), but that is still not enough for sending logs. We can increase the MTU, but latency will be high. KCP also doesn't implement SYN/FIN like TCP, so there may be a premature 'connection close' before the producer is even able to read the whole log.
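
To make the numbers concrete, a trivial Go illustration of how many packets a single log needs at each payload size (the 16 KiB log size is just an example):

package main

import "fmt"

// packetsNeeded is ceiling division of the log size by the per-packet payload.
func packetsNeeded(logSize, payload int) int {
	return (logSize + payload - 1) / payload
}

func main() {
	logSize := 16 * 1024                      // a 16 KiB log, e.g. a stack trace
	fmt.Println(packetsNeeded(logSize, 508))  // 33 packets over plain UDP
	fmt.Println(packetsNeeded(logSize, 1400)) // 12 packets at KCP's default MTU
}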

The Solution

We'd need to use multiplexing to get full throughput and to accommodate big logs going through. There are yamux and smux; I'll evaluate both and see which one is better suited.
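
For reference, a sketch of the receiving (producer) side under the smux option: each log travels on its own stream and is read in full, so a log bigger than one UDP packet still arrives intact. The yamux version would look almost identical (yamux.Server / session.Accept). receiveLogs and handle are hypothetical names, not repo code.

package main

import (
	"io"
	"net"

	"github.com/xtaci/smux"
)

func receiveLogs(conn net.Conn, handle func([]byte)) error {
	session, err := smux.Server(conn, smux.DefaultConfig())
	if err != nil {
		return err
	}
	defer session.Close()

	for {
		stream, err := session.AcceptStream()
		if err != nil {
			return err // session closed
		}
		go func(s *smux.Stream) {
			defer s.Close()
			log, err := io.ReadAll(s) // one stream carries one whole log
			if err != nil {
				return
			}
			handle(log)
		}(stream)
	}
}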