imantung opened this issue
Yes. Are you planning to move to this right now? We should do it after the MVP is out.
On Wed, 6 Jun 2018 at 14:46, Iman Tunggono notifications@github.com wrote:
The producer uses the HTTP protocol, which is slower than gRPC or UDP and not good for scaling. We need to refactor it to gRPC/UDP. cc: @tagnotfound @ajeygore
-- Thanks
Ajey
I'll own this issue.
Which side do we need to modify to deliver this: only the producer, or also the router (written by @pushm0v)? I would like to create a PoC to benchmark and simulate the performance.
Any pointers, @imantung?
I designed the protocol for UDP as follows: a 48-byte header followed by a variable-length log payload. I think it's just a simple binary encoding.
The main difference from HTTP is that this protocol is fire-and-forget. If the token bucket runs out of tokens, the log data is simply discarded, with no 'response' at all. If the network drops the packet, it is lost too. KCP will mitigate some of these scenarios. The upside is that we may get more throughput.
0                       4           8
+-----------------------+-----------+
|        'BLRQ'         |   ulen    |
+-----------------------------------+ 8
|        TIMESTAMP (64-bit)         |
+-----------------------------------+ 16
|                                   |
+                                   + 24
|             SIGNATURE             |
+                                   + 32
|                                   |
+                                   + 40
|                                   |
+-----------------------------------+ 48
|                                   |
|    LOG DATA (variable length)     |
|                                   |
BLRQ is the packet header, short for "Barito Log ReQuest".
ulen is the byte length of the log data. The length sent is the byte length, not the string length, as the two can differ (for example, with multi-byte UTF-8 characters).
SIGNATURE is 256 bits long, enough for SHA-256. The signature scheme is implementation-defined.
LOG DATA is the log text plus whatever you want to attach, starting at offset 48.
As logs usually contain a lot of text, we'll need some form of compression to reduce network bandwidth. Compression is done at the protocol level using Google Snappy.
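A minimal Go sketch of encoding and decoding the 48-byte header, to make the offsets concrete. Several things here are assumptions, not part of the design above: big-endian (network) byte order, HMAC-SHA256 over timestamp plus data as the "implementation-defined" signature, and the shared key. Snappy compression is left out to keep it stdlib-only.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// Layout from the diagram: magic(4) | ulen(4) | timestamp(8) | signature(32) | data.
const (
	magic      = "BLRQ"
	headerSize = 48
)

// sign is a hypothetical signature scheme (HMAC-SHA256); the protocol leaves it open.
func sign(key []byte, ts uint64, data []byte) []byte {
	mac := hmac.New(sha256.New, key)
	var tsBuf [8]byte
	binary.BigEndian.PutUint64(tsBuf[:], ts)
	mac.Write(tsBuf[:])
	mac.Write(data)
	return mac.Sum(nil) // 32 bytes, fills the SIGNATURE field exactly
}

// encode builds one packet. Big-endian byte order is an assumption.
func encode(key []byte, ts uint64, data []byte) []byte {
	pkt := make([]byte, headerSize+len(data))
	copy(pkt[0:4], magic)
	binary.BigEndian.PutUint32(pkt[4:8], uint32(len(data))) // byte length, not rune count
	binary.BigEndian.PutUint64(pkt[8:16], ts)
	copy(pkt[16:48], sign(key, ts, data))
	copy(pkt[headerSize:], data)
	return pkt
}

// decode checks the magic, length and signature, then returns timestamp and log data.
func decode(key []byte, pkt []byte) (uint64, []byte, error) {
	if len(pkt) < headerSize || string(pkt[0:4]) != magic {
		return 0, nil, errors.New("not a BLRQ packet")
	}
	ulen := binary.BigEndian.Uint32(pkt[4:8])
	if uint64(len(pkt)-headerSize) < uint64(ulen) {
		return 0, nil, errors.New("truncated log data")
	}
	ts := binary.BigEndian.Uint64(pkt[8:16])
	data := pkt[headerSize : headerSize+int(ulen)]
	if !hmac.Equal(pkt[16:48], sign(key, ts, data)) {
		return 0, nil, errors.New("bad signature")
	}
	return ts, data, nil
}

func main() {
	key := []byte("shared-secret") // hypothetical key
	msg := "héllo wörld"          // 11 characters but 13 bytes in UTF-8
	pkt := encode(key, 1528271160, []byte(msg))
	ts, data, err := decode(key, pkt)
	fmt.Println(len(pkt), ts, string(data), err)
}
```

The example message shows why ulen must count bytes: Go's `len` on a string already returns the UTF-8 byte count (13 here), not the 11 visible characters.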
The connections are multiplexed using smux, which includes a token bucket for throttling so we don't DDoS ourselves.
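To illustrate the fire-and-forget throttling semantics, here is a generic token bucket in Go that never blocks: when no token is available the log is dropped and the sender gets no response. This is a sketch of the general technique, not smux's internal flow control; the type and parameter names are hypothetical. For production use, `golang.org/x/time/rate` provides `rate.Limiter` with an `Allow()` method.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket refills at rate tokens per second up to capacity; each packet costs one token.
type TokenBucket struct {
	mu       sync.Mutex
	capacity float64
	tokens   float64
	rate     float64
	last     time.Time
}

func NewTokenBucket(capacity, rate float64) *TokenBucket {
	return &TokenBucket{capacity: capacity, tokens: capacity, rate: rate, last: time.Now()}
}

// Allow reports whether a packet may be sent now. It never waits:
// a false result means the caller discards the log (fire and forget).
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate // refill for elapsed time
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	bucket := NewTokenBucket(3, 1) // burst of 3, refill 1 token per second
	sent, dropped := 0, 0
	for i := 0; i < 5; i++ {
		if bucket.Allow() {
			sent++ // the packet would be written to the socket here
		} else {
			dropped++ // no reply to the sender; the log is simply lost
		}
	}
	fmt.Println(sent, dropped)
}
```

With a burst of 3 and a 1 token/s refill, five back-to-back sends should give 3 sent and 2 dropped.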
Hi @lynxluna, thanks for this.
> I would like to ask which side we need to modify to be able to deliver this? Is it the producer only or also the router
It's supposed to go client -> router -> producer, but we can start with router -> producer.
By the way, since we're going live soon, can we provide an option when starting the server so HTTP is still supported?
@giosakti HTTP will always be supported. @imantung wrote the service as interfaces, so I am just implementing (and refactoring/segregating) the service interface(s). It won't change any current implementation.
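A rough Go sketch of how an interface-based design lets HTTP stay the default while UDP is opt-in at startup. Everything here is hypothetical: `LogServer`, `httpServer`, `udpServer`, `newServer` and the `-protocol` flag are illustrative names, not the actual barito-flow interfaces.

```go
package main

import (
	"flag"
	"fmt"
)

// LogServer is a hypothetical stand-in for the producer's service interface.
type LogServer interface {
	Name() string
	Start() error
}

type httpServer struct{ addr string }

func (s httpServer) Name() string { return "http" }

func (s httpServer) Start() error {
	// The existing HTTP handler would be started on s.addr here.
	return nil
}

type udpServer struct{ addr string }

func (s udpServer) Name() string { return "udp" }

func (s udpServer) Start() error {
	// A new UDP/KCP listener would be started on s.addr here.
	return nil
}

// newServer picks an implementation from a startup option; callers only see LogServer.
func newServer(protocol, addr string) (LogServer, error) {
	switch protocol {
	case "http":
		return httpServer{addr}, nil
	case "udp":
		return udpServer{addr}, nil
	default:
		return nil, fmt.Errorf("unknown protocol %q", protocol)
	}
}

func main() {
	protocol := flag.String("protocol", "http", "transport: http or udp") // hypothetical flag
	flag.Parse()
	srv, err := newServer(*protocol, ":8080")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("starting", srv.Name(), "server")
	_ = srv.Start()
}
```

Because the rest of the code depends only on the interface, adding the UDP implementation does not touch the HTTP path.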
very nice!
Latest status: problem with the log stream packet size
The plain UDP stream works, but UDP datagrams are bounded by the MTU, so we'd need to manage that. Logs tend to be very big, so a single log will be chopped into more than one packet. Every IPv4 host must be able to receive a 576-byte datagram, so 508 bytes of payload (576 minus a 60-byte maximum IP header and an 8-byte UDP header) is the safe bet for sending a log in one packet. KCP mitigates this by segmenting and reordering the UDP packets, using a 1400-byte MTU (it's adjustable), but that's still not enough for sending logs. We can increase the MTU, but the latency will be high. KCP also doesn't implement SYN/FIN like TCP, so there may be a premature 'connection close' before the producer is even able to read the whole log.
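To show what "chopped into more than one packet" means in practice, here is a Go sketch that splits a log into datagrams of at most 508 bytes and reassembles them in any order. The 8-byte fragment header (message ID, index, count) is a hypothetical format for illustration only; it is not KCP's segment format, and there is no retransmission, so a single lost fragment loses the whole log.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	safeUDPPayload = 576 - 60 - 8 // 508: minimum reassembly size minus max IPv4 header and UDP header
	fragHeaderSize = 8            // hypothetical: msgID(4) | index(2) | count(2)
	fragDataSize   = safeUDPPayload - fragHeaderSize
)

// fragment splits one encoded log into datagrams that each fit in 508 bytes.
func fragment(msgID uint32, payload []byte) [][]byte {
	count := (len(payload) + fragDataSize - 1) / fragDataSize
	if count == 0 {
		count = 1 // an empty log still produces one datagram
	}
	frags := make([][]byte, 0, count)
	for i := 0; i < count; i++ {
		start, end := i*fragDataSize, (i+1)*fragDataSize
		if end > len(payload) {
			end = len(payload)
		}
		f := make([]byte, fragHeaderSize+end-start)
		binary.BigEndian.PutUint32(f[0:4], msgID)
		binary.BigEndian.PutUint16(f[4:6], uint16(i))
		binary.BigEndian.PutUint16(f[6:8], uint16(count))
		copy(f[fragHeaderSize:], payload[start:end])
		frags = append(frags, f)
	}
	return frags
}

// reassemble rebuilds a log from fragments received in any order.
func reassemble(frags [][]byte) ([]byte, error) {
	var parts [][]byte
	for _, f := range frags {
		if len(f) < fragHeaderSize {
			return nil, errors.New("short fragment")
		}
		idx := int(binary.BigEndian.Uint16(f[4:6]))
		count := int(binary.BigEndian.Uint16(f[6:8]))
		if parts == nil {
			parts = make([][]byte, count)
		}
		if count != len(parts) || idx >= count {
			return nil, errors.New("inconsistent fragment header")
		}
		parts[idx] = f[fragHeaderSize:]
	}
	if parts == nil {
		return nil, errors.New("no fragments")
	}
	var out []byte
	for i, p := range parts {
		if p == nil {
			return nil, fmt.Errorf("fragment %d of %d missing", i, len(parts))
		}
		out = append(out, p...)
	}
	return out, nil
}

func main() {
	payload := make([]byte, 1200) // a 1200-byte log needs 3 datagrams of up to 500 data bytes
	frags := fragment(1, payload)
	out, err := reassemble([][]byte{frags[2], frags[0], frags[1]}) // out-of-order delivery
	fmt.Println(safeUDPPayload, len(frags), len(out), err)
}
```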
We'd need multiplexing to get full throughput and to accommodate big logs going through. The candidates are yamux and smux; I'll evaluate both and see which one is better suited.