cholcombe973 / rusix

Distributed filesystem in Rust

Basics: Abstract the transport layer so that TCP could be replaced with RDMA or something else #5

Open cholcombe973 opened 6 years ago

cholcombe973 commented 6 years ago

The current transport layer uses ZeroMQ to shuttle data back and forth. It's currently hard-coded to use TCP, but it could be swapped out for UDP or inproc. However, it isn't abstract enough to be useful for RDMA or iSCSI.
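
One way to picture the abstraction being asked for here is a small trait that the rest of the filesystem talks to, with TCP as just one implementation. The sketch below is an assumption, not rusix's actual code: the `Transport` trait, the `Framed` wrapper, `connect_tcp`, and the 4-byte length prefix are all hypothetical names and choices made for illustration. It uses only the Rust standard library.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Hypothetical transport abstraction: send one whole message, receive one whole message.
pub trait Transport {
    fn send(&mut self, payload: &[u8]) -> io::Result<()>;
    fn recv(&mut self) -> io::Result<Vec<u8>>;
}

/// Length-prefixed framing over any byte stream (a TcpStream, an in-memory buffer, ...).
pub struct Framed<S: Read + Write> {
    stream: S,
}

impl<S: Read + Write> Framed<S> {
    pub fn new(stream: S) -> Self {
        Framed { stream }
    }

    /// Give the underlying stream back to the caller.
    pub fn into_inner(self) -> S {
        self.stream
    }
}

impl<S: Read + Write> Transport for Framed<S> {
    fn send(&mut self, payload: &[u8]) -> io::Result<()> {
        if payload.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
        }
        // Assumed wire format: 4-byte big-endian length, then the payload bytes.
        self.stream.write_all(&(payload.len() as u32).to_be_bytes())?;
        self.stream.write_all(payload)?;
        self.stream.flush()
    }

    fn recv(&mut self) -> io::Result<Vec<u8>> {
        let mut len_buf = [0u8; 4];
        self.stream.read_exact(&mut len_buf)?;
        let mut payload = vec![0u8; u32::from_be_bytes(len_buf) as usize];
        self.stream.read_exact(&mut payload)?;
        Ok(payload)
    }
}

/// Open a TCP-backed transport; callers only ever see `dyn Transport`.
pub fn connect_tcp(addr: &str) -> io::Result<Box<dyn Transport>> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(Box::new(Framed::new(stream)))
}
```

Under this design, code that reads and writes chunks would hold a `Box<dyn Transport>` and never name TCP directly. An RDMA or iSCSI backend would be another type implementing the same trait, although whether a simple send/recv interface maps well onto RDMA is an open question this sketch does not answer.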

cholcombe973 commented 5 years ago

I'm thinking of dumping zmq because I can't seem to make a pattern that works correctly for what we're trying to achieve. Basically what needs to happen is something like:

  1. client wants to create a file.
  2. client chunks file using erasure code into M pieces + N redundant pieces.
  3. client initiates request to (M+N) servers
  4. client waits for all servers to respond before reporting that the file operation has succeeded.

We could play around with who takes the brunt of the operation penalty (the client or the server), but the problem still remains: ZeroMQ just doesn't have a way to create that kind of scatter/gather pattern. It also seems relatively expensive to start up sockets in zmq. I investigated nanomsg. While it looks better in some ways, it also suffers from some scaling problems. Nanomsg-ng was created to fix some of those shortcomings, but I'm reading reports of latency and slowness problems. I'm thinking it might be better to go lower level with something like tokio-tcp or mio.
@garypen I remember you mentioning a while back that networking was more your specialty. Would you be able to help here? Having something where I can say "send this payload to these X servers and wait for them all to respond" would be really helpful.
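
The request above (send one payload to each of X servers and wait for all of them to respond) can be sketched with plain blocking TCP and one thread per server. This is a minimal illustration under stated assumptions, not a proposal for the final design: `scatter_gather` and `request` are hypothetical names, the 4-byte length-prefixed framing is an assumed wire format, and a tokio or mio version would replace the threads with non-blocking sockets. The erasure-coding step is out of scope here; the chunks are taken as already computed.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// Send `chunks[i]` to `servers[i]` in parallel and collect one reply per server.
/// Replies come back in server order; the first failure is returned as the error.
pub fn scatter_gather(
    servers: &[SocketAddr],
    chunks: Vec<Vec<u8>>,
    timeout: Duration,
) -> io::Result<Vec<Vec<u8>>> {
    if servers.len() != chunks.len() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "need exactly one chunk per server",
        ));
    }

    // Scatter: one worker thread per server, each doing a single request/reply.
    let handles: Vec<_> = servers
        .iter()
        .cloned()
        .zip(chunks)
        .map(|(addr, chunk)| thread::spawn(move || request(addr, &chunk, timeout)))
        .collect();

    // Gather: join the workers in order, stopping at the first error.
    let mut replies = Vec::with_capacity(handles.len());
    for handle in handles {
        let reply = handle
            .join()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "worker thread panicked"))??;
        replies.push(reply);
    }
    Ok(replies)
}

/// One framed request/reply exchange (assumed format: u32 big-endian length + body).
fn request(addr: SocketAddr, chunk: &[u8], timeout: Duration) -> io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;

    stream.write_all(&(chunk.len() as u32).to_be_bytes())?;
    stream.write_all(chunk)?;

    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let mut reply = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    stream.read_exact(&mut reply)?;
    Ok(reply)
}
```

A caller holding the M+N server addresses and the M+N chunks would call `scatter_gather(&addrs, chunks, Duration::from_secs(5))` and report success only if it returns `Ok`. This sketch opens a fresh connection per request for simplicity; keeping connections open across requests would avoid paying the connect cost each time.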

garypen commented 5 years ago

@cholcombe973 I've no experience with 0MQ, but it seems like it should be able to support the behaviour you are describing.

For instance, if each client (ZMQ_CLIENT) connects to the various servers (ZMQ_SERVER), won't that give you exactly the behaviour you describe?

If you are concerned about performance (and that's a legitimate concern), we need to dig into that. What do you mean by "start up" sockets? Are you talking about the time to bind/connect? If so, that shouldn't be a concern, as it will only be a rare occurrence over the lifetime of the sockets. If you mean that it takes some time before data is transmitted after zmq_send(), then that is more of a concern and would require some investigation.

Let's try and catch up in IRC sometime soon and we can go into this in more detail.

cholcombe973 commented 5 years ago

I thought the same thing, but ZMQ behaves differently when you connect a socket to multiple servers: it starts round-robining the messages across them. While that's nice for load balancing, it's not exactly the behavior we're after.
Oh, by "start up" I mean that when you instantiate a socket it seems to take a long time to become ready for sending data. Yeah, I'm referring to the bind/connect phase. I don't think the transmit phase is much of a concern yet.
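
To put a number on the bind/connect cost, one option is to first measure a raw TCP baseline and compare the ZeroMQ socket setup time against it. The sketch below only times plain `TcpStream::connect` calls from the standard library; it does not measure ZeroMQ itself, and `time_connects` is a hypothetical helper name.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// Open `rounds` fresh TCP connections to `addr`, one after another,
/// and return how long each connect call took.
pub fn time_connects(addr: SocketAddr, rounds: usize) -> io::Result<Vec<Duration>> {
    let mut samples = Vec::with_capacity(rounds);
    for _ in 0..rounds {
        let start = Instant::now();
        let stream = TcpStream::connect(addr)?;
        samples.push(start.elapsed());
        // Close the connection before starting the next measurement.
        drop(stream);
    }
    Ok(samples)
}
```

If socket setup through zmq turns out to be much slower than this baseline against the same server, that would support the concern about the connect phase; if the two are close, the cost is probably in the network path rather than the library.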