GGist / bip-rs

BitTorrent Infrastructure Project In Rust
Apache License 2.0

System Overview For bip_peer #52

Closed GGist closed 7 years ago

GGist commented 8 years ago

Issue for tracking what is implemented and what is left to implement for the bip_peer module, which will include an API for programmatically queueing up torrents for download given a MetainfoFile or MagnetLink.

High level overview of the system

[Diagram: high level overview of the system]

Torrent Client Layer

The basic idea is that the TorrentClient communicates with the selection strategy thread over a two-way channel. From the client to the strategy thread, we can stop, start, pause, or remove torrents from the download queue. We can also provide configuration options to limit upload/download bandwidth, either client wide or on a per torrent basis. From the strategy thread to the client thread, we can provide notifications for when torrents finish or when errors occur.
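A minimal sketch of what that two-way channel might carry, with hypothetical command and notification types (TorrentId and the enum variants below are assumptions, not the actual bip_peer API):

```rust
use std::sync::mpsc::channel;

// Hypothetical handle for a queued torrent.
type TorrentId = u32;

// Commands the TorrentClient sends to the selection strategy thread.
enum ClientCommand {
    Start(TorrentId),
    Stop(TorrentId),
    Pause(TorrentId),
    Remove(TorrentId),
    // Bandwidth caps in bytes/sec; None applies the cap client wide.
    LimitUpload { torrent: Option<TorrentId>, bytes_per_sec: u64 },
    LimitDownload { torrent: Option<TorrentId>, bytes_per_sec: u64 },
}

// Notifications the strategy thread sends back to the client.
enum ClientNotification {
    TorrentComplete(TorrentId),
    TorrentError(TorrentId, String),
}

fn main() {
    // The two-way link is just a pair of one-way channels; the client keeps
    // one Sender and one Receiver, the strategy thread keeps the other pair.
    let (_commands_tx, _commands_rx) = channel::<ClientCommand>();
    let (_notifications_tx, _notifications_rx) = channel::<ClientNotification>();
}
```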

Selection Strategy Layer

The selection strategy thread is concerned with sending and receiving high level peer wire protocol messages, initiating peer chokes/unchokes, and deciding what piece to transmit or receive next and from which peer. Each peer is pinned to a channel connected to one of potentially many peer protocols; the strategy thread doesn't care which protocol. If a peer disconnects in the protocol layer, a message is sent to the strategy layer alerting it that the peer is no longer connected to us.
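A hedged sketch of the kinds of messages this layer might exchange per peer; the enum names and payloads are hypothetical, not the actual bip_peer types:

```rust
// Hypothetical high level commands the strategy thread sends to the protocol
// layer for one peer; each peer is pinned to its own channel of these.
enum PeerCommand {
    Choke,
    Unchoke,
    Interested,
    // Ask the peer for a block: (piece index, block offset, block length).
    RequestBlock(u32, u32, u32),
    // Queue one of our blocks to be sent to the peer.
    UploadBlock(u32, u32, u32),
}

// Hypothetical notifications flowing back up, regardless of which protocol
// (TCP, uTP, ...) the peer is actually connected over.
enum PeerNotification {
    Have(u32),
    Bitfield(Vec<u8>),
    // A block arrived: (piece index, block offset, block length).
    BlockReceived(u32, u32, u32),
    // The protocol layer lost this peer's connection.
    Disconnected,
}
```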

Peer Protocol Layer

The peer protocol layer is concerned with reading messages off the wire and deserializing them into peer wire protocol message heads (variable length data is ignored at this point). Special regions of memory may be set aside for bitfield messages; I'm not sure whether we should eat the cost of pre-allocating or allocate on demand (they are only sent once per peer, so on demand might not be bad).
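As an illustrative sketch of reading just the fixed-size head of a message, following the standard peer wire framing (4-byte big-endian length prefix followed by a 1-byte message id); the struct and function names are hypothetical:

```rust
use std::io::{self, Read};

// Fixed-size "head" of a peer wire message: the length prefix and message id.
// Variable length payloads (bitfield bits, piece blocks) are handled separately.
#[derive(Debug)]
struct MessageHead {
    len: u32,
    id: Option<u8>, // None for the zero length keep-alive
}

fn read_message_head<R: Read>(reader: &mut R) -> io::Result<MessageHead> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);

    if len == 0 {
        // Keep-alive: no id, no payload.
        return Ok(MessageHead { len, id: None });
    }

    let mut id_buf = [0u8; 1];
    reader.read_exact(&mut id_buf)?;

    // The remaining len - 1 payload bytes are left on the wire for the caller,
    // e.g. to be written directly into memory handed out by the disk manager.
    Ok(MessageHead { len, id: Some(id_buf[0]) })
}
```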

Disk Manager

The disk manager is what both layers use as an intermediary for sending and receiving pieces. If we determine in the selection strategy layer that we should send a piece to a peer, instead of loading that data and sending it through the channel to the peer protocol layer, we ask the disk manager to load the data if it isn't already in memory. We then receive a token for that request and send the token down to the peer protocol layer, which tells the disk manager to notify it when the piece has been loaded; at that point it can access the memory for that piece.

For receiving, the peer protocol layer tells the disk manager to allocate memory for the incoming piece and gets notified when it is ready. It can then write the piece directly into that region of memory. I am not sure whether to do checksumming at this point or defer it to the selection strategy layer, so that is TBD. After the write occurs, a message is sent up to the selection strategy thread letting it know what piece it received from which peer.
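A minimal sketch of what the token-based disk manager interface described above could look like; all names here are hypothetical:

```rust
use std::sync::mpsc::Sender;

// Hypothetical token handed back for an outstanding disk request.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct DiskToken(u64);

// Requests that either layer can make of the disk manager.
enum DiskRequest {
    // Selection layer: load this piece into memory if it isn't already there;
    // the token is what gets sent down to the peer protocol layer.
    LoadPiece { piece: u32, token: DiskToken },
    // Protocol layer: notify me on this channel once the token's data is
    // resident, so I can write it out to the wire.
    AwaitLoad { token: DiskToken, notify: Sender<DiskEvent> },
    // Protocol layer: reserve memory for an incoming piece and notify me when
    // it is ready to be written into directly.
    ReservePiece { piece: u32, token: DiskToken, notify: Sender<DiskEvent> },
}

// Notifications the disk manager sends back when a request is ready.
enum DiskEvent {
    PieceLoaded(DiskToken),
    PieceReserved(DiskToken),
}
```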

Notes

This may change as I go about the implementation: I want to make it easy to provide HTTP or SOCKS proxies in the future, so I may have to go one layer below the protocol layer for that. At the same time, I want to reduce the number of threads that a TorrentClient requires; currently, taking into account only TCP peers, it will take at least 8 threads (including 4 worker threads for the disk manager, but not the thread running the user code that calls into the TorrentClient).

Work Progress

Disk Manager:

Handshaker:

Peer Protocol Layer:

Selection Strategy Layer:

Torrent Client:

GGist commented 8 years ago

One thing to note: since the selection layer, protocol layer, and disk layer all communicate via SyncSender, it would theoretically be possible to trigger a deadlock between any two of these services.
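As a toy reproduction of that failure mode, stripped down to two threads and two bounded channels (this program intentionally deadlocks):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Two services connected by bounded channels of capacity 1 in each direction.
// If both sides fill the channel toward the other and then block sending one
// more message, neither ever drains its receiver and both block forever.
fn main() {
    let (a_to_b, from_a) = sync_channel::<u32>(1);
    let (b_to_a, from_b) = sync_channel::<u32>(1);

    let b = thread::spawn(move || {
        b_to_a.send(0).unwrap(); // fills the b -> a channel
        b_to_a.send(1).unwrap(); // blocks: a never drains, because a is also blocked
        for msg in from_a {
            let _ = msg;
        }
    });

    a_to_b.send(0).unwrap(); // fills the a -> b channel
    a_to_b.send(1).unwrap(); // blocks: b is stuck in its own send above
    for msg in from_b {
        let _ = msg;
    }

    b.join().unwrap();
}
```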

I was thinking of building this notion of a filled channel into the layer itself, so that if the selection layer sees that a specific peer in the protocol layer has its channel full, that would act as a bit of flow control telling the selection layer that it should back off.

On the other hand, since the selection layer and disk layer both send to the protocol layer over the same channel, what happens when the selection layer fills up the sender but the disk layer needs to notify the protocol layer that a block is available? That kind of flow control information isn't very useful to the disk layer. We may want to divvy up space within that shared channel so we can start making guarantees about which operations can never fail.

Additionally, I think that if we were to divvy up this space, the selection channel's logical capacity should reflect not just the messages sitting in the channel, but the writes that have been queued and not yet actually written out. That would be a more useful metric for the selection layer to see: if the selection layer sent out 5 pieces to a single remote peer, that would correspond to 5 BlockWait messages coming from the disk layer, which is the worst case, and it could never fail if the disk layer was guaranteed 5 message slots to utilize. The cool thing there is that the selection layer would see that sending a 6th piece would fail, and so would not cause a 6th BlockWait message, because we allocated only 5 disk message slots.

I am thinking of using an atomic, shared with the ProtocolSender, to signify not the contents of the channel but how many writes are pending, and based on that, determine whether or not an OSelectorMessage can be passed through to the channel. This would require modifying the Sender interface to return a bool indicating whether the send was successful.
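A rough sketch of that idea, assuming a hypothetical ProtocolSender wrapper and a placeholder OSelectorMessage type; the real channel types in bip_peer may look different:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::SyncSender;
use std::sync::Arc;

// Placeholder for the message type the selection layer pushes to a protocol peer.
struct OSelectorMessage;

// Wrapper around the protocol channel that tracks writes which have been
// queued but not yet flushed to the wire, shared with the protocol side.
struct ProtocolSender {
    chan: SyncSender<OSelectorMessage>,
    pending_writes: Arc<AtomicUsize>,
    max_pending: usize,
}

impl ProtocolSender {
    /// Returns false instead of blocking when the peer already has too many
    /// unwritten messages queued up, acting as back pressure on selection.
    fn try_send(&self, msg: OSelectorMessage) -> bool {
        // Optimistically claim a slot; back out if we went over the limit.
        let prev = self.pending_writes.fetch_add(1, Ordering::AcqRel);
        if prev >= self.max_pending {
            self.pending_writes.fetch_sub(1, Ordering::AcqRel);
            return false;
        }
        self.chan.send(msg).is_ok()
    }
}

// The protocol layer would call this after actually writing a message out.
fn mark_written(pending_writes: &AtomicUsize) {
    pending_writes.fetch_sub(1, Ordering::AcqRel);
}
```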

GGist commented 8 years ago

During testing we ran into errors related to migrating a single TcpStream between multiple I/O completion ports. It looks like doing so is not supported in mio (and consequently rotor), and is not even supported on Windows before version 8.1.

This means we have to modify our architecture a bit. In essence, we will be composing our peer protocol layer with the handshakers themselves. This actually isn't as bad as I thought it would be, since each handshaker works over a specific transport anyway, so it's not like we are introducing unnecessary coupling. This does mean that the setup curve for a one-off consumer of bip_handshaker is a bit steeper.

However, we do benefit from this in that we have fewer threads operating in our system, since both incoming peers and active peers will be operating in the same thread. In addition, we are able to salvage most of the work done in #54, as we already did all of the legwork for the PeerConnection object, which can be completely reused.

GGist commented 8 years ago

Revised high level overview of the system

[Diagram: revised high level overview of the system]

Instead of handshakers running in their own thread, we have composed them so that we avoid migrating raw sockets/handles between event loops, which causes problems on certain operating systems.

In order to spin up a new protocol layer, we create a new BTHandshaker and give it our desired protocol as a type parameter. In our case, to spin up a TCP protocol layer operating over the standard peer wire protocol, we give it WireProtocol<TcpStream>. We then want to gather peers from various sources (DHT, PEX, trackers, LPD, etc.), so we clone our BTHandshaker, which really just makes a shallow clone of some communication primitives, and pass the copy into those services. This can be repeated as necessary when we need more peers.
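A rough sketch of the shape of that setup; the BTHandshaker and WireProtocol definitions below are simplified stand-ins so the snippet is self contained, not the real bip_handshake API:

```rust
// Stand-ins only: the real types live in bip_handshake / bip_peer and take
// more configuration (peer id, bind address, handshake filters, etc.).
use std::marker::PhantomData;
use std::net::{SocketAddr, TcpStream};

struct WireProtocol<S> {
    _stream: PhantomData<S>,
}

struct BTHandshaker<P> {
    // In the real crate this holds a few communication primitives feeding a
    // background state machine; cloning just copies those handles.
    _protocol: PhantomData<P>,
}

impl<P> Clone for BTHandshaker<P> {
    fn clone(&self) -> Self {
        BTHandshaker { _protocol: PhantomData }
    }
}

impl<P> BTHandshaker<P> {
    fn new() -> Self {
        BTHandshaker { _protocol: PhantomData }
    }

    // Peer sources (DHT, PEX, trackers, LPD) push discovered contact
    // information in here, triggering a handshake attempt.
    fn connect(&self, _peer: SocketAddr) {}
}

fn main() {
    // Post-handshake protocol: the standard peer wire protocol over TCP.
    let handshaker: BTHandshaker<WireProtocol<TcpStream>> = BTHandshaker::new();

    // Shallow clones handed to each peer discovery service.
    let for_dht = handshaker.clone();
    let for_tracker = handshaker.clone();

    for_dht.connect("127.0.0.1:6881".parse().unwrap());
    for_tracker.connect("127.0.0.1:6882".parse().unwrap());
}
```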

Now, when we find a peer in one of those services, we pass the contact information into the BTHandshaker, which sends it to the background state machine being run. A connection is made and a state machine is spawned operating over our handshake protocol, denoted with an H in the above diagram. We go through making sure that we are connecting to a BitTorrent peer and that both of us are interested in the same InfoHash, standard stuff.

When a handshake completes, the state machine will see that and migrate the protocol from H, handshaking, to C, connected. During this migration, our protocol C, in this case the WireProtocol, will get created and, on creation, have access to our WireContext. Inside the WireContext is everything required for a state machine to register with our selection layer and disk manager.

The connected C state machine will first register with the selection layer through a channel embedded in the WireContext by sending a PeerConnect message. This message includes a PeerIdentifier, which is a combination of a PeerId and a SocketAddr, as well as a channel the selection layer can use to send commands to the peer; the state machine accepts those commands and writes them out to the wire. The selection layer maintains a mapping of PeerIdentifier -> Channel so that when it decides it wants to send a message to a peer, it knows where to send it.

On receiving a message, protocol state machines continue to use the shared sender channel in the WireContext, but all messages include the PeerIdentifier so the selection layer knows which peer a message came from. If the selection layer wants to initiate a disconnect, or a disconnect occurs in the transport for whatever reason, the protocol state machine will send a PeerDisconnect message to the selection layer and shut itself down. At that point, the selection layer can remove the PeerIdentifier mapping and forget about that peer.
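A minimal sketch of how that registration could look in code; the message and struct definitions below are illustrative stand-ins, not the actual bip_peer types:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::mpsc::Sender;

// 20 byte peer id from the handshake.
type PeerId = [u8; 20];

// Uniquely identifies a connected peer across the selection layer.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct PeerIdentifier {
    id: PeerId,
    addr: SocketAddr,
}

// Stand-in for the commands the selection layer writes to a peer.
enum PeerCommand {
    Choke,
    Unchoke,
    // ... remaining peer wire commands elided
}

// Messages arriving at the selection layer over the shared WireContext channel.
enum SelectionMessage {
    // Sent once by a newly connected state machine, carrying the channel the
    // selection layer should use to talk back to that peer.
    PeerConnect(PeerIdentifier, Sender<PeerCommand>),
    // Sent when the selection layer asked for a disconnect or the transport
    // dropped; the state machine shuts down after sending this.
    PeerDisconnect(PeerIdentifier),
}

struct SelectionLayer {
    peers: HashMap<PeerIdentifier, Sender<PeerCommand>>,
}

impl SelectionLayer {
    fn handle(&mut self, msg: SelectionMessage) {
        match msg {
            SelectionMessage::PeerConnect(peer, chan) => {
                self.peers.insert(peer, chan);
            }
            SelectionMessage::PeerDisconnect(peer) => {
                self.peers.remove(&peer);
            }
        }
    }
}
```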

DiskManager communication is largely unchanged by the above architecture rework. We will be making sure to put a limit on the number of pending messages that the selection layer can have queued up for any single protocol layer peer, and this limit will be the same limit that the DiskManager channel to a protocol layer peer will exhibit (+1 for a possible ReserveBlock if we are reading as well? not sure if that can happen atm).

GGist commented 7 years ago

A couple of notes: during testing of the disk manager I noticed that it was getting stuck because anti-virus software was opening our file handles, causing us to be unable to access them. We should expect this to happen and handle it gracefully in the future.

We also saw that the DiskManager was the weakest link: our peer protocol layer was able to quickly pull blocks from peers, but was left waiting on block allocations because the DiskManager took longer to process and free blocks.

GGist commented 7 years ago

With the migration to tokio, this architecture may change, though not by much. Closing in favor of a new ticket to track tokio changes.