anacrolix / dht

dht is used by anacrolix/torrent, and is intended for use as a library in other projects both torrent related and otherwise
Mozilla Public License 2.0
309 stars 66 forks source link

WIP: Part one of I2P support #56

Open allhailjarjar opened 2 years ago

allhailjarjar commented 2 years ago

This PR is part 1 of 2 changes required to add I2P support in order to make code reviews manageable. It mostly touches the krpc package with only minor changes outside of it to make sure things aren't broken.

Since the last PR, I was able to simplify the change so no Go generics are needed. Instead, a krpc package level function ( SetNetworkType() ) controls how to treat the bencoded data (IP or I2P destination).

You had a question in the previous PR:

Have you looked at how this stuff works with rport and port from the spec? Perhaps it's worth following up with the I2P developers and check the spec is still up to date.

Yes, that will be part of me next PR so that the PRs aren't massive and easier to review. I'm also in touch with the I2P devs, and the spec on the I2P website is indeed up to date.

This change has a small breaking change for the torrent library where it explicitly references krpc.CompactIPv4NodeAddrs, but that should be a quick matter of replacing the struct name there.

anacrolix commented 2 years ago

Does this fix the encoding at a global level? I.e. the entire process can only use a single size (at a time?)?

allhailjarjar commented 2 years ago

That's right. I couldn't find a more elegant way around it. And I don't see a downside besides the extremely unlikely case where someone would want to run both an I2P and IP based DHT from the same process.

This wouldn't stop someone from running multiple I2P or multiple IP based DHTs from the same process.

anacrolix commented 2 years ago

What about not interpreting the bytes and letting the consumer figure it out? If they know they're running on I2P or IPv4 or something else they will know what to do. If it's very ingrained in the existing code, we could add a special field that just contains the uninterpreted bytes alongside the IPv4 interpreted stuff. The bencode package probably needs extra flexibility like that anyway, not sure.

allhailjarjar commented 2 years ago

What about not interpreting the bytes and letting the consumer figure it out?

By consumer do you mean the consumer of the krpc package or the application using anacrolix/dht? Because the host address information from krpc is used within the dht package. And in order for the dht package to be able to make any kind of communication with remote nodes, the socket will need to work with a net.Addr type which requires the address to already interpreted and turned into a string representation. Please let me know if I misunderstood your question.

we could add a special field that just contains the uninterpreted bytes alongside the IPv4 interpreted stuff. The bencode package probably needs extra flexibility like that anyway, not sure.

Sorry, I'm not sure what you mean by that. The data will need to be interpreted eventually anyway if we want to be able to talk to other nodes over i2p. And if it's needed by dht internally, I'm not sure how the application would handle the interpretation

anacrolix commented 2 years ago

Both. If you left the Nodes field as bytes and let anacrolix/dht/external users consume it as they know how, it will be more general. Yes that might mean that dht.Addr is less useful, or at least can't be produced until the necessary unmarshalling is done. The marshalling/unmarshalling could just be a config option for the server.

allhailjarjar commented 2 years ago

Hi @anacrolix I finally have time to pick up this work again.

I've been thinking about a better way of implementing support for i2p addresses, but haven't been able to come up with anything else without requiring a massive refactor.

If you left the Nodes field as bytes and let anacrolix/dht/external users consume it as they know how, it will be more general.

I'm not sure how to do that without having multiple Msg types. The Return struct can only have one Nodes field with a single type. If the Nodes field is left as a bytes array, then we wouldn't be able to add nodes to it for when we want to send a Return to other nodes. Please let me know if you see a way around this.

What's the main concern about the implementation in this PR? If it is that we can only exclusively use either IP or I2P addresses per process? That's not any worse than the status quo where we can only use IP addresses.

anacrolix commented 1 year ago

Setting the type at a global level isn't feasible. I'm not sure why I2P nodes aren't stored in another field separate to the IP ones?

allhailjarjar commented 1 year ago

Most likely it's this way so some misconfigured torrent software doesn't run on both clearnet and i2p simultaneously. That will immediately reveal the user's i2p address to be the associated with their IP address.

anacrolix commented 1 year ago

I need to read more about I2P, I've encountered it a bit since your last comment, it sounds like a big improvement over Tor.