anacrolix / torrent

Full-featured BitTorrent client package and utilities
Mozilla Public License 2.0

New Cluster Nodes Unable to Contact Webseed in TLS-Enabled Cluster Setup #964

Open abhishek-das-gupta opened 1 month ago

abhishek-das-gupta commented 1 month ago

Overview

Adding new hosts to a TLS-enabled cluster fails because each new node must first receive a 14 GB file, which is distributed by the BitTorrent client running on those hosts. With TLS enabled, this torrent transfer stalls indefinitely.

Architecture

Cluster Architecture

Within our cluster, we have a master node and worker nodes that report the cluster's state to the master. The master generates the .torrent file, which is a trackerless torrent file. The master somewhat acts as a tracker, providing each peer with information about other peers to communicate with during torrenting.

Torrent Architecture

Torrent Process During Fresh Cluster Install

This is the process followed during a fresh cluster setup:

Scenarios with New Host(s) Addition

Without TLS Enabled on the Existing Cluster

With TLS Enabled in the Existing Cluster

Case #1: Libtorrent Client Process Runs on the New Hosts

The 14 GB file gets distributed within a few minutes.

Case #2: Anacrolix/Torrent Process Runs on the New Hosts

The 14 GB file distribution gets stuck on these new nodes because none of the new peers can contact the web seed (master node) present in the existing cluster. The webseeds section of full-status is empty:

webseeds:  <--- no web seed
2 peer conns:
- 10.140.93.137:51680-10.140.40.8:7191
  peer id: "-GT0003-\xb3.\x9epQ\xd6LG\x03\xad\xce8"
  extensions: 0000000000100005 (ltep, fast, dht)
  ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
  pex: 2 conns, 0 unsent events
  bep40-prio: e8a31f71
  last msg: 26.36s ago, connected: 86.37s ago, last helpful: never, itime: 0s, etime: 0s
  0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :M,e,v1:, dr: 0.0 KiB/s
  requested pieces:
- 10.140.93.137:7191-10.140.24.8:43468
  peer id: "-GT0003-\xfc\x93{w:\x94~\x8f\x13\u0671\x1b"
  extensions: 0000000000100005 (ltep, fast, dht)
  ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
  pex: 2 conns, 0 unsent events
  bep40-prio: d766eef0
  last msg: 86.29s ago, connected: 86.29s ago, last helpful: never, itime: 0s, etime: 0s
  0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :I,e,v1:, dr: 0.0 KiB/s
  requested pieces:

Hi @anacrolix, can you please provide pointers on why this API on the web seed is not reachable from the peers: http://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>?

anacrolix commented 1 month ago

At a guess, the webseed URL doesn't conform to the BEP for a multi-file torrent. Make sure it's a single-file torrent if you're going to specify a URL to a single file. You could also put a panic in where it's closing the webseed to find out its reasoning.

abhishek-das-gupta commented 1 month ago

> At a guess, the webseed URL doesn't conform to the BEP for a multi-file torrent. Make sure it's a single-file torrent if you're going to specify a URL to a single file.

It is a single file of size 14GB. To be more accurate it is "gzip compressed data" of 14GB.

> You could also put a panic in where it's closing the webseed to find out its reasoning.

Can you please provide more info on this? Where should I add more logs?

anacrolix commented 1 month ago

Could you provide the metainfo here?

I'll get back to you on the close thing tomorrow.

abhishek-das-gupta commented 1 month ago

Here is the metainfo (.torrent)

Torrent name: <some-parcel>.parcel
Announced at: Seems to be trackerless
Created on..: Mon Aug 05 12:06:49 UTC 2024
Created by..: cm-server
Pieces......: 1669 piece(s) (8388608 byte(s)/piece)
Total size..: 13,997,539,212 byte(s)

anacrolix commented 1 month ago

Feel free to email it to me. Specifically I want to check the structure of the internal fields as that affects how webseeding works.

abhishek-das-gupta commented 1 month ago

Thanks! Sending you. One thing though:

There is fallback logic: if the parcel download doesn't complete via BitTorrent within a certain time, the client falls back to an HTTP download from the web seed.

Once this timeout occurs, then t.AddWebSeed() API gets called with the url as http://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>

anacrolix commented 1 month ago

The info checks out (it is a single file, but the URL should also be fine).

I need to find out why the webseed peer is being closed. There should only be two ways: It's banned, or the torrent closes.

There should be copious logging calling out why, or you can put a panic here: https://github.com/anacrolix/torrent/blob/33e0ed521d973c6567b204fce09477157aa6f238/webseed-peer.go#L149.

anacrolix commented 1 month ago

There is a sort of integration test in a semi-formed state that could help with this once we have a better reason.

abhishek-das-gupta commented 1 month ago

> I need to determine why the webseed peer is being closed.

I'm not sure if you're looking at the correct case I mentioned. Apologies for any confusion. Here is the issue more clearly explained (copied from post above):

Case: Anacrolix/Torrent Process Runs on the New Hosts with existing cluster having TLS

The 14 GB file distribution gets stuck on these new nodes because none of the new peers can contact the web seed (master node) present in the existing cluster. The webseeds section of full-status is empty:

webseeds:  <--- no web seed
2 peer conns:
- 10.140.93.137:51680-10.140.40.8:7191
  peer id: "-GT0003-\xb3.\x9epQ\xd6LG\x03\xad\xce8"
  extensions: 0000000000100005 (ltep, fast, dht)
  ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
  pex: 2 conns, 0 unsent events
  bep40-prio: e8a31f71
  last msg: 26.36s ago, connected: 86.37s ago, last helpful: never, itime: 0s, etime: 0s
  0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :M,e,v1:, dr: 0.0 KiB/s
  requested pieces:
- 10.140.93.137:7191-10.140.24.8:43468
  peer id: "-GT0003-\xfc\x93{w:\x94~\x8f\x13\u0671\x1b"
  extensions: 0000000000100005 (ltep, fast, dht)
  ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
  pex: 2 conns, 0 unsent events
  bep40-prio: d766eef0
  last msg: 86.29s ago, connected: 86.29s ago, last helpful: never, itime: 0s, etime: 0s
  0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :I,e,v1:, dr: 0.0 KiB/s
  requested pieces:

I believe you might be looking at the wrong case. The CLOSED status shown below indicates that torrenting through Anacrolix completed successfully. I captured this full-status after the torrent process finished.

If Anacrolix/Torrent is used, during torrenting of the 14 GB parcel, these peers have web seed information in their statuses: webseeds:

My main question is why the webseeds section remains empty when TLS is enabled. In the logs of the master node (the webseed), I do not see any of these new peers contacting it.

anacrolix commented 1 month ago

I don't quite follow. If they're not able to contact the webseed, there should be errors generated telling you why.

abhishek-das-gupta commented 1 month ago

Hi @anacrolix, unfortunately I'm not seeing any obvious log entries, either from the Anacrolix/torrent library (client) or from the master node (acting as the webseed server), that indicate a download request from the client, such as:

2024-08-09 05:59:03,201 INFO ParcelController: Parcel download request: <some-parcel> from: <web-seed-client>

For the webseed client, I've already configured it to skip server certificate verification during the torrent client setup using the cfg.WebTransport configuration:

config.WebTransport = &http.Transport{
    TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}

This configuration works when I download the torrent file directly from the master node using a similar API (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent). Here’s how the download of the .torrent file is handled from the master node:

url := "https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent"
client := http.DefaultClient
if se.configs.AllowInsecureCerts {
    client = &http.Client{
        Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
        },
    }
}
resp, err := client.Get(url)

This request results in a 200 response, successfully downloading the .torrent file.

However, when I provide a similar API/URL while adding the webseed (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>) with cfg.WebTransport set, the download gets stuck. Note that I plan to implement TLS during BitTorrent in the future, but this isn't currently on the roadmap.

Could you please help me understand why the .torrent file downloads successfully from the master node, but the webseed client fails to download from it when using a similar URL with InsecureSkipVerify: true set in both cases? Is there another configuration I need to pass, or is this a bug?

anacrolix commented 1 month ago

There's no reason TLS shouldn't work; I've had it work before with webseeding in production scenarios. I think if there's a bug, it's that you're not seeing helpful log messages. I don't have much time to allocate to this at the moment, but the webseed code isn't lengthy, and some tracing through it to find where things are going wrong might be worthwhile.

anacrolix commented 1 month ago

I'm not sure WebTransport is the correct config item. Unfortunately there are quite a few of them, due to slight variations in how HTTP is consumed in BitTorrent that I haven't been able to merge. However, as above, you should be seeing a reason for it not working, so just fixing that isn't productive, for the project at least.

gatisahu commented 4 weeks ago

I am also using a TLS config through WebTransport. It is able to connect and send requests, but after some time I see the error below, with the status showing:

Status: webseeds:

Error:

banning webseed peer for "https://######/parcel/download/some.parcel" for being sole dirtier of piece 6 after failed piece check  [ github.com/anacrolix/torrent   torrent.go:2458 ]

anacrolix commented 3 weeks ago

Okay, as above being banned would make sense. Is it possible your http server does not implement range requests or is serving incorrect or incomplete data?

gatisahu commented 3 weeks ago

Yes we have added response.addHeader("Accept-Ranges", "bytes");

One more thing I have observed: when we add the webseed peer and call download right away, it starts downloading. If we wait 2-3 minutes before adding the webseed, the download does not start. I added a trace via torrent.AddWebSeedsOpt in AddWebSeeds, and I see that the torrent is not sending requests to the server.

anacrolix commented 3 weeks ago

Great. It's very likely missing a "tickle" for webseed peers if reader priorities have already been set. I should be able to statically verify that.

gatisahu commented 3 weeks ago

I am also seeing this error:

error running handshook conn: main read loop: decoding message: reading message length: EOF

I think the webseed may not be using the config below:

config.WebTransport = &http.Transport{
    TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}

abhishek-das-gupta commented 3 weeks ago

Hi @anacrolix, which configuration should be set to true to enable "Local Service Discovery"? I want to ensure cross-rack communication is possible.

anacrolix commented 3 weeks ago

I've not implemented this yet. https://github.com/anacrolix/torrent/issues/248.

abhishek-das-gupta commented 2 weeks ago

Thanks for the update, @anacrolix. I have another question based on this.

In my public cloud environment, nodes are spread across different AZs/racks within a VPC network. The security group for this VPC network allows "All traffic" (all protocols, all ports) from all sources (0.0.0.0/0), which should mean that the torrent port 7191 (in my case) is open for communication across racks in the VPC network. However, when I attempted to start a connection between two nodes located in different racks, the connection was reset or closed every time.

[root@e2e-56716943-456-dl-gateway0 user]# telnet 10.80.221.8 7191
Trying 10.80.221.8...
Connected to 10.80.221.8.
Escape character is '^]'.
Connection closed by foreign host.

[root@e2e-56716943-456-dl-gateway0 user]# nc 10.80.221.8 7191
Ncat: Connection reset by peer.

The code snippet above shows that when trying to connect to the destination IP via the torrent port, the connection gets reset. Could this be because there is no LSD (Local Service Discovery) implementation within the library, which uses multicast advertisements to enable nodes to discover peers that may be able to help them with their downloads?

anacrolix commented 2 weeks ago

I've pushed fixes to master that should improve webseed performance, and fix the stall that occurs if you add webseeds after adding the torrent (and some delay).

anacrolix commented 2 weeks ago

> I am also seeing this error:
>
> error running handshook conn: main read loop: decoding message: reading message length: EOF
>
> I think the webseed may not be using the config below:
>
> config.WebTransport = &http.Transport{
>     TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
> }

I've checked this, you are setting it in the correct place.

anacrolix commented 2 weeks ago

> Thanks for the update, @anacrolix. I have another question based on this.
>
> In my public cloud environment, nodes are spread across different AZs/racks within a VPC network. The security group for this VPC network allows "All traffic" (all protocols, all ports) from all sources (0.0.0.0/0), which should mean that the torrent port 7191 (in my case) is open for communication across racks in the VPC network. However, when I attempted to start a connection between two nodes located in different racks, the connection was reset or closed every time.
>
> [root@e2e-56716943-456-dl-gateway0 user]# telnet 10.80.221.8 7191
> Trying 10.80.221.8...
> Connected to 10.80.221.8.
> Escape character is '^]'.
> Connection closed by foreign host.
>
> [root@e2e-56716943-456-dl-gateway0 user]# nc 10.80.221.8 7191
> Ncat: Connection reset by peer.
>
> The code snippet above shows that when trying to connect to the destination IP via the torrent port, the connection gets reset. Could this be because there is no LSD (Local Service Discovery) implementation within the library, which uses multicast advertisements to enable nodes to discover peers that may be able to help them with their downloads?

This may be due to automatic blocking of internal IPs in the client. It won't be anything to do with the lack of LSD.

anacrolix commented 2 weeks ago

> Hi @anacrolix, unfortunately I'm not seeing any obvious log entries, either from the Anacrolix/torrent library (client) or from the master node (acting as the webseed server), that indicate a download request from the client, such as:

Can you try running master with GO_LOG=webseed=all?

> For the webseed client, I've already configured it to skip server certificate verification during the torrent client setup using the cfg.WebTransport configuration:
>
> config.WebTransport = &http.Transport{
>     TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
> }
>
> This configuration works when I download the torrent file directly from the master node using a similar API (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent). Here’s how the download of the .torrent file is handled from the master node:
>
> url := "https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent"
> client := http.DefaultClient
> if se.configs.AllowInsecureCerts {
>     client = &http.Client{
>         Transport: &http.Transport{
>             TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
>         },
>     }
> }
> resp, err := client.Get(url)
>
> This request results in a 200 response, successfully downloading the .torrent file.
>
> However, when I provide a similar API/URL while adding the webseed (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>) with cfg.WebTransport set, the download gets stuck. Note that I plan to implement TLS during BitTorrent in the future, but this isn't currently on the roadmap.
>
> Could you please help me understand why the .torrent file downloads successfully from the master node, but the webseed client fails to download from it when using a similar URL with InsecureSkipVerify: true set in both cases? Is there another configuration I need to pass, or is this a bug?

Maybe take a look at https://github.com/anacrolix/torrent/blob/f4711825e84e3c24fa96d127098ed5933235029d/client.go#L220-L228. I still don't see why you wouldn't get errors though, although the recent fixes may explain that. Let me know how the logging and fixes mentioned at the top of this comment go.