Open abhishek-das-gupta opened 3 months ago
At a guess, the webseed URL doesn't conform to the BEP for a multi-file torrent. Make sure it's a single-file torrent if you're going to specify a URL to a single file. You could also put a panic in where it's closing the webseed to find out its reasoning.
At a guess the webseed URL doesn't conform to the BEP for a multi file torrent. Make sure it's a single file torrent if you're going to specify a URL to a single file
It is a single file of size 14GB. To be more accurate it is "gzip compressed data" of 14GB.
You could also put a panic in where it's closing the webseed to find out its reasoning.
Can you please provide more info on this? Where should I add more logs?
Could you provide the metainfo here?
I'll get back to you on the close thing tomorrow.
Here is the metainfo (.torrent)
Torrent name: <some-parcel>.parcel
Announced at: Seems to be trackerless
Created on..: Mon Aug 05 12:06:49 UTC 2024
Created by..: cm-server
Pieces......: 1669 piece(s) (8388608 byte(s)/piece)
Total size..: 13,997,539,212 byte(s)
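As a quick sanity check on the summary above, the piece count can be recomputed from the total size and the piece length. This is only a minimal sketch; `pieceLayout` is a hypothetical helper written for illustration, not part of any library.

```go
package main

import "fmt"

// pieceLayout returns how many pieces a torrent of totalSize bytes has when
// split into pieceLength-byte pieces, and how long the final piece is.
// Hypothetical helper for illustration only.
func pieceLayout(totalSize, pieceLength int64) (numPieces, lastPieceLen int64) {
	if totalSize <= 0 || pieceLength <= 0 {
		return 0, 0
	}
	numPieces = (totalSize + pieceLength - 1) / pieceLength // ceiling division
	lastPieceLen = totalSize - (numPieces-1)*pieceLength
	return numPieces, lastPieceLen
}

func main() {
	// Values copied from the metainfo summary in this thread.
	n, last := pieceLayout(13997539212, 8388608)
	fmt.Printf("pieces=%d lastPieceBytes=%d\n", n, last)
}
```

With the values from the summary this gives 1669 pieces, matching the reported count, with a shorter final piece.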
Feel free to email it to me. Specifically I want to check the structure of the internal fields as that affects how webseeding works.
Thanks! Sending you. One thing though:
There is fallback logic that if the parcel download doesn't complete via BitTorrent in a certain time, the fallback mechanism is to do an HTTP download from the web seed.
Once this timeout occurs, the t.AddWebSeed() API gets called with the URL http://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>
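The fallback described above could be sketched roughly as follows. This is an assumption-laden illustration, not the reporter's code: `webSeedAdder` is a hypothetical stand-in interface for the torrent handle, `recordingTorrent` is a fake used to make the sketch runnable on its own, and the URL is a placeholder.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// webSeedAdder is a hypothetical stand-in for the torrent handle; it only
// models the single call the fallback needs.
type webSeedAdder interface {
	AddWebSeeds(urls []string)
}

// fallbackToWebSeed waits for done to be closed. If timeout elapses first,
// it adds the web seed URL and returns true; otherwise it returns false.
func fallbackToWebSeed(t webSeedAdder, done <-chan struct{}, timeout time.Duration, url string) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return false // download finished over BitTorrent in time
	case <-timer.C:
		t.AddWebSeeds([]string{url})
		return true
	}
}

// recordingTorrent is a fake that records which URLs were added.
type recordingTorrent struct {
	mu   sync.Mutex
	urls []string
}

func (r *recordingTorrent) AddWebSeeds(urls []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.urls = append(r.urls, urls...)
}

// added returns a copy of the URLs recorded so far.
func (r *recordingTorrent) added() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.urls...)
}

func main() {
	rec := &recordingTorrent{}
	never := make(chan struct{}) // never closed: simulates a stuck download
	// Placeholder URL, not a real endpoint.
	used := fallbackToWebSeed(rec, never, 50*time.Millisecond, "http://master.example/file.parcel")
	fmt.Println("fallback used:", used, "urls:", rec.added())
}
```

Keeping the wait in one function with a `done` channel makes the timeout path easy to exercise with a fake, separately from the real client.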
The info checks out (it is a single file, but the URL should also be fine).
I need to find out why the webseed peer is being closed. There should only be two ways: It's banned, or the torrent closes.
There should be copious logging calling out why, or you can put a panic here: https://github.com/anacrolix/torrent/blob/33e0ed521d973c6567b204fce09477157aa6f238/webseed-peer.go#L149.
There is a sort of integration test in a semi-formed state that could help with this once we have a better reason.
I need to determine why the webseed peer is being closed.
I'm not sure if you're looking at the correct case I mentioned. Apologies for any confusion. Here is the issue more clearly explained (copied from post above):
The 14 GB file distribution gets stuck on these new nodes because none of the new peers can contact the web seed (master node) present in the existing cluster. In the web seed section of full-status, it is empty:
webseeds: <--- no web seed
2 peer conns:
- 10.140.93.137:51680-10.140.40.8:7191
peer id: "-GT0003-\xb3.\x9epQ\xd6LG\x03\xad\xce8"
extensions: 0000000000100005 (ltep, fast, dht)
ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
pex: 2 conns, 0 unsent events
bep40-prio: e8a31f71
last msg: 26.36s ago, connected: 86.37s ago, last helpful: never, itime: 0s, etime: 0s
0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :M,e,v1:, dr: 0.0 KiB/s
requested pieces:
- 10.140.93.137:7191-10.140.24.8:43468
peer id: "-GT0003-\xfc\x93{w:\x94~\x8f\x13\u0671\x1b"
extensions: 0000000000100005 (ltep, fast, dht)
ltep extensions: map[ut_holepunch:2 ut_metadata:1 ut_pex:3]
pex: 2 conns, 0 unsent events
bep40-prio: d766eef0
last msg: 86.29s ago, connected: 86.29s ago, last helpful: never, itime: 0s, etime: 0s
0/1669 completed, 0 pieces touched, good chunks: 0/0:0 reqq: 0+0/(1/1024):0/1024, flags: :I,e,v1:, dr: 0.0 KiB/s
requested pieces:
I believe you might be looking at the wrong case. The CLOSED status shown below indicates that torrenting through Anacrolix completed successfully. I captured this full-status after the torrent process finished.
If Anacrolix/Torrent is used, during torrenting of the 14 GB parcel, these peers have web seed information in their statuses:
webseeds:
- CLOSED: http://ccycloud-1.b-135-no-tls.root.comops.site:7180/cmf/parcel/download/CDH-7.2.18-1.cdh7.2.18.p0.51297892-el8.parcel
last unhandled error: never
bep40-prio: e97fd7f2
last msg: never, connected: never, last helpful: 147.05s ago, itime: 2m41.004987105s, etime: 13.875250838s
1669/1669 completed, 0 pieces touched, good chunks: 40889/40889:0 reqq: 0+0/(84/128):0/1024, flags: i:WS:, dr: 47132.0 KiB/s
requested pieces:
My main question is why the webseeds section remains empty when TLS is enabled. In the logs of the master node (the webseed), I do not see any of these new peers contacting it.
I don't quite follow. If they're not able to contact the webseed, there should be errors generated telling you why.
Hi @anacrolix, unfortunately I'm not seeing any obvious log lines, either from the Anacrolix/torrent library (client) or from the master node (acting as the webseed server), that indicate a download request from the client, such as:
2024-08-09 05:59:03,201 INFO ParcelController: Parcel download request: <some-parcel> from: <web-seed-client>
For the webseed client, I've already configured it to skip server certificate verification during the torrent client setup using the cfg.WebTransport configuration:
config.WebTransport = &http.Transport{
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
This configuration works when I download the torrent file directly from the master node using a similar API (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent). Here's how the download of the .torrent file is handled from the master node:
url := "https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent"
client := http.DefaultClient
if se.configs.AllowInsecureCerts {
	// Skip server certificate verification only when explicitly allowed.
	client = &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
}
resp, err := client.Get(url)
This request results in a 200 response, successfully downloading the .torrent file.
However, when I provide a similar API/URL while adding the webseed (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>) with cfg.WebTransport set, the download gets stuck. Note that I plan to implement TLS during BitTorrent in the future, but this isn't currently on the roadmap.
Could you please help me understand why the torrent file downloads successfully from the master node, but the webseed client fails to download from the master node when using a similar API, with both having InsecureSkipVerify: true set? Is there another configuration I need to pass? Or is this a bug?
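For anyone reproducing the TLS part locally, here is a minimal standard-library sketch. It checks that a transport shaped like the WebTransport value above can fetch from a server with an untrusted certificate, while a verifying transport cannot. The test server, the response body, and the helper names are hypothetical, and the sketch says nothing about which transport the library actually uses for webseed requests.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// newSelfSignedServer starts a local HTTPS server whose certificate is not
// trusted by the system roots (a stand-in for the master node).
func newSelfSignedServer() *httptest.Server {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "parcel bytes")
	}))
	// Silence the handshake error logged when a client rejects the certificate.
	srv.Config.ErrorLog = log.New(io.Discard, "", 0)
	srv.StartTLS()
	return srv
}

// fetchStatus GETs url and returns the HTTP status code. When skipVerify is
// true, the transport mirrors the InsecureSkipVerify setting quoted above.
func fetchStatus(url string, skipVerify bool) (int, error) {
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: skipVerify},
	}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

func main() {
	srv := newSelfSignedServer()
	defer srv.Close()

	_, err := fetchStatus(srv.URL, false)
	fmt.Println("verifying transport failed:", err != nil)

	status, err := fetchStatus(srv.URL, true)
	fmt.Println("insecure transport:", status, err)
}
```

If the same check passes against the real master node but webseeding still stalls, the certificate handling is probably not the cause.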
There's no reason TLS shouldn't work; I've had it work before with webseeding in production scenarios. I think if there's a bug, it's that you're not seeing helpful log messages. I don't have much time to allocate to this at the moment, but the webseed code isn't lengthy, and some tracing through to find where things are going wrong might be worthwhile.
I'm not sure WebTransport is the correct config item. Unfortunately, there are quite a few of them due to slight variations in how HTTP is consumed in BitTorrent that I haven't been able to merge. However, as above, you should be seeing a reason for it not working, so just fixing that isn't productive, for the project at least.
I am also using a TLS config through WebTransport. It is able to connect and send requests, but after some time I see the error below and get the following status:
Status : webseeds:
Error :
banning webseed peer for "https://######/parcel/download/some.parcel" for being sole dirtier of piece 6 after failed piece check [ github.com/anacrolix/torrent torrent.go:2458 ]
Okay, as above being banned would make sense. Is it possible your http server does not implement range requests or is serving incorrect or incomplete data?
Yes, we have added response.addHeader("Accept-Ranges", "bytes");
One more thing I have observed: when we add the webseed peer and call download, it starts downloading. If we leave a gap of 2-3 minutes and then add the webseed, it does not start. I added a torrent.AddWebSeedsOpt to trace requests in AddWebSeeds, and I see that the torrent is not sending any request to the server.
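On the range-request question: advertising Accept-Ranges is not the same as honoring a Range header, so it may be worth probing the server directly. The sketch below is a standard-library illustration under stated assumptions: both test servers are hypothetical stand-ins (one uses `http.ServeContent`, one sets only the header and ignores Range), and `probeRange` is a helper invented for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeRange asks for n bytes starting at off. It returns the status code
// and at most n+1 bytes of body, so an oversized reply is detectable
// without reading a whole multi-gigabyte file.
func probeRange(url string, off, n int64) (int, []byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+n-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, n+1))
	return resp.StatusCode, body, err
}

// newRangeServer serves data and honors Range via http.ServeContent.
func newRangeServer(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "file.parcel", time.Time{}, bytes.NewReader(data))
	}))
}

// newHeaderOnlyServer advertises Accept-Ranges but always sends the full body.
func newHeaderOnlyServer(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Accept-Ranges", "bytes")
		w.Write(data) // the Range header is ignored
	}))
}

func main() {
	data := []byte("0123456789abcdef")

	good := newRangeServer(data)
	defer good.Close()
	status, body, err := probeRange(good.URL, 4, 4)
	fmt.Println("range-aware server:", status, string(body), err)

	bad := newHeaderOnlyServer(data)
	defer bad.Close()
	status, body, err = probeRange(bad.URL, 4, 4)
	fmt.Println("header-only server:", status, string(body), err)
}
```

A server that answers 200 with data from offset 0 would hand back the wrong bytes for any piece that does not start at the beginning of the file, which is one way a piece check could fail.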
Great. It's very likely missing a "tickle" for webseed peers if reader priorities have already been set. I should be able to statically verify that.
I am also seeing this error:
error running handshook conn: main read loop: decoding message: reading message length: EOF
I think the webseed may not be using the config below:
config.WebTransport = &http.Transport{
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
Hi @anacrolix, which configuration should be set to true to enable "Local Service Discovery"? I want to ensure cross-rack communication is possible.
I've not implemented this yet. https://github.com/anacrolix/torrent/issues/248.
Thanks for the update, @anacrolix. I have another question based on this.
In my public cloud environment, nodes are spread across different AZs/racks within a VPC network. The security group for this VPC network allows "All traffic" (all protocols, all ports) from all sources (0.0.0.0/0), which should mean that the torrent port 7191 (in my case) is open for communication across racks in the VPC network. However, when I attempted to start a connection between two nodes located in different racks, the connection was reset or closed every time.
[root@e2e-56716943-456-dl-gateway0 user]# telnet 10.80.221.8 7191
Trying 10.80.221.8...
Connected to 10.80.221.8.
Escape character is '^]'.
Connection closed by foreign host.
[root@e2e-56716943-456-dl-gateway0 user]# nc 10.80.221.8 7191
Ncat: Connection reset by peer.
The code snippet above shows that when trying to connect to the destination IP via the torrent port, the connection gets reset. Could this be because there is no LSD (Local Service Discovery) implementation within the library, which uses multicast advertisements to enable nodes to discover peers that may be able to help them with their downloads?
I've pushed fixes to master that should improve webseed performance, and fix the stall that occurs if you add webseeds after adding the torrent (and some delay).
I am also seeing this error:
error running handshook conn: main read loop: decoding message: reading message length: EOF
I think the webseed may not be using the config below:
config.WebTransport = &http.Transport{
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
I've checked this, you are setting it in the correct place.
Thanks for the update, @anacrolix. I have another question based on this.
In my public cloud environment, nodes are spread across different AZs/racks within a VPC network. The security group for this VPC network allows "All traffic" (all protocols, all ports) from all sources (0.0.0.0/0), which should mean that the torrent port 7191 (in my case) is open for communication across racks in the VPC network. However, when I attempted to start a connection between two nodes located in different racks, the connection was reset or closed every time.
[root@e2e-56716943-456-dl-gateway0 user]# telnet 10.80.221.8 7191
Trying 10.80.221.8...
Connected to 10.80.221.8.
Escape character is '^]'.
Connection closed by foreign host.
[root@e2e-56716943-456-dl-gateway0 user]# nc 10.80.221.8 7191
Ncat: Connection reset by peer.
The code snippet above shows that when trying to connect to the destination IP via the torrent port, the connection gets reset. Could this be because there is no LSD (Local Service Discovery) implementation within the library, which uses multicast advertisements to enable nodes to discover peers that may be able to help them with their downloads?
This may be due to automatic blocking of internal IPs in the client. It won't be anything to do with the lack of LSD.
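To check whether the addresses involved would count as internal at all, a standard-library sketch is enough. Note the assumptions: `isInternal` is a hypothetical helper, and its definition of internal (private, loopback, or link-local unicast) is a guess made for illustration. It does not reproduce whatever rule the client applies.

```go
package main

import (
	"fmt"
	"net"
)

// isInternal reports whether addr parses as a private (RFC 1918 / RFC 4193),
// loopback, or link-local unicast IP address. Hypothetical helper; this is
// not the torrent client's own logic.
func isInternal(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false // not a literal IP address
	}
	return ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast()
}

func main() {
	// 10.80.221.8 is the destination from the telnet/nc output in this thread.
	for _, addr := range []string{"10.80.221.8", "10.140.93.137", "8.8.8.8"} {
		fmt.Printf("%s internal=%v\n", addr, isInternal(addr))
	}
}
```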
Hi @anacrolix, unfortunately I'm not seeing any obvious log lines, either from the Anacrolix/torrent library (client) or from the master node (acting as the webseed server), that indicate a download request from the client, such as:
Can you try running master with GO_LOG=webseed=all?
For the webseed client, I've already configured it to skip server certificate verification during the torrent client setup using the cfg.WebTransport configuration:
config.WebTransport = &http.Transport{
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
This configuration works when I download the torrent file directly from the master node using a similar API (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent). Here's how the download of the .torrent file is handled from the master node:
url := "https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>.torrent"
client := http.DefaultClient
if se.configs.AllowInsecureCerts {
	// Skip server certificate verification only when explicitly allowed.
	client = &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
}
resp, err := client.Get(url)
This request results in a 200 response, successfully downloading the .torrent file.
However, when I provide a similar API/URL while adding the webseed (https://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>) with cfg.WebTransport set, the download gets stuck. Note that I plan to implement TLS during BitTorrent in the future, but this isn't currently on the roadmap.
Could you please help me understand why the torrent file downloads successfully from the master node, but the webseed client fails to download from the master node when using a similar API, with both having InsecureSkipVerify: true set? Is there another configuration I need to pass? Or is this a bug?
Maybe take a look at https://github.com/anacrolix/torrent/blob/f4711825e84e3c24fa96d127098ed5933235029d/client.go#L220-L228. I still don't see why you wouldn't get errors though, although the recent fixes may explain that. Let me know how the logging and fixes mentioned at the top of this comment go.
Is there any update on this?
Overview
Adding new hosts within a cluster with TLS enabled is problematic due to a prerequisite that new nodes should have a 14 GB file distributed using the BitTorrent client running on these hosts. This torrent process is stuck indefinitely.
Architecture
Cluster Architecture
Within our cluster, we have a master node and worker nodes that report the cluster's state to the master. The master generates the .torrent file, which is a trackerless torrent file. The master somewhat acts as a tracker, providing each peer with information about other peers to communicate with during torrenting.
Torrent Architecture
Torrent Process During Fresh Cluster Install
This is the process followed during a fresh cluster setup:
The master node first downloads (HTTP fetch) the parcel from a remote server, then acts as a web seed.
Each peer gets other peers' information (host IP, port) using the heartbeat response from the master node. Each peer then calls the AddPeers() API from the torrent client. This AddPeers() API call happens after every heartbeat response from the master to the worker peer.
Torrenting starts between the peers (master + workers).
There is fallback logic that if the parcel download doesn't complete via BitTorrent in a certain time, the fallback mechanism is to do an HTTP download from the web seed.
Torrent Process During New Host Addition in Existing Cluster
This is the general flow of how new hosts are added in an existing cluster:
A set of new hosts getting added to the cluster install the Anacrolix/torrent binary and the libtorrent binary. By default, the Anacrolix/torrent client process runs.
These new peers/hosts/nodes contact the web seed (master node) present in the existing cluster (TLS enabled or not) to download the 14 GB file.
Simultaneously, these new peers start distributing pieces of this 14 GB file among each other.
Scenarios with New Host(s) Addition
Without TLS Enabled on the Existing Cluster
With TLS Enabled in the Existing Cluster
Case #1: Libtorrent Client Process Runs on the New Hosts
The 14 GB file gets distributed within a few minutes.
Case #2: Anacrolix/Torrent Process Runs on the New Hosts
The 14 GB file distribution gets stuck on these new nodes because none of the new peers can contact the web seed (master node) present in the existing cluster. In the web seed section of full-status, it is empty:
Hi @anacrolix, can you please provide pointers on why this API, http://<master-node>:<TLS-port>/cmf/parcel/download/<file-to-download>, is not reachable from the peers when they try to contact the web seed?