chr15m / bugout

Back end web app services over WebRTC.
https://chr15m.github.io/bugout
MIT License

hub.bugout.link performance #42

Open draeder opened 3 years ago

draeder commented 3 years ago

Hi Chris:

I've been building a tracker server tester and as part of that I built my own tracker server. I wanted to share some observations that might help you understand the performance issues you've been experiencing with hub.bugout.link.

Short Summary

Your tracker server got added to a list of tracker servers that is widely used by people in China running an app called BitComet. The users in China are very likely killing your server. Someone opened an issue to have your tracker server added to trackerslist. You can open an issue to have your server removed, too, per the trackerslist readme.

I know this to be the reason for your server's performance issues by reviewing https://hub.bugout.link/stats and seeing that it matches my own server's stats after personally requesting my server be added to trackerslist, in particular the type of client showing up in the stats: BitComet.

Details

While testing my tracker server tester, I found that your server often responds with a 400 Bad Request. I remember you mentioning that SlingCode introduced extra load to your server, and at the time, that was the explanation for the performance problems.

However, I am starting to see the same kinds of problems with my tracker server, and I don't have any major applications built that use it yet.

My tracker server was built to run on Heroku since it has a free plan and also a low price plan. After getting things working, I opened an issue with trackerslist to have my tracker server added to the list -- I was curious to see how it would perform with many users. Once it was added, I noticed that my Heroku server was generating a lot of errors and the response time was averaging 30s(!). I've gotten 3.5 million requests to my server and the majority of them are from users in China running BitComet.

Since I am using Cloudflare for the CNAME associated with my domain and the Heroku app link, I was able to identify the origin of the traffic with Cloudflare's web analytics. The majority of traffic was coming from China. So, I added a firewall rule to block all traffic from China. This solved the performance issue for about 24-48 hours or so, bringing the response time of my server back to single digit milliseconds. But today it's getting pegged again despite the firewall rule. I have a case open with Cloudflare support to explain why only ~ some ~ traffic from China is getting blocked and other traffic let through.

In the end, I would like to block certain types of clients, or only allow certain types of clients like WebTorrent. It seems that bittorrent-tracker could allow this, since the stats page lists the connected clients, but it's not clear from the documentation how to do that.
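For what it's worth, bittorrent-tracker's `Server` accepts a `filter(infoHash, params, cb)` option that can reject announces. Below is a minimal sketch of using it to allow only certain clients. It rests on assumptions: that `params.peer_id` arrives as a 40-character hex string, and that WebTorrent peer IDs start with `-WW`. The names `ALLOWED_PREFIXES`, `clientPrefix` and `makeClientFilter` are hypothetical, not part of the library.

```javascript
// Sketch only: allow announces from clients whose peer ID starts with an
// allowed prefix. ALLOWED_PREFIXES, clientPrefix and makeClientFilter are
// hypothetical names, not bittorrent-tracker APIs.
const ALLOWED_PREFIXES = ['-WW'] // assumption: WebTorrent peer IDs begin "-WW"

// Assumption: the tracker hands us peer_id as a hex string.
// Decode it and keep the first three characters (e.g. "-WW", "-BC").
function clientPrefix (peerIdHex) {
  return Buffer.from(String(peerIdHex || ''), 'hex')
    .toString('latin1')
    .slice(0, 3)
}

// Build a function with the (infoHash, params, cb) shape the filter option expects.
function makeClientFilter (allowedPrefixes) {
  return function filter (infoHash, params, cb) {
    const prefix = clientPrefix(params && params.peer_id)
    if (allowedPrefixes.includes(prefix)) {
      cb(null) // allow the announce
    } else {
      cb(new Error('client not allowed')) // reject the announce
    }
  }
}

// Wiring, for a CommonJS release of the package (npm install bittorrent-tracker):
// const Server = require('bittorrent-tracker').Server
// const server = new Server({
//   http: true, ws: true, udp: false,
//   filter: makeClientFilter(ALLOWED_PREFIXES)
// })
```

Peer IDs are self-reported, so a client could spoof the prefix. This would only keep out clients that identify themselves honestly.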

I hope you find this information useful.. 🔢

Thanks, Dan

chr15m commented 3 years ago

@draeder thank you so much for this analysis, that is indeed useful information. Will have a bit of a think about how to handle this.

draeder commented 3 years ago

@chr15m I personally want users from a restricted country like China to be able to use my tracker server however they need to, and ideally I wouldn't block any clients.... But the demand from China is so high, it leads to either performance issues or costs. I would love to explore a solution that addresses both....

chr15m commented 3 years ago

@draeder i have designed but not tested a possible solution to this using a "proof-of-work auction" or hashcash auction. Basically you give your server a limited number of slots which it is able to support, and clients have to perform proof-of-work of a sufficient difficulty to connect. This ensures that the server is never overloaded, and also that clients are given a signal about which server they should connect to (lower PoW is better). I've started preliminary work on this, will let you know if/when I make progress. Let me know if you come up with anything yourself.
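To make the proof-of-work part concrete, here is a rough hashcash-style sketch, not the actual design: the client searches for a nonce so that SHA-256 of `challenge:nonce` has at least `difficultyBits` leading zero bits, and the server verifies with one hash. The function names and the `challenge:nonce` format are assumptions for illustration, and the slot auction itself is not shown.

```javascript
const crypto = require('crypto')

// Count the leading zero bits of a Buffer.
function leadingZeroBits (buf) {
  let bits = 0
  for (const byte of buf) {
    if (byte === 0) { bits += 8; continue }
    bits += Math.clz32(byte) - 24 // clz32 counts over 32 bits; a byte uses the low 8
    break
  }
  return bits
}

// Hash the server-issued challenge together with the client's nonce.
// The "challenge:nonce" format is an assumption made for this sketch.
function powHash (challenge, nonce) {
  return crypto.createHash('sha256').update(`${challenge}:${nonce}`).digest()
}

// Server side: cheap check that the client did enough work.
function verify (challenge, nonce, difficultyBits) {
  return leadingZeroBits(powHash(challenge, nonce)) >= difficultyBits
}

// Client side: brute-force search, expected cost about 2^difficultyBits hashes.
function solve (challenge, difficultyBits) {
  let nonce = 0
  while (!verify(challenge, nonce, difficultyBits)) nonce++
  return nonce
}
```

The server would hand out a fresh random challenge per connection attempt so that solved nonces cannot be replayed.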

chr15m commented 3 years ago

@draeder another alternative I've been exploring is to tie cpu load / memory consumption to the PoW value. So as the load goes up and memory becomes scarce, the PoW clients must perform becomes more difficult, warding them off.
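A sketch of how load could map to difficulty, assuming a simple linear scale between a minimum and maximum number of bits. The 8 and 24 bit bounds and the function names are made up for illustration.

```javascript
const os = require('os')

// Map resource pressure (0..1) to a PoW difficulty in bits.
// minBits/maxBits defaults are illustrative assumptions, not tuned values.
function difficultyBits (cpuLoad, memUsed, minBits = 8, maxBits = 24) {
  // Use whichever resource is under more pressure, clamped to [0, 1].
  const pressure = Math.min(1, Math.max(0, Math.max(cpuLoad, memUsed)))
  return Math.round(minBits + pressure * (maxBits - minBits))
}

// Sample the machine: 1-minute load average per core and fraction of memory in use.
// Note: os.loadavg() always returns zeros on Windows.
function currentDifficulty () {
  const cores = Math.max(1, os.cpus().length)
  const cpuLoad = os.loadavg()[0] / cores
  const memUsed = 1 - os.freemem() / os.totalmem()
  return difficultyBits(cpuLoad, memUsed)
}
```

With these defaults an idle server asks for 8 bits and a saturated one for 24, so each step up in pressure makes connecting exponentially more expensive.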

draeder commented 3 years ago

@chr15m If you need me to test anything you've created, I can add new servers to Heroku as necessary with little to no cost as long as those servers don't live too long. Just let me know.

What I've been seeing with Heroku, though, is that CPU and memory don't really increase as the requests increase. The issue is that the number of errors increases as peers get connected together, and that seems to create the increase in time to respond to requests. Specifically, as peers connect, sockets are no longer needed, so the server responds with 503 Service Unavailable due to the websockets timeout. It's unnecessary use of CPU time to handle/respond to those timeouts.

In Heroku, there's no way to address that... but there must be something that can be done in Node.js.

chr15m commented 3 years ago

@draeder huh, that's interesting. So it sounds more like the sheer number of websockets is the issue. There could also be memory leaks etc. in the server itself. I think I remember @DiegoRBaquero saying their tracker was restarted every 24 hrs, so maybe I should do the same with hub.bugout.link. :thinking:

draeder commented 3 years ago

@chr15m Well that's interesting, because I was talking with the developer of fake-bitttorrent-client about how to address timeout issues for my tracker server tester ... In that issue I found that limiting the number of sockets for the client request helped speed up responses. Looking at it from the other direction may be useful.
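A minimal sketch of both directions using only Node's built-in `http` module: `maxSockets` on an `http.Agent` caps concurrent sockets for outgoing requests, and `maxConnections` on the server makes it refuse connections beyond a cap. The helper names and the example limits are assumptions, and this is not how either project actually does it.

```javascript
const http = require('http')

// Client side: cap how many sockets outgoing requests may hold open at once.
// Extra requests queue inside the agent instead of opening more sockets.
function makeLimitedAgent (maxSockets) {
  return new http.Agent({ keepAlive: true, maxSockets })
}

// Server side: refuse new TCP connections once the cap is reached.
// maxConnections is inherited from net.Server.
function makeCappedServer (maxConnections, handler) {
  const server = http.createServer(handler)
  server.maxConnections = maxConnections
  return server
}

// Usage (limits are illustrative):
// const agent = makeLimitedAgent(10)
// http.get('http://tracker.example/announce', { agent }, res => res.resume())
// makeCappedServer(500, (req, res) => res.end('ok')).listen(8000)
```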

draeder commented 3 years ago

Also, there is a limit to dynamically assigned ports that can be opened for any computer using IPV4... https://stackoverflow.com/questions/113224/what-is-the-largest-tcp-ip-network-port-number-allowable-for-ipv4

chr15m commented 3 years ago

@draeder ah yes, and all of these issues point to the benefit of there being more trackers that each handle a smaller load individually.

chr15m commented 3 years ago

More and smaller tracker servers that are part of a mesh also means the whole system is more robust to single individual trackers going offline.

draeder commented 3 years ago

Right... so how to get them participating is the question? I made P2P Tracker so anyone could run a tracker server either locally or in Heroku.. The trouble is, who will run it? My tracker server tester is finding very few responsive trackers. Yours and mine are in the list of working trackers..... unless China is killing our servers.

Your tracker server mesh idea is important, but people need to run their own trackers, first -- then run those tracker servers within the mesh....

chr15m commented 3 years ago

how to get them participating is the question?

I think the only thing you can do is put a thing out there and tell people about it. If it is valuable and you have explained the value, people will run it. If not, try again.

hello-smile6 commented 3 years ago

Why not just remove the tracker from the index or lower its position in the index?

hello-smile6 commented 3 years ago

What is this new suicidal application? I saw it yesterday, but didn't think about it.

迅雷在线 (Xunlei) 0.1.0.0 : 51
迅雷在线 (Xunlei) 0.0.1.2 : 12
BitComet 1.73 : 10
BitComet 1.76 : 11
BitComet 1.75 : 22
BitComet 1.74 : 13
BitComet 1.77 : 18
BitComet 0.58 : 1
Transmission 3.00 : 2
Vuze 5.7.6.0 : 3
BitSpirit 3.6.0 : 3
WebTorrent 0.0 : 1

514 active

hello-smile6 commented 3 years ago

Huh. Seems like a badly done P2P CDN. Chinese in origin, commercial.

https://en.wikipedia.org/wiki/Xunlei

hello-smile6 commented 3 years ago

I'll try making a magnet link only using Bugout and see what I get.

hello-smile6 commented 3 years ago

Server works to a point. Managed to get https://instant.io/#magnet:?xt=urn:btih:36c36245e2e7f813efef4d2908ab65920a8dd212&dn=beaker-browser.exe&tr=wss%3A%2F%2Fhub.bugout.link to work, saw myself on the stats page. Server 500 soon after. Seems like server is unstable, @chr15m @draeder . Maybe drop a few connections when one client refuses to seed and has >30 users.

hello-smile6 commented 3 years ago

147 peers. You could try a cron job that uses curl to get the stats and logs them, to keep track. @chr15m
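The same idea sketched in Node instead of cron plus curl. It assumes the tracker also serves its stats as JSON at `/stats.json` (an assumption about the deployment) and that Node 18+ is available for the global `fetch`. `formatLogLine` and `logStats` are hypothetical helper names.

```javascript
const fs = require('fs')

// Turn one stats snapshot into a single JSON line with a timestamp.
function formatLogLine (stats, now = new Date()) {
  return JSON.stringify({ time: now.toISOString(), stats }) + '\n'
}

// Fetch the stats once and append them to a log file.
// Requires Node 18+ for the global fetch.
async function logStats (url, file) {
  const res = await fetch(url)
  if (!res.ok) throw new Error(`stats request failed: ${res.status}`)
  fs.appendFileSync(file, formatLogLine(await res.json()))
}

// Usage: poll once a minute (URL path is an assumption).
// setInterval(() => {
//   logStats('https://hub.bugout.link/stats.json', 'stats.log').catch(console.error)
// }, 60 * 1000)
```

Logging failures too (the `catch`) would be useful here, since the error responses are part of what needs tracking.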

hello-smile6 commented 3 years ago

Fair warning @chr15m Going to try to flood with connections from WebTorrent, try to get it to rebalance.

hello-smile6 commented 3 years ago

You should see a lot of 67...* WebTorrent users now, @chr15m

hello-smile6 commented 3 years ago

Hit 167 during stress testing, nearly wiped out my device. Seems like as soon as I backed off, more peers joined and filled the gap. @chr15m Now at 152

chr15m commented 3 years ago

@hello-smile6 thank you for your testing and data. Under the current scheme where trackers can be flooded at no cost to the user, any mitigation is only going to be a temporary hack. I'm working on a more permanent solution to this problem in my spare time but I do not have anything to show yet. In the meantime, people should run their own trackers if they want better performance.

hello-smile6 commented 3 years ago

Okay. Can you deploy dozens to various free services such as Glitch and Heroku and use DNS for load balancing?

draeder commented 3 years ago

@hello-smile6 That's up to you. I have not had any issues deploying my own tracker server to Heroku. The issues with my server popped up when my server address was available to those who wanted to use it for other reasons than my app.

draeder commented 3 years ago

@hello-smile6 By the way, it just occurred to me that I was working on something similar to what you suggested with DNS. I have the repo set to private since I was doing a bunch of testing with it. Basically it uses Hyperswarm to create a backend server swarm for all trackers for the given app. Then, if you have a domain and host its DNS in Cloudflare, it updates a TXT record with the ws trackers. That gets passed along to all of the server peers. In that way, if any user joins one of the servers, their browser gets the list of servers from the TXT record and the browser seeds all of the servers with its peer address. It's still a work in progress, as I mentioned. But the idea is close to what you were suggesting.
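Roughly, reading the tracker list back out of a TXT record could look like the sketch below. It assumes the record holds a comma-separated list of `ws://` or `wss://` URLs, which is a made-up format for illustration and not necessarily what the private repo uses. It runs in Node; a browser has no direct DNS API and would need something like a DNS-over-HTTPS lookup instead.

```javascript
const dns = require('dns').promises

// dns.resolveTxt() returns an array of records, each an array of string chunks.
// Join the chunks, split on commas, and keep only WebSocket tracker URLs.
// The comma-separated format is an assumption made for this sketch.
function trackersFromTxt (records) {
  return records
    .map(chunks => chunks.join(''))
    .flatMap(text => text.split(','))
    .map(entry => entry.trim())
    .filter(entry => entry.startsWith('ws://') || entry.startsWith('wss://'))
}

// Look up the TXT records for a hostname and extract the tracker URLs.
async function lookupTrackers (hostname) {
  return trackersFromTxt(await dns.resolveTxt(hostname))
}

// Usage (hostname is hypothetical):
// lookupTrackers('trackers.example.com').then(console.log)
```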

hello-smile6 commented 3 years ago

I'll deploy 2 or 3.

draeder commented 3 years ago

@hello-smile6 Well, I got back to writing this today. I have it nearly complete for a first pass. I'll come back soon and post the repo link. It's called signal-swarm. I could definitely use some testers when it's ready.

hello-smile6 commented 3 years ago

@hello-smile6 Well, I got back to writing this today. I have it nearly complete for a first pass. I'll come back soon and post the repo link. It's called signaling-swarm. I could definitely use some testers when its ready.

I will if there's an interface like instant.io I could use for it, @draeder . I've always wanted a better-quality version.

draeder commented 3 years ago

@hello-smile6 It's a tracker server implementation that communicates with other tracker servers based on a shared "topic" or "app name". To participate, you have to have a domain set up in Cloudflare.

hello-smile6 commented 3 years ago

Huh. Could you deploy to Glitch?

draeder commented 3 years ago

Huh. Could you deploy to Glitch?

I don't see why not.. here is the link to the repo: https://github.com/draeder/tracker-swarm. I haven't tested it in Heroku or glitch, so I expect problems. Feel free to let me know about the problems in the repo => issues.

draeder commented 3 years ago

Looks like the Webtorrent devs are working on a performance fix for bittorrent-tracker: https://github.com/webtorrent/bittorrent-tracker/issues/354

chr15m commented 3 years ago

@draeder nice. :+1:

hello-smile6 commented 2 years ago

@chr15m You now have 864 people using the WebTorrent tracker for Bugout, none of whom are actually from Bugout. Could you disconnect any client that doesn't seem to be using Bugout?

draeder commented 2 years ago

@hello-smile6 @chr15m I remember seeing an issue for bittorrent-tracker requesting the ability to filter by app id. I tried finding it the other day, but came up short. If such capability was added, it would resolve this issue.

As a side note, I ended up doing quite a bit of experimentation for my own version of decentralizing tracker servers, including trying my hand at building some webtorrent extensions. Ultimately, that led me to Gun, which could theoretically be used as a distributed tracker server. I have plans to take a stab at that after I finish building some smaller utilities for Gun. Gun is a lot of fun.. @hello-smile6 you should check it out!

hello-smile6 commented 2 years ago

@chr15m Your server's overloaded to the point that it's dropping TCP connections. https://hub.bugout.link/stats

158976 torrents (9915 active)
Connected Peers: 3046
Peers Seeding Only: 584
Peers Leeching Only: 2259
Peers Seeding & Leeching: 203
IPv4 Peers: 3046
IPv6 Peers: 0
Clients:
迅雷在线 (Xunlei) 0.1.0.0 : 1955
迅雷在线 (Xunlei) 0.0.1.2 : 827
迅雷在线 (Xunlei) 0.0.1.1 : 1
WebTorrent Desktop 0.24 : 4
BitComet 1.85 : 28
BitComet 1.73 : 8
BitComet 1.77 : 23
BitComet 1.86 : 63
BitComet 1.83 : 4
BitComet 1.76 : 13
BitComet 1.84 : 22
BitComet 1.82 : 11
BitComet 1.78 : 6
BitComet 1.75 : 7
BitComet 1.81 : 15
BitComet 0.56 : 1
BitComet 1.79 : 6
BitComet 1.87 : 2
BitComet 1.74 : 1
BitComet 0.57 : 2
BitSpirit 3.6.0 : 33
unknown -EZ0100- : 1
unknown -GT0002- : 8
unknown -GT0003- : 2
Vuze 5.7.7.0 : 1
Vuze 5.7.6.0 : 2

draeder commented 2 years ago

I have created tracker-tester to help with identifying responsive trackers. It currently only supports http/https. But I will eventually expand it to ws/wss & udp.

hello-smile6 commented 2 years ago

I have created tracker-tester to help with identifying responsive trackers. It currently only supports http/https. But I will eventually expand it to ws/wss & udp.

Will it work in a web browser?

draeder commented 2 years ago

I have created tracker-tester to help with identifying responsive trackers. It currently only supports http/https. But I will eventually expand it to ws/wss & udp.

Will it work in a web browser?

No, but what I'm building that uses this will eventually work in the browser.. Still testing and working through the details...

draeder commented 2 years ago

Will it work in a web browser?

Hmm.. As I think about this, I think I can make this work in the browser without the other library. Let me see what I can do and I'll post an update here.

draeder commented 2 years ago

Ultimately, what I want to do, and what I have been trying to accomplish for years now, is to make tracker servers talk to each other on a per-infoHash basis. E.g. if they share an infoHash, they tell each other about new peers over that infoHash connection. Hyperswarm makes it possible. But exchanging the peers has been a challenge. I think I conceptually know how to do this now, but I struggle with implementation .....
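A rough sketch of just the bookkeeping side, under stated assumptions: Hyperswarm topics are 32-byte buffers while an infoHash is 20 bytes, so a per-infoHash topic is derived here with SHA-256, and incoming peer lists are merged by peer ID. `topicFor`, `mergePeers` and the `{ peerId }` peer shape are hypothetical, and relaying WebRTC offers/answers between trackers is not covered.

```javascript
const crypto = require('crypto')

// Derive a 32-byte swarm topic from a 20-byte infoHash (given as hex).
function topicFor (infoHashHex) {
  return crypto.createHash('sha256')
    .update(Buffer.from(infoHashHex, 'hex'))
    .digest()
}

// Merge peers reported by another tracker into the local Map (keyed by peerId).
// Returns only the peers that were new, so they can be forwarded onward.
function mergePeers (localPeers, remotePeers) {
  const added = []
  for (const peer of remotePeers) {
    if (!localPeers.has(peer.peerId)) {
      localPeers.set(peer.peerId, peer)
      added.push(peer)
    }
  }
  return added
}

// Wiring idea (untested, requires `npm install hyperswarm`):
// const Hyperswarm = require('hyperswarm')
// const swarm = new Hyperswarm()
// swarm.join(topicFor(infoHash), { server: true, client: true })
// swarm.on('connection', socket => { /* exchange peer lists as JSON lines */ })
```

Returning only the newly added peers keeps trackers from re-broadcasting peers they already know about.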