Improve Anti-leeching Algorithm

Aster-the-Med-Stu commented 4 years ago

We need to collect traits of illegal clients for building a better anti-leeching algorithm as dicussed in #4192.

So far I have noticed that progress reported from leechers would be stuck at 0% even after receiving gigabytes of data from me.

arvidn commented 4 years ago

sending HAVE messages back to the peers you downloaded the pieces from, or even to any peer that already has those pieces is technically unnecessary and wasteful. there's an option in libtorrent to not do that (and it used to be enabled by default). I don't think it's reasonable to require peers do that.

I think it would be really helpful to have some stronger framing of this problem too. Could you perhaps post one or two scenarios, what's happening now and what you would like to happen.

I assume you think you have peers that deserve your upload bandwidth more than others. What do those "better" peers do to make you feel that way. Or whatever your motivation is, it would be helpful to know where you're coming from.

arvidn commented 4 years ago

for anyone interested in participating in this effort, here's some prior work:

https://qed.usc.edu/papers/ChowGM08.pdf

https://www.researchgate.net/publication/271547138_Free-rider_detection_and_punishment_in_BitTorrent_based_P2P_networks

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.604.5413&rep=rep1&type=pdf

FranciscoPombal commented 4 years ago

I assume you think you have peers that deserve your upload bandwidth more than others. What do those "better" peers do to make you feel that way. Or whatever your motivation is, it would be helpful to know where you're coming from.

I think "being honest about progress percentage" is a good start. Perhaps peers could be punished for reporting progress percentages that are clearly off, beyond some margin of error.

The punishment could be reducing or even cutting off upload to those bad peers entirely. The question then becomes: do we allow a previously banned peer back in? And on what conditions?

arvidn commented 4 years ago

I think "being honest about progress percentage" is a good start.

Could you explain your rationale for thinking that? Specifically, as a seed, how does it benefit you to waste your download capacity by receiving HAVE messages from peers?

(I can think of a few reasons, but they're all fringe).

Also, it's not obvious how enforcing this would help. If you're a leecher you just send HAVE messages to whoever you downloaded the piece from. That still leaves us at square one.

FranciscoPombal commented 4 years ago

I think "being honest about progress percentage" is a good start.

Could you explain your rationale for thinking that? Specifically, as a seed, how does it benefit you to waste your download capacity by receiving HAVE messages from peers?

(I can think of a few reasons, but they're all fringe).

The OP wants to prioritize their seeding capacity to the peers that are actually in the beginning of the download. In this case, the issue is that those bad peers are reporting 0% progress, when everyone else reports something reasonable. This subverts the anti-leech algorithm, that they are using.

The hit to your download capacity might not be worth it to everyone. But it is probably worth it to those specifically using the anti-leech algorithm, to ensure their upload capacity is spent on the peers that need it most.

Also, it's not obvious how enforcing this would help. If you're a leecher you just send HAVE messages to whoever you downloaded the piece from. That still leaves us at square one.

Enforcement could be: as a seeder, if using the anti-leech algorithm, require that peers send HAVE messages somewhat regularly. Additionally, go to some lengths to ensure those messages are realistic, based on what has already been uploaded to that peer.

FranciscoPombal commented 4 years ago

sending HAVE messages back to the peers you downloaded the pieces from, or even to any peer that already has those pieces is technically unnecessary and wasteful. there's an option in libtorrent to not do that (and it used to be enabled by default). I don't think it's reasonable to require peers do that.

I see. However, from my experience, most peers in public swarms report realistic progress percentages in a reasonable time, except those using some variant of the Xunlei client. This client is infamous for using such tactics to cheat and take advantage of other clients in general.

arvidn commented 4 years ago

The OP wants to prioritize their seeding capacity to the peers that are actually in the beginning of the download. In this case, the issue is that those bad peers are reporting 0% progress, when everyone else reports something reasonable. This subverts the anti-leech algorithm, that they are using.

The anti-leech algorithm was updated to use the maximum of "the number of pieces sent to a peer" and "the number of pieces the peer reports to have". Which I believe is exactly what you would end up with anyway, with your proposal. Since you can only enforce receiving HAVE messages for pieces you have sent to a peer. And those pieces are already taken into account by the anti-leech algorithm.

from my experience, most peers in public swarms report realistic progress percentages in a reasonable time, except those using some variant of the Xunlei client. This client is infamous for using such tactics to cheat and take advantage of other clients in general.

Assuming there are people still developing Xunlei, it would be trivial for them to circumvent this enforcement, by simply sending HAVE messages only to peers that sent them the piece. For that reason, I don't consider enforcing it effective, it only has a cost in that it requires clients to behave in certain (undocumented) way, and no upside.

Aster-the-Med-Stu commented 4 years ago

Besides offering the ability to ban peers based on client name is still needed. It is known that newer versions of Thunder would only reserve uploading for other Thunder clients. (A discussion on Thunder's behaviour (in Chinese)) There shall be some way to distinguish other Thunder clients from "normal" BitTorrent clients, and the most suspicous one is peer client name. They all have -XL0012-.

arvidn commented 4 years ago

Besides offering the ability to ban peers based on client name is still needed.

This would also not be effective, since it would be trivial for the client to just send a different name. We're back to square one and with an objectively worse network. Where clients are required to behave in some undocumented manner in order to work.

Am I missing something?

mr-cn commented 4 years ago

See qbittorrent/qBittorrent#10258 and arvidn/libtorrent#3833

FranciscoPombal commented 4 years ago

The anti-leech algorithm was updated to use the maximum of "the number of pieces sent to a peer" and "the number of pieces the peer reports to have". Which I believe is exactly what you would end up with anyway, with your proposal. Since you can only enforce receiving HAVE messages for pieces you have sent to a peer. And those pieces are already taken into account by the anti-leech algorithm.

Assuming there are people still developing Xunlei, it would be trivial for them to circumvent this enforcement, by simply sending HAVE messages only to peers that sent them the piece. For that reason, I don't consider enforcing it effective, it only has a cost in that it requires clients to behave in certain (undocumented) way, and no upside.

If that is how the anti-leech algorithm works now, I guess that is good enough. Perhaps on the client side, qBittorrent could have a way of telling the user which peers are getting throttled due to anti-leech?

arvidn commented 4 years ago

If that is how the anti-leech algorithm works now, I guess that is good enough. Perhaps on the client side, qBittorrent could have a way of telling the user which peers are getting throttled due to anti-leech?

That's not trivial, as that information isn't recorded anywhere. It's just up to the choking algorithm to decide which peers are unchoked.

arvidn commented 4 years ago

One really important thing to remember about choke algorithms (and pretty fundamental to how bittorrent works) is that its goal is to always saturate your upload capacity. The job of the choker is to decide which peers the upload capacity should be allocated to.

If you have a bunch of xunlei clients that are free-riders, but you don't have any other peers to unchoke, there isn't really anything the choker can do. I assume the scenario @Aster-the-Med-Stu is talking about is that there are "good" peers that are choked a lot, or generally get inferior upload capacity, and "bad" peers that get most upload capacity.

So, the scenario where you're a seed and almost all clients connected to you are Xunlei is not considered, right?

My understanding is that the true property of a "good" peer is that it has a non-trivial amount of upload capacity and it's using it to redistribute pieces it downloads to other peers. and "bad" peers do not redistribute the pieces they download. How can we detect bad peers?

Would even Thunder/Xunlei be considered a bad peer as long as it uploads, even if it's just to other xunlei peers?

FranciscoPombal commented 4 years ago

If you have a bunch of xunlei clients that are free-riders, but you don't have any other peers to unchoke, there isn't really anything the choker can do. I assume the scenario @Aster-the-Med-Stu is talking about is that there are "good" peers that are choked a lot, or generally get inferior upload capacity, and "bad" peers that get most upload capacity.

I assume so as well.

So, the scenario where you're a seed and almost all clients connected to you are Xunlei is not considered, right?

I don't think so either.

My understanding is that the true property of a "good" peer is that it has a non-trivial amount of upload capacity and it's using it to redistribute pieces it downloads to other peers. and "bad" peers do not redistribute the pieces they download. How can we detect bad peers?

Without significant changes/additions to the protocol, I don't think anything else can be done other than what is already being done. If a peer is always freeriding, and all seeders are anti-leech, then at most that bad peer will only be able to abuse each seeder for a limited time. After a while, each seeder will choke it due to bad reported progress percentage. Dishonest use of the swarm's overall upload capacity still happens, but I don't think this can be stopped altogether without having the clients gossiping about "bad" peers amongst each other.

That's not trivial, as that information isn't recorded anywhere. It's just up to the choking algorithm to decide which peers are unchoked.

So there is no place where the algorithm detects "hey this peer's reported progress doesn't make sense so it won't get unchoked"? Sorry, didn't really look at the code.

Would even Thunder/Xunlei be considered a bad peer as long as it uploads, even if it's just to other xunlei peers?

I'd say yes. This kind of "client tribablism" is a violation/abuse of the spec. However, it is of course not possible to know whether or not some peer is only uploading to other peers with some specific client.

FranciscoPombal commented 4 years ago

@Aster-the-Med-Stu since https://github.com/arvidn/libtorrent/pull/3833 is already in effect, I believe libtorrent is already doing what you want, as you stated originally in the OP. However, it may not be obvious that the dishonest peers are being choked simply be checking that section of qBittorrent's GUI that you posted a screenshot of. It is happening behind-the-scenes.

arvidn commented 4 years ago

I'd say yes. This kind of "client tribablism" is a violation/abuse of the spec. However, it is of course not possible to know whether or not some peer is only uploading to other peers with some specific client.

Sure, it's bad behavior. But even in a world of full information about the swarm, it may be extremely difficult to determine that a peer is engaging in this. Especially of most peers are part of the "tribe".

And I think any false positive in anti-leech logic is very risky for performance.

arvidn commented 4 years ago

here's another reference: https://www.usenix.org/legacy/event/nsdi08/tech/full_papers/piatek/piatek_html/index.html

ghbplayer commented 4 years ago

There are a few things left to say on the subject.

The old super-seeder algorithm is resistant to this form of leeching in small swarms
Presumably the peer download information is retained across re-connects, otherwise it would be easy to game the choker.
It would be interesting to re-connect to suspect peers to see if they have the pieces they downloaded from you. Probably not practical unless the leeching is a major problem
If a choice must be made between punishing bad peers and saturating bandwidth, punishment should always be the priority

BT was always a game-theory algorithm. In game theory, it makes sense to do extra work to detect cheaters once cheating hits a certain threshold. This leech client probably only works on trackers without accounts and cheat detection scripts. If cheating is a big problem, users would likely be better off banning IP blocks known to cause trouble.

The biggest problem with LT engaging in an arms race is that there's a long delay between adding a feature in LT and end-users getting the update. The cheaters can be much more agile and it's not a fair fight.

Aster-the-Med-Stu commented 4 years ago

So to sum up the discussion, instead of intergrating LT with the above mentioned (but could be easily passed by) methods, we shall count on trackers and custom modified clients (like this one, qBitTorrent Enhanced Edition) to handle leeching, right?

arvidn commented 4 years ago

There are a few things left to say on the subject.

It's an open research problem for sure.

The old super-seeder algorithm is resistant to this form of leeching in small swarms

When you say "old", I assume you mean the "strict" one, right? The strict one only really works when you are the initial seeder, it doesn't work for just any random seed in a swarm.

However, the "non-strict" version essentially inverts the piece picking control. Which I think explains why it helps in this case. If a client says it has nothing, a super seeder will assume it's interested in all pieces, and expose one at random (picked to not overlap with one exposed to another peer). If the peer doesn't tell about all the pieces it has, it's unlikely to be interested in the piece exposed to it, and just stall.

I think people should try this and report back their experience. It's important to not enable strict super seeding though. That's a libtorrent setting.

There are costs to running as a super seeder as well. For instance, you commit the crime you accuse the cheaters of; lying about which pieces you have. Since you will look like a downloader, other super seeds will stay connected to you even though neither of you are interested in anything from each other. So, a super seed will look like a "cheater" in this regard.

Another thing people may want to try is to disable the "allowed_fast" feature, or set it to 0.

Presumably the peer download information is retained across re-connects, otherwise it would be easy to game the choker.

The amount of payload uploaded to and downloaded from a peer is retained between connects, yes. To save space, it's rounded to even kiB, but that shouldn't be a problem.

It would be interesting to re-connect to suspect peers to see if they have the pieces they downloaded from you. Probably not practical unless the leeching is a major problem

It's also hard to know what conclusions to draw. Maybe the "suspect peer" is just a super seeder that's trying to avoid being taken advantage of.

If a choice must be made between punishing bad peers and saturating bandwidth, punishment should always be the priority

This one doesn't make sense to me. If performance is strictly less important than punishing free-riders, there is never a reason to stay in a swarm once you're done downloading. There's always a possibility that you provide bandwidth to free-riders once you're a seed. There's no way of knowing, the only safe bet to not provide bandwidth to free-riders, is to leave the swarm.

ghbplayer commented 4 years ago

I agree with most of what you said Arvid, except the last part. The goal is to always punish known cheaters while being a good cooperator to others. While uploading to peers is a good deed, uploading to leechers / cheaters is a bad deed. If you reward cheaters you get more cheaters, hurting everyone.

This is a balancing act, and tricky. But if you know a peer is a cheater, you must never cooperate with them. Even if it means your upload bandwidth goes unused.

arvidn commented 4 years ago

The goal is to always punish known cheaters

There is no way to know whether someone is a cheater (with the existing protocol), this is the problem.

This is a balancing act, and tricky.

Right, which is different from:

If a choice must be made between punishing bad peers and saturating bandwidth, punishment should always be the priority

I was explaining what it means to punishing cheater (and potential cheaters) at all cost. If you meant punishing cheaters that you know for sure are cheaters, I apologize, that's been done from day 1.

ghbplayer commented 4 years ago

Thank you for clarifying.

xavier2k6 commented 4 years ago

maybe setting Upload slots behavior to Fixed slots & Upload choking algorithm to Anti-leech is a better fit for those wanting to restrict those leechers & manually banning them too...

FranciscoPombal commented 4 years ago

here's another reference: https://www.usenix.org/legacy/event/nsdi08/tech/full_papers/piatek/piatek_html/index.html

@arvidn What are the main challenges in turning this into a BEP and possibly implement it in libtorrent? I realize it would be a bit involved, as it requires bit of public-key crypto, but there is precedent for that: http://bittorrent.org/beps/bep_0035.html

arvidn commented 4 years ago

What are the main challenges in turning this into a BEP and possibly implement it in libtorrent?

Someone who has time and expertise I think :)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Seeker2 commented 4 years ago

I have another option, part of it is having a max connection limit for seeding torrents less than downloading torrents: Alternate connections per torrent while seeding. https://github.com/qbittorrent/qBittorrent/issues/2193

Normally when a peer becomes a seed, all seeds are immediately disconnected -- freeing up "connection slots" for new peers to connect to this seed. But with a much lower max connection limit for the seed than while downloading, the seed may briefly exceed its max connection limit even after disconnecting all seeds. This can be a feature instead of a bug...the seed would only upload to currently connected peers until some disconnect.

During that time, use something like Tit-For-Tat instead of using fastest upload or anti-leech algorithm -- upload mostly to the peers that previously uploaded to your peer until they become seeds as well.

I saw this partially happening in one of the upload slot tests I did with qBitTorrent years ago -- no working optimistic unchoke and upload slots didn't cycle when max upload slots is manually set below current upload slots in use. qBT kept uploading to the same peers till they lost connection.

arvidn / libtorrent

Improve Anti-leeching Algorithm #4217