dart-lang / pub-dev

The pub.dev website
https://pub.dev
BSD 3-Clause "New" or "Revised" License
782 stars 147 forks

[proposal] adopt measures against bots #6226

Closed iapicca closed 1 year ago

iapicca commented 1 year ago

Use case

Whether this was the intended goal or not, in my experience the "like" count is often taken into account when considering which package to adopt.

Obtaining "likes" by unfair means (possibly bots) could sway a larger portion of the audience towards a given package regardless of its merits.

Proposal

It would be useful to adopt measures to prevent or identify unfairly obtained likes.

Example: [getX](https://pub.dev/packages/get) looks suspicious to me.

[Screenshot 2022-11-27 at 11 22 26]

isoos commented 1 year ago

Obtaining "likes" by unfair means (possibly bots) could sway a larger portion of the audience towards a given package regardless of its merits.

I'm curious: do you see any sign of bot likes?

stricter sign up procedure (captcha...)

We only have Google accounts. Signing up for them is a considerable effort already, and I think Google accounts have much better bot detection and protection than we can implement as a small team.

hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age), but in practice I think it is always the X vs 2X vs 10X differences that really count, and in those a few "rogue" likes are unlikely to matter.
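To make the "weighting by age" idea concrete, here is a minimal sketch. It is not how pub.dev computes anything: it assumes "age" means the age of the liking account, and the `FULL_WEIGHT_AGE` threshold and both function names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical parameter: an account this old (or older) counts as a full like.
FULL_WEIGHT_AGE = timedelta(days=365)


def like_weight(account_created: datetime, now: datetime) -> float:
    """Return a weight in [0, 1] that grows linearly with account age."""
    age = now - account_created
    if age <= timedelta(0):
        return 0.0
    # Dividing two timedeltas yields a float ratio.
    return min(1.0, age / FULL_WEIGHT_AGE)


def weighted_like_count(account_creation_dates, now: datetime) -> float:
    """Sum the age-based weights of every account that liked a package."""
    return sum(like_weight(created, now) for created in account_creation_dates)
```

With a scheme like this, a burst of likes from freshly created accounts would move the weighted count far less than the raw count, while large X vs 10X gaps between established packages would stay roughly the same.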

iapicca commented 1 year ago

hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age), [...]

I think this would be a good start. What about this one?

  • show the names of the accounts who like the package

[...] but in practice I think it is always the X vs 2X vs 10X differences that really count, and in those a few "rogue" likes are unlikely to matter.

I think we are already getting above 5X, see below.

I'm curious: do you see any sign of bot likes?

Example: [getX](https://pub.dev/packages/get) looks suspicious to me.

[Screenshot 2022-11-27 at 11 22 26]

Do you really believe that getX has 5 times the likes of bloc or firebase and 30% more than provider? Doesn't this look suspicious at all?

isoos commented 1 year ago

Do you really believe that getX has 5 times the likes of bloc or firebase and 30% more than provider?

Some sampling of the mentioned packages and other popular ones (order by likes):

Order by forks: firebase_*, bloc, dio + get, provider, http, flutter_native_splash.
Order by stars: dio, bloc, firebase_*, get, provider, http, flutter_native_splash.

It seems to me that package:get could very well be used (and liked, forked, and starred) that much. If there is bot activity, it is not self-evident based on these numbers (or they also managed to replicate it on GitHub).

(Note: it may not be the same users who like/fork/star the package, and they may do it in a different part of their use/understanding of the ecosystem. E.g. they may have learned Dart/Flutter from a tutorial that used get, and they hit the like button right away. There is no malice in that.)

iapicca commented 1 year ago

I feel that we are discussing whether my example is good or not, rather than addressing my feature request.

I'm not campaigning on whether a specific package is "boosted" by fake likes (there is already Twitter for that). I agree that the like count of the package I picked as an example may be totally legitimate. I don't want to talk about that package; I would rather focus on measures against bots in a generic sense.

I think that the first point should be a default for transparency anyway

  • show the names of the accounts who like the package

I understand that it is inconvenient to change the signup procedure... fair enough

  • stricter sign up procedure (captcha...)

We only have Google accounts. Signing up for them is a considerable effort already, and I think Google accounts have much better bot detection and protection than we can implement as a small team.

I think I probably didn't express myself correctly

  • hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age)

I have 2 points to make here

reframing the 3rd point of my proposal: flag profiles that like only packages of a single author
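A rough sketch of what such a flag could look like, purely for illustration. The data shapes (a list of `(user, package)` pairs and a package-to-publisher mapping), the function name, and the `min_likes` cutoff are all assumptions, not anything taken from the pub.dev code base.

```python
from collections import defaultdict


def single_publisher_profiles(likes, publisher_of, min_likes=2):
    """Return users whose likes all go to packages of one publisher.

    likes: iterable of (user_id, package_name) pairs (hypothetical shape).
    publisher_of: dict mapping package_name -> publisher (hypothetical shape).
    min_likes: assumed cutoff so that a user with a single like is not flagged.
    """
    publishers_by_user = defaultdict(set)
    like_counts = defaultdict(int)
    for user, package in likes:
        publishers_by_user[user].add(publisher_of[package])
        like_counts[user] += 1
    return {
        user
        for user, publishers in publishers_by_user.items()
        if len(publishers) == 1 and like_counts[user] >= min_likes
    }
```

A flag like this only marks profiles for a closer look; on its own it says nothing about whether the likes were made in good faith.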

isoos commented 1 year ago

show the names of the accounts who like the package

I understand that it is inconvenient to change the signup procedure... fair enough

It is not only that, but also a privacy question: we would need an opt-in approval from the users, and the feature wouldn't really provide them much value. If you are worried about bots, they could just not opt in and you wouldn't be any wiser...

a like should be based on real-world experience, not on "sympathy" for a user or organization; it's hard to believe that every single positive experience comes from packages and plugins of the same organization or author

I wouldn't try to second-guess the intent behind a like: we only see a click event, and don't know whether it was a spur-of-the-moment decision or a long-considered one. In an ideal world it would matter, but in practice we have no control or insight into it.

flag profiles that like only packages of a single author

If we implemented it, and anybody cared to set up bot voting, they would realize what's happening, and soon enough look at this thread or figure out this countermeasure from the open source code. They would start to modify the bot to like some other packages too, maybe even randomize a bit in time and activity. We would be in an even worse position.

Another angle: why would we discount likes from users that may be infrequent visitors to the site, and only liked packages from the same publisher? They may have clicked at a time when the publisher was different, but there was some consolidation in development efforts. We need better patterns if we want to take negative measures on this one.

It is not clear to me that there is any ongoing malice with the likes, and until then, I think our limited efforts are better spent on other features and improving the site. In contrast: when there is a spam package being uploaded, we do take steps to remove it (and also prevent further uploads from the same account). But the case must be clear, not just a vague hunch.

jonasfj commented 1 year ago

I feel that we are discussing whether my example is good or not, rather than addressing my feature request.

Yeah, let's avoid discussion of individual cases.

It's not my impression that there is widespread use of bots for likes on pub.dev; nor that this would be an urgent problem. If it does become an issue, I think we might want to focus the effort on minimizing bot accounts in general.

And I think we should be careful here. It's very hard to see if a package is useful. And being "useful" is very subjective. Some authors are really good at outreach, tutorials, getting people started. And that might be "useful" to some people.


We do undertake some effort to minimize bots; in particular, we block accounts uploading spam. There are other efforts we can undertake, but those will probably not be a subject for public discussion.

None of the solutions to mitigate bots are perfect and they all have downsides. Even the best spam filter occasionally throws away legitimate emails. Hence, employing more measures against suspected bots must be weighed against the negative implications of doing so. So if possible, I'd much rather avoid aggressively employing imperfect bot mitigation systems.

I think we should close this for now. We're not planning any action at the moment. And if we need to employ mitigation systems I don't think we can debate them publicly.

iapicca commented 1 year ago

I think we should close this for now. We're not planning any action at the moment. And if we need to employ mitigation systems I don't think we can debate them publicly.

I understand that, thank you both @jonasfj and @isoos for addressing the issue

iapicca commented 1 year ago

@jonasfj I think this could mitigate the issue

rydmike commented 7 months ago

This is good, as the comments further above should conclude the debate about artificial "Likes" boosting or tampering on pub. The statements above basically say there has not been any detection of such tampering.

This is good new information, since I have been hearing about suspected Likes tampering on pub for at least 4 years in the Flutter community. I always said that if that is the case, evidence should be presented; I never saw any. Plus, the statements in this issue now make it clear that such tampering has not been detected, so it should finally resolve that suspicion and debate.

Thanks, this is excellent news 👍

jonasfj commented 7 months ago

@rydmike I don't think we know for certain that "artificial "Likes" boosting" isn't taking place 🤣 But I don't have an impression that it's widespread, or that it affects many packages. I haven't seen any evidence, but I'm also not sure what such evidence would even look like.

Regardless, it would take a non-trivial amount of work to orchestrate many Google bot accounts. I personally think most package authors would get further focusing on writing a good package, with solid documentation, tutorials, videos and such.

iapicca commented 1 week ago

[...] I think this could mitigate the issue

@rydmike @jonasfj I feel that this PR

would indirectly help "sniff out" packages boosted by bots. cc @szakarias