dart-lang / pub-dev

The pub.dev website
https://pub.dev
BSD 3-Clause "New" or "Revised" License

[proposal] adopt measures against bots #6226

Closed iapicca closed 1 year ago

iapicca commented 1 year ago

Use case

Whether this was the intended goal or not, in my experience the "like" count is often taken into account when considering which package to adopt.

Obtaining "likes" by unfair means (possibly bots) could sway a larger portion of the audience towards a given package regardless of its merits.

Proposal

It would be useful to adopt measures to prevent or identify unfairly obtained likes.

Example: [getX](https://pub.dev/packages/get) looks suspicious to me. [Screenshot: 2022-11-27 at 11:22:26]

isoos commented 1 year ago

Obtaining "likes" by unfair means (possibly bots) could sway a larger portion of the audience towards a given package regardless of its merits.

I'm curious: do you see any sign of bot likes?

stricter sign up procedure (captcha...)

We only have Google accounts. Signing up for them is a considerable effort already, and I think Google accounts have much better bot detection and protection than we can implement as a small team.

hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age), but in practice I think it is always the X vs 2X vs 10X differences that really count, and in those a few "rogue" likes are unlikely to matter.
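
As an illustration only, weighting likes "by age" could be sketched as below. This assumes that "age" means the age of the account at the moment it liked the package, and that a like record carries two dates; the `Like` class, the field names and the 365-day threshold are all hypothetical and do not reflect pub.dev's data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Like:
    """A single like event (hypothetical shape, not pub.dev's data model)."""
    account_created: date
    liked_on: date


def like_weight(like: Like, full_weight_after_days: int = 365) -> float:
    """Weight a like by how old the account was when it liked.

    Accounts at least `full_weight_after_days` old count as 1.0; younger
    accounts count proportionally less. The threshold is an arbitrary,
    illustrative value.
    """
    age_days = (like.liked_on - like.account_created).days
    return max(0.0, min(1.0, age_days / full_weight_after_days))


def weighted_like_count(likes: list[Like]) -> float:
    """Sum of per-like weights; equals the raw count if all accounts are old."""
    return sum(like_weight(like) for like in likes)


likes = [
    Like(account_created=date(2021, 1, 1), liked_on=date(2022, 1, 1)),  # 365 days old
    Like(account_created=date(2022, 1, 1), liked_on=date(2022, 1, 1)),  # brand new
]
print(weighted_like_count(likes))  # 1.0
```

With a scheme like this, a handful of freshly created accounts would barely move the total, which fits the point that only large relative differences matter.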

iapicca commented 1 year ago

hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age), [...]

I think this would be a good start. What about this one?

  • show the names of the accounts who like the package

[...] but in practice I think it is always the X vs 2X vs 10X differences that really count, and in those a few "rogue" likes are unlikely to matter.

I think we are already above 5X; see below.

I'm curious: do you see any sign of bot likes?

Example: [getX](https://pub.dev/packages/get) looks suspicious to me. [Screenshot: 2022-11-27 at 11:22:26]

Do you really believe that getX has 5 times the likes of bloc or firebase and 30% more than provider? Doesn't this look suspicious at all?

isoos commented 1 year ago

Do you really believe that getX has 5 times the likes of bloc or firebase and 30% more than provider?

Some sampling of the mentioned packages and other popular ones (order by likes):

Order by forks: firebase_*, bloc, dio + get, provider, http, flutter_native_splash.

Order by stars: dio, bloc, firebase_*, get, provider, http, flutter_native_splash.

It seems to me that package:get could very well be used (and liked and forked and starred) that much. If there is bot activity, it is not self-evident based on these numbers (or they also managed to replicate it on GitHub).

(Note: it may not be the same users who like/fork/star the package, and they may do it in a different part of their use/understanding of the ecosystem. E.g. they may have learned Dart/Flutter from a tutorial that used get, and they hit the like button right away. There is no malice in that.)

iapicca commented 1 year ago

I feel that we are discussing whether my example is good or not, rather than addressing my feature request.

I'm not campaigning about whether a specific package is "boosted" by fake likes or not (there is already Twitter for that). I agree that the like count of the package I picked as an example may be totally legitimate. I don't want to talk about that package; I'd rather focus on measures against bots in a generic sense.

I think that the first point should be a default for transparency anyway

  • show the names of the accounts who like the package

I understand that it is inconvenient to change the signup procedure... fair enough

  • stricter sign up procedure (captcha...)

We only have Google accounts. Signing up for them is a considerable effort already, and I think Google accounts have much better bot detection and protection than we can implement as a small team.

I think I probably didn't express myself correctly

  • hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age)

I have 2 points to make here

reframing the 3rd point of my proposal: flag profiles that like only packages of a single author

isoos commented 1 year ago

show the names of the accounts who like the package

I understand that it is inconvenient to change the signup procedure... fair enough

It is not only that, but also a privacy question: we would need an opt-in approval from the users, and the feature wouldn't really provide them much value. If you are worried about bots, they could just not opt-in and you wouldn't be any wiser...

a like should be based on real-world experience, not on "sympathy" for a user or organization; it's hard to believe that any single positive experience comes from packages and plugins of the same organization or author

I wouldn't try to second-guess the intent behind a like: we only see a click event, and don't know whether it was a quick moment decision or a long-term elaborate one. In an ideal world it would matter, but in practice we have no control or insight into it.

flag profiles that like only packages of a single author

If we implemented it, and anybody cared to set up bot voting, they would realize what's happening, and soon enough look at this thread or figure out this countermeasure from the open source code. They would start to modify the bot to like some other packages too, maybe even randomize a bit in time and activity. We would be in an even worse position.

Another angle: why would we discount likes from users that may be infrequent visitors to the site, and only liked packages from the same publisher? They may have clicked at a time when the publisher was different, but there was some consolidation in development efforts. We need better patterns if we want to take negative measures on this one.
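
For concreteness, the "flag profiles that like only packages of a single author" heuristic discussed above could look roughly like the sketch below. It assumes likes are available as `(user_id, publisher_id)` pairs; that input shape, the function name and the `min_likes` threshold are hypothetical. The threshold is there only to avoid flagging infrequent visitors with one or two likes, and it does nothing against the evasion described above (a bot that also likes a few unrelated packages would pass).

```python
from collections import defaultdict


def single_publisher_profiles(likes, min_likes=3):
    """Return {user_id: publisher_id} for users whose likes all go to one publisher.

    `likes` is an iterable of (user_id, publisher_id) pairs (hypothetical shape).
    Users with fewer than `min_likes` likes are ignored, since a single like
    says nothing about a pattern.
    """
    publishers_by_user = defaultdict(set)
    like_counts = defaultdict(int)
    for user_id, publisher_id in likes:
        publishers_by_user[user_id].add(publisher_id)
        like_counts[user_id] += 1

    return {
        user_id: next(iter(publishers))
        for user_id, publishers in publishers_by_user.items()
        if len(publishers) == 1 and like_counts[user_id] >= min_likes
    }


sample_likes = [
    ("alice", "pub_a"), ("alice", "pub_a"), ("alice", "pub_a"),  # flagged
    ("bob", "pub_a"), ("bob", "pub_b"), ("bob", "pub_a"),        # mixed, not flagged
    ("carol", "pub_b"),                                          # too few likes
]
print(single_publisher_profiles(sample_likes))  # {'alice': 'pub_a'}
```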

It is not clear to me that there is any ongoing malice with the likes, and until then, I think our limited efforts are better spent on other features and improving the site. In contrast: when there is a spam package being uploaded, we do take steps to remove it (and also prevent further uploads from the same account). But the case must be clear, not just a vague hunch.

jonasfj commented 1 year ago

I feel that we are discussing whether my example is good or not, rather than addressing my feature request.

Yeah, let's avoid discussion of individual cases.

It's not my impression that there is widespread use of bots for likes on pub.dev; nor that this would be an urgent problem. If it does become an issue, I think we might want to focus the effort on minimizing bot accounts in general.

And I think we should be careful here. It's very hard to see if a package is useful. And being "useful" is very subjective. Some authors are really good at outreach, tutorials, getting people started. And that might be "useful" to some people.


We do undertake some effort to minimize bots, in particular we block accounts uploading spam. There are other efforts we can undertake, but that will probably not be a subject for public discussion.

None of the solutions to mitigate bots are perfect and they all have downsides. Even the best spam filter occasionally throws away legitimate emails. Hence, employing more measures against suspected bots must be weighed against the negative implications of doing so. So if possible, I'd much rather avoid aggressively employing imperfect bot mitigation systems.

I think we should close this for now. We're not planning any action at the moment. And if we need to employ mitigation systems I don't think we can debate them publicly.

iapicca commented 1 year ago

I think we should close this for now. We're not planning any action at the moment. And if we need to employ mitigation systems I don't think we can debate them publicly.

I understand that, thank you both @jonasfj and @isoos for addressing the issue

iapicca commented 1 year ago

@jonasfj I think this could mitigate the issue

rydmike commented 8 months ago

This is good, as the comments further above should conclude the debate about artificial "Likes" boosting or their tampering on pub. The statements above basically say there has not been any detection of such tampering.

This is good new information, since I have been hearing about suspected Likes tampering on pub for at least 4 years in the Flutter community. I always said that if that is the case, evidence should be presented; I never saw any. Plus now the statements in this issue make it clear that such tampering has not been detected, so it should also then finally resolve that suspicion and debate.

Thanks this is excellent news 👍

jonasfj commented 8 months ago

@rydmike I don't think we know for certain that "artificial "Likes" boosting" isn't taking place :rofl: But I don't have an impression that it's widespread, or that it affects many packages. I haven't seen any evidence, but I'm also not sure what such evidence would even look like.

Regardless, it would take a non-trivial amount of work to orchestrate many Google bot accounts. I personally think most package authors would get further focusing on writing a good package, with solid documentation, tutorials, videos and such.

iapicca commented 1 month ago

[...] I think this could mitigate the issue

@rydmike @jonasfj I feel that this PR would indirectly help "sniffing" packages boosted by bots. cc @szakarias

iapicca commented 3 days ago

I cross-referenced the like and download counts using the experimental flag; the numbers for getx seem fishy, don't they?

@jonasfj @isoos could you please consider re-opening this issue? (thank you @szakarias for making this possible)

| package | likes | downloads |
| --- | --- | --- |
| bloc | 2.9k | 2.5M |
| riverpod | 3.4k | 2M |
| provider | 10.3k | 3.75M |
| getx | 14.8k | 604k |

[Screenshots of the bloc, riverpod, provider and getx package pages, 2024-11-19]
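
To make the discrepancy in these figures explicit, the likes-to-downloads ratio can be worked out as in the short sketch below. It uses only the rounded numbers quoted above; the function name is made up for illustration, and the ratio on its own shows a difference in proportion, not its cause.

```python
# (likes, downloads) as quoted above, rounded as displayed on the package pages.
stats = {
    "bloc": (2_900, 2_500_000),
    "riverpod": (3_400, 2_000_000),
    "provider": (10_300, 3_750_000),
    "getx": (14_800, 604_000),
}


def likes_per_thousand_downloads(likes: int, downloads: int) -> float:
    """Number of likes for every 1,000 downloads."""
    return 1000 * likes / downloads


ratios = {
    name: likes_per_thousand_downloads(likes, downloads)
    for name, (likes, downloads) in stats.items()
}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<10}{ratio:6.2f}")
# bloc        1.16
# riverpod    1.70
# provider    2.75
# getx       24.50
```

By this measure getx has roughly 9 times as many likes per download as provider and about 21 times as many as bloc.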

cc @jonataslaw

Tienisto commented 3 days ago

I don't use getx and likely never will, but it's difficult to tell if the likes from the getx package are fake given its natural growth. I do think that people are more likely to just "like" the package because they are hyped because of the flashy readme. It's interesting to discuss when and why people like a package since this is an emotional process. Showing the download numbers as seen in the new experimental feature is a good step to provide a more technical tool to rate a package that is free from emotion.

Edit: Another argument against bots in getx is that it is becoming obsolete because the last stable release was 14 months ago. Likely, a lot of developers migrated away from getx, leaving the like count as something historical.

See https://pubstats.dev/packages/get,flutter_riverpod

[Screenshot: 2024-11-19 at 18:07:06]

iapicca commented 3 days ago

I don't use getx and likely never will, but it's difficult to tell if the likes from the getx are fake given its natural growth. I do think that people are more likely to just "like" the package because they are hyped because of the flashy readme. It's interesting to discuss when and why people like a package since this is an emotional process. Showing the download numbers as seen in the new experimental feature is a good step to provide a more technical tool to rate a package that is free from emotion.

See https://pubstats.dev/packages/get,flutter_riverpod

@Tienisto I think many devs (including in teams I worked with) used to pick one package over another "also" according to likes, since a "widely adopted" package has in theory more chances to succeed and to be maintained longer (I know it's not always the case, RIP hive).

I think it's not just "hype" and a "flashy readme" but trying to get people (and companies) to invest in a project; the huge likes/downloads discrepancy could be caused by artificially boosting the likes.

I just wish for this phenomenon to be investigated... that's all I ask

Tienisto commented 3 days ago

I also take the like count into consideration, especially when the popularity metric is kind of abstract (99% vs 100%). Maybe the like count isn't good in the first place. crates.io, npm, and nuget do not have this metric at all.

iapicca commented 3 days ago

Removing the "like" feature sounds like a good idea to me

isoos commented 3 days ago

I'm curious: what do you think of GitHub stars? Because likes is essentially a very similar feature, even the first line of the GitHub documentation says "Starring makes it easy to find a repository or topic again later." and likes here do the same.

I'm not convinced by the data shown here that this is clearly bot or fraudulent activity. If anything, the referenced data seems to suggest that there is no clear correlation (or rather regression function) between the download and like counts, and with that, we should treat them with separate usefulness (likes being a historical accumulation of goodwill towards the package).
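
As a rough check of the "no clear correlation" remark, the Pearson coefficient can be computed for the four packages quoted earlier in this thread. This is only a sketch: four rounded data points are far too few to support any conclusion, and the helper below is a plain textbook implementation, not something pub.dev uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# bloc, riverpod, provider, getx (figures as quoted in this thread)
likes = [2_900, 3_400, 10_300, 14_800]
downloads = [2_500_000, 2_000_000, 3_750_000, 604_000]

r = pearson(likes, downloads)
print(f"r = {r:.2f}")  # r = -0.35
```

A coefficient this close to zero on so small a sample is compatible with there being no simple linear relationship between the two counts.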

If you have seen video bloggers saying (or begging) "Like and subscribe" you should know that they broadcast it because it works. If a package has an outreach like that, it may just get more likes here.

iapicca commented 3 days ago

@isoos if that's the intended use of "like", then I'd rather have it removed, as mentioned above. In real life, real people and real companies used to refer to the like count; if it is intended to represent a "social feature" rather than a quality indicator, then I don't see much value in it.

I'm not convinced by the data shown here that this is clearly bot or fraudulent activity.

that makes one of us

[...] we should treat them with separate usefulness

I completely agree. What about making it clear to package adopters?

jonataslaw commented 1 day ago

While I appreciate the points raised, I believe it's important to emphasize that no single metric can provide a comprehensive view of a package's popularity or quality. Metrics like likes, downloads, GitHub stars, and the number of open-source projects using a package all offer valuable insights when considered together.

That said, the comment by @iapicca seems to reflect a strong personal preference rather than an objective assessment of the available data. For instance, while download/likes counts have their limitations, they still hold relevance when combined with other indicators, such as:

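Take, for example, a comparison between Riverpod and GetX:

  • Riverpod has 67,000 projects depending on it on GitHub.
  • In contrast, GetX is used by over 198,000 open-source projects, highlighting its extensive adoption.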

This data demonstrates that analyzing multiple metrics can provide a more nuanced and balanced view, rather than relying on a single parameter.

For these reasons, I encourage focusing on a holistic evaluation rather than dismissing certain metrics outright. I hope this perspective fosters a more constructive discussion moving forward.

bigbott commented 18 hours ago

GetX has fewer downloads because it is less popular among enterprises, and the number of downloads is not affected by automated builds.

There is another metric available on GitHub -- the number of repositories that use a particular repository. By this number, GetX is comparable with BLoC and Riverpod.

People use GetX because it is simple and has a lot of shortcuts.

Software development is the art of balancing KISS and SOLID, and for some people (including myself), GetX just gets it right.

If you want your app more SOLID, it can be done with or without GetX, but, please, stop being so emotional about software frameworks that you personally don't use and don't know.

yang-lile commented 17 hours ago

  • Riverpod has 67,000 projects depending on it on GitHub.
  • In contrast, GetX is used by over 198,000 open-source projects, highlighting its extensive adoption.

Where does this data come from? @jonataslaw

bigbott commented 17 hours ago

Where does this data come from?

github repository page on the right side @yang-lile

yang-lile commented 16 hours ago

So, bloc is referenced by over 200,000 open source projects, but you don't mention it, @jonataslaw? And we all know that riverpod is younger.

jonasfj commented 14 hours ago

To be clear, it's not exactly impossible to fake the download count :see_no_evil: :see_no_evil: :see_no_evil:

Certainly, packages often used by app developers who have active CI systems running are going to have a HUGE download count boost.


I think that in general, we should be extremely careful about deriving anything from a huge download count or a large number of likes.

I'm actually not sure it says much whether a package has 8M downloads or 80k downloads. All it tells us is that there is a non-trivial number of active users or a non-trivial amount of activity from a sizable set of active users.

It's sort of the same with likes.

I think the numbers only speak volumes when they are very low. That said, there are lots of quality packages with few downloads and few likes.


If we want good signals of quality I think we do have one: https://docs.flutter.dev/packages-and-plugins/favorites

Of course, it's not easy to scale Flutter Favorites to cover all high quality packages.