Closed Nutomic closed 1 year ago
Is sending aggregated votes secure? Is it possible that some bad actor instances would send large fake counts?
If sending out the requests is the problem for lemmy.ml, you could send a single request (instead of 100) to a separate worker, on a separate server, that then forwards it to the 100 instances. You could create a load balancer that accepts multiple workers and decides where to send the single request, per request.
would it be possible to reverse the process? i.e. instead of sending post requests when new votes arrive, to aggregate them and make them available as a subscription on the server, so that they can be retrieved via GET requests from the respective instance?
I am not particularly knowledgeable, but could Shared Inbox (part of the ActivityPub protocol) be of help?
It sounds like a lot of requests, but maybe it isn't a big problem in terms of CPU or network. Rather than aggregating requests, perhaps we just need to make sure our federation job queue is efficient, and if it isn't, possibly use a different one.
IMO, aggregation is the best route to take here. This is not actually a new problem, and aggregation is how it's done when disseminating updates between routers running BGP on the Internet. (I'm a network architect, so that's how I think of things.) OSPF, another routing protocol, also uses aggregation. When you have hundreds of thousands of updates, sending each one just isn't efficient. Things are going to tip over at some point if you do it that way. It may be more work to refactor things, but anything else I think is, at best, just buying you time.
Are votes being updated immediately across all federated instances? If so, is live updating of vote counts even necessary? Aggregating votes every hour would help, but an hour lag seems like a lot.
Why not just retrieve votes from the hosting instance upon the thread being accessed by each user? Surely updating votes on page view will involve fewer requests than sending out a wave of updates on every vote.
I guess votes are needed to sort posts. Even 1 bundle per minute would drastically reduce the number of requests, while maintaining a quasi-live update of the content.
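A rough sketch of what "1 bundle per minute" could look like on the sending side. This is only an illustration: `bundle_votes`, the tuple layout, and the sample data are made up here and are not part of Lemmy or ActivityPub.

```python
from collections import defaultdict


def bundle_votes(votes, window_seconds=60):
    """Group individual votes into one bundle per time window.

    votes: iterable of (timestamp_seconds, voter, post_id, score) tuples
           (hypothetical layout, chosen for this example only).
    Returns {window_start: [(voter, post_id, score), ...]}.
    """
    bundles = defaultdict(list)
    for timestamp, voter, post_id, score in votes:
        # Every vote in the same 60 s window lands in the same bundle.
        window_start = timestamp - timestamp % window_seconds
        bundles[window_start].append((voter, post_id, score))
    return dict(bundles)


# Hypothetical sample: five votes arriving within two minutes.
votes = [
    (0, "alice", "post-1", 1),
    (10, "bob", "post-1", 1),
    (59, "carol", "post-2", -1),
    (60, "dave", "post-1", 1),
    (119, "erin", "post-1", -1),
]
bundles = bundle_votes(votes)
requests_without_bundling = len(votes)  # one request per vote per follower
requests_with_bundling = len(bundles)   # one request per window per follower
```

With this sample, each following instance would receive two requests instead of five, and every individual vote is still delivered, so ranking on the receiving side would not have to change.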
> The last option seems to be preferable, but is not easy to implement. Afaik there is no prior example of sending aggregate data over Activitypub, so it would require an extension which would be incompatible with other platforms. It might also be necessary to rewrite the way post ranking is calculated. On the other hand this could be an improvement for privacy, as other instances dont see which particular user upvoted or downvoted a post.
To suggest a fourth option: I don't think aggregation is necessary as much as batching is. You could still send each vote individually in a single request but handle them when there is less server load. That way fake votes are less of an issue.
I'm not entirely sure how ActivityPub works but I assume it is legal to respond with a `429` or a `503` and a `Retry-After` header?
That way the source server could send updates immediately, if the target is overloaded it sends a `Retry-After` header and the source server batches all updates for that target server together until the time expires.
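To make the idea concrete, here is a minimal sketch of a per-target queue that honours `Retry-After`. Everything in it is hypothetical: `TargetQueue`, the `send` callback, and the 60 second fallback are invented for illustration. It also assumes several activities can be delivered in one request, which ActivityPub delivery does not define, and it only handles the delta-seconds form of `Retry-After` (the header may also carry an HTTP date).

```python
class TargetQueue:
    """Outgoing activities for one target instance (hypothetical helper)."""

    def __init__(self, send, default_delay=60):
        # send(batch) -> (status_code, retry_after_seconds_or_None)
        self.send = send
        self.default_delay = default_delay  # assumed fallback, not from any spec
        self.pending = []
        self.blocked_until = 0.0

    def submit(self, activity, now):
        """Queue an activity and try to deliver right away."""
        self.pending.append(activity)
        self.flush(now)

    def flush(self, now):
        """Deliver everything pending unless the target asked us to wait."""
        if not self.pending or now < self.blocked_until:
            return
        status, retry_after = self.send(list(self.pending))
        if status in (429, 503):
            # Target is overloaded: keep the batch and wait as instructed.
            delay = retry_after if retry_after is not None else self.default_delay
            self.blocked_until = now + delay
        else:
            self.pending.clear()


# Simulated target that rejects the first request with 429 + Retry-After: 30.
calls = []

def fake_send(batch):
    calls.append(batch)
    if len(calls) == 1:
        return 429, 30
    return 200, None

queue = TargetQueue(fake_send)
queue.submit("vote-1", now=0)   # sent immediately, rejected with 429
queue.submit("vote-2", now=10)  # held back, no request made
queue.submit("vote-3", now=20)  # held back, no request made
queue.flush(now=30)             # wait is over: one request with all three
```

In this simulation the target sees two requests instead of three, and nothing is lost while it is overloaded. A real sender would also need a timer to call `flush` when the wait expires.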
Could also add prioritization for important events like posts and comments and push votes to a later date when load should be lower. I think having the votes arrive reliably is more important than having them update live. Reddit also does not update votes live.
I just briefly glanced over the ActivityPub spec but there seems to be a collection type for likes, is it possible to use this at least for the votes? https://www.w3.org/TR/activitypub/#likes
It seems like this is not really a problem like I thought because these send jobs are very lightweight. Probably the solution is to remove the worker count setting so that unlimited workers can be created on demand.
@Nutomic Is there a different issue to follow that's currently hindering federation then? Federating instances are missing comments and votes.
For example, here's a random post on !technology@lemmy.ml that's 4 hours old: https://lemmy.ml/post/1250165
Here's how it looks on different instances:
Instance | Comments | Votes |
---|---|---|
lemmy.ml | 13 | 95 |
beehaw.org | 6 | 22 |
lemmy.world | 11 | 64 |
sh.itjust.works | 10 | 45 |
For comparison, here's a random post from !technology@beehaw.org that's also 4 hours old: https://beehaw.org/post/548636
Instance | Comments | Votes |
---|---|---|
beehaw.org | 70 | 59 |
lemmy.ml | 63 | 20 |
lemmy.world | 66 | 160 |
sh.itjust.works | 62 | 127 |
While the votes being different is not such a huge deal, missing comments is absolutely a huge issue.
Does Lemmy utilize some kind of swarm sharing? I'm thinking that scalability could increase a lot if when you ask one instance for an update and it happens to have updates from other instances that you don't have then it could send those as well. If they are aggregated you could check the timestamps to determine if you have collected the entire timeline.
You would need to send more data in the initial request to let the instance know what data you need but the total amount of requests could be greatly reduced and the load of sharing all the data could be distributed across all instances.
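As a rough illustration of the timestamp check, the requester could send the newest timestamp it holds per origin instance, and the responder could return whatever it has beyond that. The function name, the log layout, and the sample entries are all invented for this sketch, and it says nothing about how relayed activities would be verified.

```python
def missing_updates(known_log, have_until):
    """Return the log entries the requester does not have yet.

    known_log:  list of (origin_instance, timestamp, payload) tuples held
                by the responding instance (hypothetical layout).
    have_until: {origin_instance: newest timestamp the requester has}.
    """
    return sorted(
        entry
        for entry in known_log
        # Unknown origins default to -1, so everything from them is sent.
        if entry[1] > have_until.get(entry[0], -1)
    )


# Hypothetical state: the responder holds updates from two origins,
# the requester has only seen lemmy.ml up to timestamp 1.
responder_log = [
    ("lemmy.ml", 1, "vote-a"),
    ("lemmy.ml", 2, "comment-b"),
    ("beehaw.org", 5, "vote-c"),
]
requester_state = {"lemmy.ml": 1}

to_send = missing_updates(responder_log, requester_state)
```

Here `to_send` contains the second lemmy.ml entry and the beehaw.org entry, so a single request fills gaps from two origins at once.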
@DomiStyle I'm not sure what the reason is, it's certainly worth investigating. A possibility would be instance blocks or user bans. It could also be networking problems, or a software bug. There is also this issue which means that activities will get lost during restart.
@Kryptortio Federation uses POST requests, except for explicit user requests to fetch a remote object (e.g. searching a community URL). So there is no automatic "asking other servers". It might be possible to implement something like that, but it would require major changes to federation logic. I suggest you read the Lemmy federation docs and the ActivityPub standard to get a better understanding of how it all works.
Closing in favor of https://github.com/LemmyNet/lemmy/issues/3121
Yesterday I posted an announcement telling admins of large instances that they need to increase the "federation worker count". These workers are needed to send outgoing federated actions. Since then I did the same adjustment on lemmy.ml, and had to increase the worker count up to 360,000. Luckily this isn't causing any problems yet, but it points to a scaling limitation which will likely become important in the future.
To understand this limitation, it's important to know how federation in Lemmy communities works. Let's say a user from sopuli upvotes a comment in !memes@lemmy.ml. This upvote action is sent via ActivityPub to the lemmy.ml server, which forwards it to all instances where at least one user follows the memes community. The same happens for all other actions like creating or editing posts/comments, mod actions, and so on. The problem is that there are lots of these actions (particularly votes), and they need to be forwarded to lots of different servers. For example, the recent top posts in /c/memes have around 1500 upvotes. Let's assume that users from 100 different instances follow the community. Then federating the votes for this single post requires 1500 * 100 = 150,000 HTTP POST requests to other servers. On top of this are requests to federate comments and comment votes, which likely reach a similar magnitude.
Here are some possible workarounds and solutions:
The last option seems to be preferable, but is not easy to implement. As far as I know there is no prior example of sending aggregate data over ActivityPub, so it would require an extension which would be incompatible with other platforms. It might also be necessary to rewrite the way post ranking is calculated. On the other hand, this could be an improvement for privacy, as other instances don't see which particular user upvoted or downvoted a post.
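For a sense of scale, the arithmetic from the example above can be compared with an aggregated variant. This is only a back-of-the-envelope sketch: the function names are made up, and the choice of one aggregate per hour over one day is an assumption for illustration, not a proposed design.

```python
def requests_per_vote(votes, follower_instances):
    """Current behaviour: every vote is forwarded to every following instance."""
    return votes * follower_instances


def requests_aggregated(intervals, follower_instances):
    """Aggregated variant: one summary per interval per following instance."""
    return intervals * follower_instances


# Numbers from the example: 1500 upvotes, 100 following instances.
individual = requests_per_vote(1500, 100)

# Assumption for illustration: one aggregate per hour over one day.
aggregated = requests_aggregated(24, 100)

reduction_factor = individual / aggregated
```

Under these assumptions the post would need 2,400 requests instead of 150,000, at the cost of receiving instances only seeing totals rather than individual votes.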