atomheartother / QTweet

A qt Discord bot who cross-posts from Twitter to Discord
GNU Affero General Public License v3.0

Find a workaround to the 5k users limit #32

Open atomheartother opened 4 years ago

atomheartother commented 4 years ago

This has been stressing me out for weeks and I've been considering options, but I can't find any way to fix it. In short, using Twitter Streams, I am limited to 5,000 user subscriptions per application, and as I write this QTweet is at 4,400. Here are the options:

For now I will add an error message for when someone tries to subscribe to new people once we've reached 5000 users. Yup, we'll be there soon.

atomheartother commented 4 years ago

Thoughts:

I feel like the vast majority of subscriptions on QTweet are for content, media, that kind of stuff, where it doesn't really matter if it's posted with a delay. So I could just forget about real-time streaming and move to fetching each user's latest tweets every few hours. This frees me from the 5000 users limit, and instead I'm limited to 100,000 requests per day. With my current 5000 users, that's 20 requests per user per day, so basically 1 request per user per hour, though as I get more users this will get slower.
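To make the arithmetic above concrete, here is a minimal sketch of how the polling interval stretches as the user count grows. It assumes one request per user per poll and an evenly spread daily budget; the function name `polling_interval_minutes` is made up for illustration.

```python
def polling_interval_minutes(users: int, requests_per_day: int = 100_000) -> float:
    """Minutes between two polls of the same user, spreading the daily budget evenly.

    Assumption: each poll of a user costs exactly one request.
    """
    if users <= 0:
        raise ValueError("users must be positive")
    polls_per_user_per_day = requests_per_day / users
    return 24 * 60 / polls_per_user_per_day


if __name__ == "__main__":
    for n in (5_000, 10_000, 20_000):
        print(n, "users ->", round(polling_interval_minutes(n)), "minutes between polls")
```

With these assumptions, 5000 users come out at 72 minutes between polls, and the interval doubles every time the user count doubles.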

However some people still want some fast tweets, and so I'll allow people to use a flag called --realtime, off by default. That flag will put you on the old (current) realtime system. HOWEVER, you're limited to say 1 or 2 realtime streams per server. If you want more, you have to pay (not sure how much yet). Now what does this do:

I can use an exponential backoff algorithm & not too dumb delays to minimize requests and to minimize delay between a tweet being posted and QTweet posting it to Discord...
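One way the backoff idea could look, as a rough sketch only: each account gets its own delay, which doubles after every poll that finds nothing and resets as soon as a new tweet shows up. The class name and the 1 minute / 4 hour bounds are assumptions for illustration, not anything QTweet actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class PollSchedule:
    """Per-account polling delay with exponential backoff (all values hypothetical)."""

    min_delay: float = 60.0         # seconds; delay used right after a new tweet
    max_delay: float = 4 * 3600.0   # seconds; never wait longer than this
    factor: float = 2.0             # growth applied after every empty poll
    delay: float = field(init=False)

    def __post_init__(self) -> None:
        self.delay = self.min_delay

    def update(self, found_new_tweets: bool) -> float:
        """Record the outcome of a poll and return the delay before the next one."""
        if found_new_tweets:
            # Active account: go back to polling quickly.
            self.delay = self.min_delay
        else:
            # Quiet account: back off, but stay under the ceiling.
            self.delay = min(self.delay * self.factor, self.max_delay)
        return self.delay


if __name__ == "__main__":
    schedule = PollSchedule()
    for outcome in (False, False, False, True):
        print(outcome, "->", schedule.update(outcome), "seconds")
```

Accounts that tweet often stay near the minimum delay, while quiet accounts drift toward the ceiling and stop eating into the request budget.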

Food for thought

trodiz commented 4 years ago

Do forgive me if this is a dumb suggestion (it probably is), but would it be possible to cycle through subscriptions to work around this limitation. What I mean by that is as soon as you hit the upper limit (5000?) unsubscribe from one user and subscribe to the new one. Fetch their latest tweets and then move on to the next user. Keep cycling through all the users you have to serve to... like a round-robin scheduler.

Does that make any sense?

ps: I just stumbled upon this project. I am trying to host this service myself like you recommended, so I applied for a Twitter dev account and am currently awaiting approval.

franchesf commented 4 years ago

How much does it cost to move to another solution and have real-time posting + infinite users?

atomheartother commented 4 years ago

Another solution... Than twitter?

franchesf commented 4 years ago

Another solution... Than twitter?

No no! Solution for the 5k limit.

atomheartother commented 4 years ago

No no! Solution for the 5k limit.

There are no other solutions except for the Enterprise API. The Enterprise API is far, far from our budget, Twitter won't even respond to my queries about it but either way it's a thing you negotiate on a project to project basis, not a program you can just join.

franchesf commented 4 years ago

No no! Solution for the 5k limit.

There are no other solutions except for the Enterprise API. The Enterprise API is far, far from our budget, Twitter won't even respond to my queries about it but either way it's a thing you negotiate on a project to project basis, not a program you can just join.

Got it! That's a shame - I would've loved to become your patron if it was more affordable. I've been looking for something like this for years now, as IFTTT is not that reliable.

Well good luck!

atomheartother commented 4 years ago

Do forgive me if this is a dumb suggestion (it probably is), but would it be possible to cycle through subscriptions to work around this limitation. What I mean by that is as soon as you hit the upper limit (5000?) unsubscribe from one user and subscribe to the new one. Fetch their latest tweets and then move on to the next user. Keep cycling through all the users you have to serve to... like a round-robin scheduler.

I forgot to respond to this. This isn't such a bad idea (I actually hadn't thought of it!) but it would be a mess for multiple reasons. First of all, Twitter limits how often you can re-register a stream, so I couldn't rotate it too often. Second, tweets would be lost while this is happening: while I am not subscribed to an account, if that account tweets, I lose the tweet forever. This is a pain and not acceptable for the end users. Finally, this is still a temporary solution, as eventually we'd hit a point where I'd have something like an hour between every window of realtime subscription, and again I'd be stuck no matter how much money or effort I could throw at Twitter.

trodiz commented 4 years ago

while I am not subscribed to tweets from an account, if that account tweets, I lose the tweet forever.

Is it a requirement that the bot must be subscribed to a user all the time to get their latest tweets? You probably already know about this, but according to this documentation you can request the latest tweets with the since_id parameter. So even if you miss a set of live tweets, you can fetch them later when you come back to that user. There would be a delay of course, but I don't see how the bot would lose a tweet forever.
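A minimal sketch of the since_id idea, under stated assumptions: `fetch` stands in for whatever call retrieves a user's timeline, and its signature here (user id and optional since_id in, list of tweet dicts with an integer `id` out) is hypothetical rather than a real client API. The `FAKE_TIMELINES` data only exists so the example runs on its own.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher: returns the tweets of `user_id` with id > since_id.
Fetch = Callable[[str, Optional[int]], List[dict]]


def poll_user(user_id: str, last_seen: Dict[str, int], fetch: Fetch) -> List[dict]:
    """Return tweets newer than the last one seen for this user, oldest first."""
    since_id = last_seen.get(user_id)
    tweets = sorted(fetch(user_id, since_id), key=lambda t: t["id"])
    if tweets:
        # Remember the newest id so the next poll only asks for later tweets.
        last_seen[user_id] = tweets[-1]["id"]
    return tweets


# Stand-in data and fetcher, for demonstration only.
FAKE_TIMELINES = {
    "alice": [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
}


def fake_fetch(user_id: str, since_id: Optional[int]) -> List[dict]:
    return [
        t for t in FAKE_TIMELINES.get(user_id, [])
        if since_id is None or t["id"] > since_id
    ]


if __name__ == "__main__":
    seen = {"alice": 1}
    print([t["id"] for t in poll_user("alice", seen, fake_fetch)])
    print([t["id"] for t in poll_user("alice", seen, fake_fetch)])
```

The sketch assumes `last_seen` is persisted somewhere between polls; it says nothing about how many calls such polling costs, which is the real constraint here.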

I know I'm overlooking something here... 🤔

atomheartother commented 4 years ago

@trodiz Yes but I'm limited to 900 calls to that endpoint per 15min window. With 5k users that is simply not manageable :(
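For a sense of scale, here is the same limit as a quick calculation. It assumes one call per user per pass and calls paced evenly across the window; `full_pass_minutes` is a made-up helper name.

```python
def full_pass_minutes(users: int, calls_per_window: int = 900,
                      window_minutes: float = 15.0) -> float:
    """Minutes needed to poll every user once at the given rate limit.

    Assumption: one call per user, calls spread evenly over each window.
    """
    if users < 0 or calls_per_window <= 0:
        raise ValueError("users must be >= 0 and calls_per_window > 0")
    return users / calls_per_window * window_minutes


if __name__ == "__main__":
    print(round(full_pass_minutes(5_000), 1), "minutes per full pass over 5000 users")
```

Under those assumptions a full pass over 5000 users takes about 83 minutes, and that figure grows linearly with every new subscription.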

atomheartother commented 4 years ago

Huh, I had however never seen the lists API, is this new? It's limited to 5 000 users per list, I wonder if I could use this, I'll read into it.

atomheartother commented 4 years ago

Ok, first day of experimentation with lists. Here are my results:

Lists are overall a GREAT use case for this problem. I can have 5000 users per list and 1000 lists per account, so that totals 5M subscriptions. I can check a list every 1s, so overall it's pretty damn fast. So I made a small branch to test things out.
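A rough sketch of what those numbers imply, using only the figures quoted above (5000 users per list, 1000 lists per account, one list check per second). The helper names are hypothetical, and nothing here calls Twitter.

```python
from typing import List, Sequence

LIST_CAPACITY = 5_000   # users per list, as quoted above
MAX_LISTS = 1_000       # lists per account, as quoted above


def shard_into_lists(user_ids: Sequence[str], capacity: int = LIST_CAPACITY,
                     max_lists: int = MAX_LISTS) -> List[List[str]]:
    """Split followed accounts into consecutive chunks of at most `capacity`."""
    shards = [list(user_ids[i:i + capacity]) for i in range(0, len(user_ids), capacity)]
    if len(shards) > max_lists:
        raise ValueError("too many users for one account's lists")
    return shards


def sweep_seconds(list_count: int, seconds_per_check: float = 1.0) -> float:
    """Time to check every list once, assuming one list check per second."""
    return list_count * seconds_per_check


if __name__ == "__main__":
    users = [f"user{i}" for i in range(12_000)]
    shards = shard_into_lists(users)
    print([len(s) for s in shards], "->", sweep_seconds(len(shards)), "seconds per sweep")
    print("theoretical ceiling:", LIST_CAPACITY * MAX_LISTS, "subscriptions")
```

The worst-case delay then depends on how many lists exist rather than how many users are followed, which is why this scales so much better than per-user polling.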

The problem is state management. So far QTweet is perfectly stateless, you can deploy her anywhere and all the info required is stored on her side. Here, lists are stored by Twitter and that causes a lot of problems, not the least of which is that if I clear out my entire database locally, QTweet still has those lists registered on Twitter and the endpoints to manage lists seem pretty rate-limited.

I am still experimenting; however, despite the state management problem, lists seem to be a pretty good option.

Globlonux01 commented 4 years ago

I choose this option: Put a hard limit on the number of subscriptions per server and only allow any higher behind a paywall.

Limit per server: 10 max.

Furry commented 4 years ago

@Globlonux01 mentioned a max of 10 per server, but even then that seems like far too much. I do have a few ideas though...

  1. An approach that may work, though it would require its own bit of recoding, would be to allow users to submit their own bearer token to the bot, then specify if they'd like that token to be used only for private or public use. It may have to spawn a new docker/process per token though, which is a major downside unless you can create a generator for each token/server.

  2. Multiple apps/accounts that you own, whose tokens feed into the bot, so if it caps out one token, it can move on to the next.

  3. You can have two separate categories. One for streaming posts, and one for polling. Only one streaming account would be allowed per server, and everything else would be on an hourly/daily poll. This seems like the best option, but would require you to migrate existing servers to polling, which might be a chore.

But since I'm using this bot now, if you need help with any issues/want help working on this particular thing, just @ me in an issue and I'll help out.
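Option 3 above could be sketched roughly like this: each server gets a small quota of streaming subscriptions and everything else falls back to polling. The tuple layout, the `assign_modes` name and the default cap of one are illustrative assumptions only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Subscription = Tuple[str, str, bool]   # (server_id, twitter_user, wants_realtime)


def assign_modes(subscriptions: List[Subscription],
                 realtime_cap: int = 1) -> List[Tuple[str, str, str]]:
    """Label each subscription "stream" or "poll", capping streams per server."""
    used: Dict[str, int] = defaultdict(int)
    result = []
    for server_id, user, wants_realtime in subscriptions:
        if wants_realtime and used[server_id] < realtime_cap:
            # The server still has realtime quota left.
            used[server_id] += 1
            mode = "stream"
        else:
            # Either realtime was not requested or the quota is used up.
            mode = "poll"
        result.append((server_id, user, mode))
    return result


if __name__ == "__main__":
    subs = [
        ("server1", "nasa", True),
        ("server1", "esa", True),
        ("server1", "artist", False),
        ("server2", "nasa", True),
    ]
    for row in assign_modes(subs):
        print(row)
```

In this sketch the first realtime request per server wins; migrating existing servers would amount to running every current subscription through the same function with `wants_realtime` set to false.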

atomheartother commented 4 years ago

An approach that may work, though it would require its own bit of recoding, would be to allow users to submit their own bearer token to the bot, then specify if they'd like that token to be used only for private or public use. It may have to spawn a new docker/process per token though, which is a major downside unless you can create a generator for each token/server.

I actually have been thinking of something somewhat similar to this, using docker swarm and running other instances of the bot as slaves to the master node, using separate tokens provided by users for each one.

That would indeed require a pretty deep rewrite and a bunch of code dedicated to managing the different nodes, but it could be done. The point being I'd need to separate QTweet into 2 programs: one that's a front-posting Discord bot, and the other that's a swarm of bots that receive their orders from the first one and all subscribe to different Twitter users using user-provided tokens.

While that SOUNDS appealing, I'm not sure how much this complies with the Twitter TOS. Also, spawning new instances of a service on the fly is definitely not something I'm super familiar with, and this is getting into big boy devops territory. But also it would be pretty cool. I could even go for a microservices approach... Anyway I'm keeping it in mind but right now my bets are on the lists API thing.

atomheartother commented 4 years ago

ALSO keep in mind Twitter is gonna EOL the Streaming API I use sometime soon, so I'm definitely not gonna rely on that if I'm doing a deep rewrite of my bot haha

Furry commented 4 years ago

Alright! Lists seem like the easiest and most Twitter-friendly approach anyway :)

atomheartother commented 4 years ago

New approach.

I'll be implementing the !!list command (which allows you to follow a list) completely separately from !!start. This'll give me some time to debug list-related issues and also unblock the 5k limit for the time being.

SteadEXE commented 3 years ago

Hello,

I just joined the thread. Do you think it's possible to ask bot users to create their own API key, so the bot manages tweet subscriptions with the server's API key? Or maybe put all the user API keys in a pool, so unused queries can be used by other members.

It's just an idea, maybe a bad idea, but who knows...

ebergstedt commented 3 years ago

Cycling API keys is probably a trap; I'm sure Twitter would correlate each API key to its originating IP and detect cycling. Twitter really dislikes bots abusing or going around their limitations. Each API key would need to have a dedicated originating IP per request, which means you'd need to set up an instance (in a cluster of machines) for every 5k multiplier, which is $5 on DigitalOcean. You'd also need to use a gateway app to connect to your API instances.

I'm very happy with your instructions for self-hosting and Docker, and I'm sure many are as well. The above project would require fundamental infrastructure changes which would probably break the ease of use of the self-hosted design you've made, so it'd have to be a separate project.

GoosePlays20 commented 3 years ago

@atomheartother send me your Discord so I can DM you, I can get in touch with bot devs that can help