This would allow large lists of users to be downloaded a little bit at a time over a longer period, with a gap between each run in order to not trigger API limits.
I thought about it. But we don't know when to expect the limits to be reset.
The downloader doesn't know who sent the users to it for downloading: the user (manually) or the scheduler. The user class's internal functions also don't take into account who is downloading the user. I was thinking of adding a property (e.g. `FromScheduler`) to the class, but for abstraction purposes this property would be changed for all selected classes (users). The site may not be available for downloading at the moment, or may be disabled by the user, so this property would be left with a wrong value, and in that case I would need to add a reset call to many functions. So instead I added a property that lets you set the number of users taken from the start or the end of the selection, but in reality it didn't give me what I expected.
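For illustration only, a minimal Python sketch (not SCrawler's actual classes or code) of the stale-flag problem described above: the flag is set for every selected user, but nothing resets it when the site turns out to be unavailable or disabled.

```python
# Hypothetical sketch of why a per-user "from scheduler" flag goes stale.
class User:
    def __init__(self, name):
        self.name = name
        self.from_scheduler = False  # who requested the download

def download(user):
    print(f"downloading {user.name}")

def scheduled_run(users, site_available=True):
    for u in users:
        u.from_scheduler = True          # set up front for every selected user
        if not site_available:
            # the download never happens, but the flag is never reset,
            # so a later manual download still sees from_scheduler == True
            continue
        download(u)
        u.from_scheduler = False         # reset only on the happy path
```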
On the other hand, in order to implement wait timers as you asked, I would need to change the scheduler. In that case, one task would not be stopped and would conflict with other plans, if any exist.
I recommend grouping users and adding different plans. Add the label `Group 1` for the first 60 users (for example), `Group 2` for the next 60, and so on. Then create new scheduler plans (specified with the labels `Group 1`, `Group 2`, etc.) with different delay values, so that when you run SCrawler, each next plan (of Twitter users) is delayed by some time compared to the first one. If you have any plans that include non-Twitter users, I recommend adding Twitter as an excluded site there to prevent Twitter users from being downloaded in those plans.
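For illustration, a rough Python sketch of this grouping idea; the group size of 60, the 30-minute delay step, and the helper name are assumptions, not SCrawler settings.

```python
# Split a user list into labeled groups and give each group's plan a growing delay.
def build_group_plans(users, group_size=60, delay_step_minutes=30):
    plans = []
    for i in range(0, len(users), group_size):
        group_no = i // group_size + 1
        plans.append({
            "label": f"Group {group_no}",
            "users": users[i:i + group_size],
            "delay_minutes": (group_no - 1) * delay_step_minutes,
        })
    return plans

# Example: 150 users -> Group 1 (delay 0), Group 2 (delay 30), Group 3 (delay 60)
for plan in build_group_plans([f"user{i}" for i in range(150)]):
    print(plan["label"], len(plan["users"]), "users, delay:", plan["delay_minutes"], "min")
```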
I understand, I was just hoping to avoid creating a bunch of groups and automations for Twitter. On average the API limit triggers after 2-4 users, so I'd probably need to create a new group for every 3 users.
I think 3 users is too few. Maybe you should try more?
It depends on how many new files there are to download for each user, but even if every user has 0 new files, I still hit a limit after 10 users. If each user has a few (10-20) new files to download, I hit a limit after 4 users on average, and if a user has a few hundred then I'll hit the limit after just 1 user.
Actually, it doesn't depend on the files, but on how many requests were made to the site and how many posts the site returned in response. For one request the site returns 20-25 posts. So reaching the "limit" takes about 20-24 requests (600/25).
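As a quick sanity check of that arithmetic (the 600-post window is inferred from the "600/25" figure above; it is an assumption, not a documented limit):

```python
# With roughly 600 posts per rate-limit window and ~25 posts per request,
# the limit is reached after about 24 requests.
POSTS_PER_WINDOW = 600   # assumed from the "600/25" figure in the comment
POSTS_PER_REQUEST = 25   # upper end of the 20-25 posts returned per request

print(POSTS_PER_WINDOW // POSTS_PER_REQUEST)  # -> 24
```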
Is your feature request related to a problem? Please describe.
I can't automate downloading from Twitter because of a combination of the API limits and the fact that the automation function just repeats the same few users it's already downloaded.
Describe the solution you'd like
I think a solution to this would be to change how automation decides which users it grabs. Right now it's either all users in the list (set to 0) or a number of users from the beginning or end of the list (+ or -), and it repeats the same selection every run. Instead, it could iterate through all of the users. For example, I would set it to 3 users: it would download the first 3 users on that run, then the next 3 users in the list on the next run, and so on until it reaches the end of the list, and then it starts over again. This would allow large lists of users to be downloaded a little bit at a time over a longer period, with a gap between each run in order to not trigger API limits.
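To make the idea concrete, here is a minimal Python sketch, assuming a small state file that persists the list position between runs; the file name and functions are hypothetical and not part of SCrawler.

```python
# Each call returns the next batch of users and remembers where it stopped,
# starting over from the top once the end of the list is reached.
import json
from pathlib import Path

CURSOR_FILE = Path("automation_cursor.json")  # hypothetical state file

def next_batch(users, batch_size=3):
    state = json.loads(CURSOR_FILE.read_text()) if CURSOR_FILE.exists() else {"pos": 0}
    start = state["pos"] if state["pos"] < len(users) else 0
    batch = users[start:start + batch_size]
    new_pos = start + batch_size
    CURSOR_FILE.write_text(json.dumps({"pos": 0 if new_pos >= len(users) else new_pos}))
    return batch

# First run -> users 0-2, next run -> users 3-5, ..., then back to the start.
print(next_batch([f"user{i}" for i in range(10)]))
```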
Describe alternatives you've considered
Alternatively, being able to set a delay between each user in the list for automation. For example, automation would download the first user, then wait a configurable amount of time (e.g. 5 minutes), then download the second user, then wait again and download the third, and so on. This could also effectively bypass some API limits.
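A minimal sketch of this alternative, with a placeholder download function and an assumed 5-minute default delay; it is only the shape of the behavior, not SCrawler's API.

```python
# Download users one at a time, pausing a configurable time between them.
import time

def download_user(user):
    print(f"downloading {user}")       # placeholder for the real download

def download_with_delay(users, delay_seconds=300):
    for i, user in enumerate(users):
        download_user(user)
        if i < len(users) - 1:
            time.sleep(delay_seconds)  # e.g. wait 5 minutes before the next user
```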