coleifer / huey

a little task queue for python
https://huey.readthedocs.io/
MIT License

Would it be possible to set a Sleep to allow for API limits? #617

Closed: igalci closed this issue 2 years ago

igalci commented 2 years ago

I was thinking of using Huey to schedule API calls, but I would need my tasks to sleep for a certain amount of time between calls, with a different interval for each API I am accessing.

I think one solution would be to limit the number of workers to 1 and have each task sleep after it finishes. However, this does not account for the fact that different APIs can be accessed simultaneously without any problem.
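For illustration, the naive version I have in mind looks roughly like this (call_api() is a placeholder for the real request, and the consumer would run with a single worker, e.g. huey_consumer.py myapp.huey -w 1):

import time

from huey import RedisHuey

huey = RedisHuey('api-queue')

@huey.task()
def call_api_task(endpoint):
    result = call_api(endpoint)  # placeholder for the actual API request
    # With only one worker, sleeping here spaces out consecutive calls.
    time.sleep(10)
    return result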

I thought maybe Huey has a more sophisticated approach to this case?

coleifer commented 2 years ago

I don't really understand the use-case or what you are trying to accomplish, and it's not clear what you wish huey to do differently.

Huey allows you to run periodic tasks at any cron-like interval, so you should be able to configure them in such a way that they do not violate your API's rate limits.
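For example, a sketch of a periodic task pinned to a cron-like schedule (fetch_latest() is a placeholder):

from huey import RedisHuey, crontab

huey = RedisHuey('my-app')

# Runs every 10 minutes, regardless of how many workers are running.
@huey.periodic_task(crontab(minute='*/10'))
def poll_api():
    fetch_latest()  # placeholder for your API call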

Huey also allows you to dynamically schedule individual tasks to run at any given time or after a given delay.
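For example, given an ordinary task (a sketch; fetch() is a placeholder):

from datetime import datetime, timedelta

from huey import RedisHuey

huey = RedisHuey('my-app')

@huey.task()
def fetch(symbol):
    ...  # placeholder

# Run after a 60-second delay:
fetch.schedule(args=('AAPL',), delay=60)

# Run at a specific time (naive datetimes are treated as UTC by default):
fetch.schedule(args=('AAPL',), eta=datetime.utcnow() + timedelta(hours=1))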

coleifer commented 2 years ago

If you would like to provide more concrete details I can try to advise.

igalci commented 2 years ago

I am trying to create a queue of tasks for each API, with a given sleep time between tasks in each queue, and then execute the tasks one by one, without scheduling a specific execution time for each. Is that possible?

coleifer commented 2 years ago

Are you committed to this design for some reason? It seems a bit unconventional. I'm not clear on how your producers are organized or what kind of task-flow you expect. It sounds, if I had to guess, like you're indiscriminately scraping -- but maybe I'm mistaken?

igalci commented 2 years ago

Yes, exactly. I am using YFinance and similar APIs to scrape the web. I need to get data for each of ~2000 stock market tickers every week or so, so if I run my job every 10 minutes, I should be good. I don't want to schedule every one of the 2000 jobs individually with a specific time at which it will run. It would be easier to simply have a queue with a sleep in between.

Additionally, I am scraping data for the same 2000 tickers using Investpy, which could run alongside my first job.

So I was wondering whether Huey would be appropriate for this. Could I set up 2 separate queues using Huey and assign 1 worker to each so the sleep times in between are respected?
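Roughly, I imagine something like this (myapp is a placeholder module name):

from huey import RedisHuey

# One independent queue per API:
yfinance_huey = RedisHuey('yfinance')
investpy_huey = RedisHuey('investpy')

# Each queue then gets its own single-worker consumer process:
#   huey_consumer.py myapp.yfinance_huey -w 1
#   huey_consumer.py myapp.investpy_huey -w 1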

coleifer commented 2 years ago

It shouldn't ever be necessary to scrape financial data unless you're doing sentiment analysis or something, as all that data is typically available for free or quite cheap, but that's beside the point I guess.

I'd suggest you do something like this:


@huey.task()
def scrape_ticker(t):
    try:
        download_the_data(t)  # placeholder for the actual scraping logic
    finally:
        # Re-enqueue this same task to run again in 10 minutes.
        scrape_ticker.schedule(args=(t,), delay=600)
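To start the cycle, each ticker would be enqueued once, e.g. with staggered delays (a sketch; tickers is a placeholder list):

# Seed the queue once; thereafter each task re-enqueues itself.
for i, t in enumerate(tickers):
    scrape_ticker.schedule(args=(t,), delay=i)  # stagger start times by 1 second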