Closed igalci closed 3 years ago
I don't really understand the use-case or what you are trying to accomplish, and it's not clear what you wish huey to do differently.
Huey allows you to run periodic tasks at any cron-like interval, so you should be able to configure them in such a way that they do not violate your API's rate limits.
Huey also allows you to dynamically schedule individual tasks to run at any given time or after a given delay.
If you would like to provide more concrete details I can try to advise.
I am trying to create a queue of tasks for each API, with a given sleep time between tasks in each queue, and then execute the tasks one by one, without scheduling a specific execution time for each task. Is that possible?
Are you committed to this design for some reason? It seems a bit unconventional. I'm not clear on how your producers are organized or what kind of task-flow you expect. If I had to guess, it sounds like you're indiscriminately scraping -- but maybe I'm mistaken?
Yes, exactly. I am using YFinance and similar APIs to scrape the web. I need to get data for each of ~2000 stock market tickers every week or so. So if I run my job every 10 minutes, I should be good. I don't want to schedule every one of the 2000 jobs individually with a specific run time. It would be easier to simply have a queue with a sleep in between tasks.
Additionally, I am scraping the same 2000 tickers using Investpy, which could run alongside my first job.
So I was thinking if Huey would be appropriate for this. Could I set up 2 separate queues using Huey and assign 1 worker to each so the sleep times in between are respected?
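For example, something along these lines is what I imagine (a sketch only; the queue names and the Redis backend are my assumptions):

```python
from huey import RedisHuey

# Two independent queues; each consumer is started with a single worker,
# so tasks within a queue run strictly one at a time.
yfinance_huey = RedisHuey('yfinance-queue')
investpy_huey = RedisHuey('investpy-queue')

# One consumer process per queue, each limited to one worker, e.g.:
#   huey_consumer.py tasks.yfinance_huey -w 1
#   huey_consumer.py tasks.investpy_huey -w 1
```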
It shouldn't be necessary to ever scrape financial data unless you're doing sentiment analysis or something, as all that data is typically available for free or quite cheap, but that's beside the point, I guess.
I'd suggest you do something like this:

```python
@huey.task()
def scrape_ticker(t):
    try:
        download_the_data(t)
    finally:
        # Run this same task again in 10 minutes.
        scrape_ticker.schedule(args=(t,), delay=600)
```
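If you want to seed all ~2000 tickers once without picking an explicit wall-clock time for each, you could stagger the initial delays (a sketch; `staggered_delays` is a hypothetical helper, not part of huey):

```python
def staggered_delays(n_tasks, spacing_seconds=600):
    """Initial delay, in seconds, for each task so runs are evenly spaced."""
    return [i * spacing_seconds for i in range(n_tasks)]

# Hypothetical one-time kickoff using the task above:
#   for ticker, delay in zip(tickers, staggered_delays(len(tickers))):
#       scrape_ticker.schedule(args=(ticker,), delay=delay)
```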
I was thinking of using Huey to schedule API calls, but I would need my tasks to sleep for a certain amount of time for each different API I am accessing.
I think one solution would be to limit the number of workers to 1 and have each task sleep after it finishes. However, this would not account for different APIs, which can be accessed simultaneously without a problem.
I thought maybe Huey has a more sophisticated approach to this case?
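To make the idea concrete, something like this per-API gate is what I have in mind (a plain-Python sketch, not huey API; the `ApiRateLimiter` name and the intervals are made up):

```python
import time
from threading import Lock

class ApiRateLimiter:
    """Enforce a minimum interval between calls, tracked separately per API."""

    def __init__(self, min_intervals):
        # min_intervals: mapping of API name -> minimum seconds between calls.
        self.min_intervals = min_intervals
        self._last_call = {}
        self._lock = Lock()

    def wait(self, api_name, now=None, sleep=time.sleep):
        # Block until this API may be called again; return the seconds slept.
        # `now` and `sleep` are injectable for testing.
        with self._lock:
            if now is None:
                now = time.monotonic()
            last = self._last_call.get(api_name)
            pause = 0.0
            if last is not None:
                pause = max(0.0, self.min_intervals[api_name] - (now - last))
            if pause:
                sleep(pause)
            self._last_call[api_name] = now + pause
            return pause
```

Each worker would call `wait('yfinance')` (or whichever API) before scraping; calls to different APIs never block each other.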