Closed: bapti closed this issue 3 years ago
Hi, I tried the following to see if it would work, but it doesn't seem to. So I'm just defaulting to triggering the crawls periodically and not utilising any of the job logic within Oban, purely using it as a cron mechanism for triggering the crawls.
```elixir
def perform(%Oban.Job{args: _args}) do
  pid = self()

  Crawly.Engine.start_spider(Spiders.MySpider,
    on_spider_closed_callback: fn _spider_name, _crawl_id, reason ->
      IO.inspect("Added crawly callback executed")
      # Notify the job process that the crawl has finished.
      send(pid, {:crawly_finished, reason})
      :ok
    end
  )

  # Block the Oban job until the callback reports that the crawl is done.
  receive do
    {:crawly_finished, reason} ->
      IO.inspect("Crawl finished #{reason}")
  end

  :ok
end
```
This method would mean that your Oban job would wait until the crawl finishes, which might take quite a while. It isn't the best way, but it's simple enough. I'm not sure if there's a timeout for Oban jobs, but this solution looks fine from my point of view.
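If a crawl might never finish, the `receive` can be given an `after` clause so the job fails instead of hanging. A sketch of that (the two-hour limit is an arbitrary choice, not something Crawly or Oban prescribes):

```elixir
receive do
  {:crawly_finished, reason} ->
    IO.inspect("Crawl finished #{reason}")
    :ok
after
  # Arbitrary upper bound; tune to your spider's expected runtime.
  :timer.hours(2) ->
    {:error, :crawl_timeout}
end
```

Returning an `{:error, _}` tuple lets Oban record the job as failed and retry it according to your worker's settings.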
Another possibility is to update the job status post-crawl using the callback: you complete the Oban job immediately, mark the crawl as incomplete, then update it afterwards as complete. This means you store the status of the crawl in your db or something like that.
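A sketch of that status-tracking idea, assuming a hypothetical `MyApp.Crawls` context with `mark_started/1` and `mark_finished/2` functions backed by your database (none of these names come from Crawly or Oban):

```elixir
def perform(%Oban.Job{args: %{"crawl" => crawl_ref}}) do
  # Record the crawl as started/incomplete before kicking it off.
  MyApp.Crawls.mark_started(crawl_ref)

  Crawly.Engine.start_spider(Spiders.MySpider,
    on_spider_closed_callback: fn _spider_name, _crawl_id, reason ->
      # Flip the stored record to complete once the spider closes.
      MyApp.Crawls.mark_finished(crawl_ref, reason)
      :ok
    end
  )

  # The Oban job completes immediately; the callback updates the record later.
  :ok
end
```

The upside is that the job queue stays free; the downside is that "crawl finished" now lives in your own table rather than in Oban's job state.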
Refer to the updated comment.
Hi @Ziinc,
I have a similar use case.
Do you know how we can pass metadata?
Work to add this had stalled. If you really need this, you can contact me privately and I'll see what I can do.
@Ziinc sent you an email to ty@tzeyiing.com
Hi, I'm using Crawly and I want to trigger crawls from Oban jobs. I'm thinking of wrapping the call that starts a spider in a Task, is this the right approach? It's mainly so I could do something like

`await Spider`

within the scheduled job. I've read through the docs as best I can, but I'm a bit of a novice when it comes to these parts of Elixir, so I thought I'd ask. Thanks for any help!
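For context, the Task idea described in the question would look roughly like the sketch below. Note that `Crawly.Engine.start_spider/2` kicks the crawl off asynchronously in Crawly's own supervised processes, so awaiting the Task only awaits the start call, not the crawl itself, which is why the callback-based approach earlier in the thread was suggested instead:

```elixir
def perform(%Oban.Job{}) do
  task = Task.async(fn -> Crawly.Engine.start_spider(Spiders.MySpider) end)

  # This returns as soon as start_spider/1 returns, not when the crawl ends,
  # so the Task wrapper doesn't actually let the job wait for the spider.
  Task.await(task)
  :ok
end
```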