HonzaKirchner closed this 1 year ago
Huh, that's a good point. This might be the same issue we encounter when a migration happens. Perhaps we can store the total number of enqueued links in crawler.stats? That should fix the inconsistencies (the number of processed requests is also loaded from the stats), should be durable enough in case of a migration or graceful abort, and seems to me like the path of least resistance for this feature right now.
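To make the trade-off concrete, here is a minimal TypeScript sketch of keeping a total-enqueued counter in persisted stats. PersistedStats, StatsState, KeyValueStorage and the CRAWLER_STATS key are hypothetical stand-ins, not Crawlee's actual Statistics API, and a plain Map plays the role of a persistent key-value store. It shows that the counter survives a restart only up to the last persist() call. That is why a migration or graceful abort (which would trigger a persist) keeps it consistent, while a hard abort can lose recent increments.

```typescript
// Hypothetical shape of the persisted statistics (not Crawlee's real Statistics class).
interface StatsState {
  requestsFinished: number;
  totalEnqueued: number;
}

// A Map stands in for a persistent key-value store (assumption for this sketch).
type KeyValueStorage = Map<string, string>;

const STATS_KEY = "CRAWLER_STATS"; // hypothetical key name

class PersistedStats {
  state: StatsState;
  private readonly storage: KeyValueStorage;

  constructor(storage: KeyValueStorage) {
    this.storage = storage;
    const saved = storage.get(STATS_KEY);
    // Resume from the last persisted snapshot, or start from zero.
    this.state = saved
      ? (JSON.parse(saved) as StatsState)
      : { requestsFinished: 0, totalEnqueued: 0 };
  }

  recordEnqueued(count: number): void {
    this.state.totalEnqueued += count;
  }

  recordFinished(): void {
    this.state.requestsFinished += 1;
  }

  // Would be called periodically and on migration / graceful abort.
  persist(): void {
    this.storage.set(STATS_KEY, JSON.stringify(this.state));
  }
}

// Simulated run: one persist, then a hard abort before the next persist.
const storage: KeyValueStorage = new Map();
const run1 = new PersistedStats(storage);
run1.recordEnqueued(5);
run1.recordFinished();
run1.persist();         // e.g. triggered by a migration or graceful abort
run1.recordEnqueued(3); // lost: the process dies before persisting again
const run2 = new PersistedStats(storage);
// run2.state.totalEnqueued is 5, not 8
```

The gap between run1 and run2 is exactly the window that no amount of stats persistence can close on an immediate abort.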
@B4nan, am I missing something at a larger scale (e.g., is there a major way to enqueue links without enqueueLinks)?
enqueueLinks uses RequestQueue.addRequests, which is also used by crawler.addRequests, so ideally the fix should be implemented at the RequestQueue level to cover all code paths. Otherwise, sounds good to me.
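A rough sketch of why counting at the queue level covers every code path: if both enqueue entry points delegate to a single addRequests method, a counter maintained there sees every newly added request. SimpleQueue, enqueueLinks and crawlerAddRequests below are hypothetical simplifications, not the real Crawlee signatures. Deduplication by uniqueKey (falling back to the URL) is assumed so that a URL enqueued twice is counted once.

```typescript
interface RequestLike {
  url: string;
  uniqueKey?: string;
}

// Hypothetical, simplified queue standing in for RequestQueue.addRequests.
class SimpleQueue {
  private readonly seen = new Set<string>();
  private added = 0;

  // Single choke point: returns how many requests were actually new.
  addRequests(requests: RequestLike[]): number {
    let newlyAdded = 0;
    for (const request of requests) {
      const key = request.uniqueKey ?? request.url;
      if (this.seen.has(key)) continue; // duplicates are not counted
      this.seen.add(key);
      newlyAdded++;
    }
    this.added += newlyAdded;
    return newlyAdded;
  }

  get totalAdded(): number {
    return this.added;
  }
}

// Both hypothetical entry points funnel into the same method,
// so the counter above sees requests from either path.
function enqueueLinks(queue: SimpleQueue, urls: string[]): number {
  return queue.addRequests(urls.map((url) => ({ url })));
}

function crawlerAddRequests(queue: SimpleQueue, requests: RequestLike[]): number {
  return queue.addRequests(requests);
}
```

Counting inside the queue rather than inside enqueueLinks means a new helper that enqueues requests cannot silently bypass the counter.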
Additional discussion here: https://apifier.slack.com/archives/C0L33UM7Z/p1696325864681189
Lukáš K.: If the user aborts without the "graceful" option, there is no way to correctly persist the state 😕. However, we can load the info from the queue at the start of the actor.
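A minimal sketch of the "load the info from the queue at the start" idea, assuming the queue can report how many requests it holds in total and how many of them are handled. QueueSnapshot and ProgressTracker are hypothetical names, not existing Crawlee types. Seeding both numbers from the same durable source means the processed count and the total can no longer drift apart after a resume, as they did in the reported "Crawled 1973/133 pages" message.

```typescript
// Hypothetical view of what the request queue reports about itself.
interface QueueSnapshot {
  totalRequestCount: number;   // all unique requests ever added
  handledRequestCount: number; // requests already processed (succeeded or failed for good)
}

class ProgressTracker {
  private processed: number;
  private total: number;

  constructor(snapshot: QueueSnapshot) {
    // Seed both counters from the same durable source at start-up,
    // so the numerator and denominator agree after a resume.
    this.processed = snapshot.handledRequestCount;
    this.total = snapshot.totalRequestCount;
  }

  onEnqueued(newlyAdded: number): void {
    this.total += newlyAdded;
  }

  onProcessed(): void {
    this.processed += 1;
  }

  status(): string {
    return `Crawled ${this.processed}/${this.total} pages`;
  }
}
```

After seeding, the tracker only needs in-memory updates during the run; even if the process is killed without a graceful abort, the next start re-reads the numbers from the queue instead of trusting stale stats.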
Which package is the feature request for? If unsure which one to select, leave blank
None
Feature
The crawling progress shown in the status message doesn't persist across an abort (Crawled 1973/133 pages). Unfortunately, there is nothing we can do if the user aborts immediately, but we can pull the request queue state at the start to resync it. Because this message is displayed quite prominently, we should probably take this extra step.
Motivation
The issue was reported here
Ideal solution or implementation, and any additional constraints
🤔
Alternative solutions or implementations
No response
Other context
No response