Open jancurn opened 5 years ago
Actually, since the underlying storage is not read-after-write consistent, calling `addRequest` and then `getRequest` immediately afterwards might return `null` and thus cause weird bugs. I'm flagging this as a bug, then.
This might also be the cause of this problem:
2019-02-14T11:59:46.283Z ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.example.com/","retryCount":1} (error details: type=record-not-found, statusCode=404)
2019-02-14T11:59:46.286Z ApifyClientError: Record was not found
2019-02-14T11:59:46.288Z at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.291Z at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.294Z at <anonymous>
2019-02-14T11:59:46.296Z at process._tickCallback (internal/process/next_tick.js:189:7)
2019-02-14T11:59:46.298Z ERROR: BasicCrawler: runTaskFunction error handler threw an exception. This places the RequestQueue into an unknown state and crawling will be terminated. This most likely happened due to RequestQueue being overloaded and unable to handle Request updates even after exponential backoff. Try limiting the concurrency of the run by using the maxConcurrency option. (error details: type=record-not-found, statusCode=404)
2019-02-14T11:59:46.300Z ApifyClientError: Record was not found
2019-02-14T11:59:46.302Z at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.303Z at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.305Z at <anonymous>
2019-02-14T11:59:46.307Z at process._tickCallback (internal/process/next_tick.js:189:7)
2019-02-14T11:59:46.309Z ERROR: AutoscaledPool: runTaskFunction failed. (error details: type=record-not-found, statusCode=404)
2019-02-14T11:59:46.311Z ApifyClientError: Record was not found
2019-02-14T11:59:46.313Z at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.315Z at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.317Z at <anonymous>
2019-02-14T11:59:46.319Z at process._tickCallback (internal/process/next_tick.js:189:7)
2019-02-14T11:59:46.382Z User function threw an exception:
2019-02-14T11:59:46.388Z ApifyClientError: Record was not found
2019-02-14T11:59:46.390Z at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.392Z at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.394Z at <anonymous>
2019-02-14T11:59:46.396Z at process._tickCallback (internal/process/next_tick.js:189:7)
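For illustration only, here is a minimal sketch of how a caller could tolerate the read-after-write gap described above by retrying a read that comes back empty. `getWithRetry` and `createLaggingStore` are hypothetical names invented for this example and are not part of the Apify SDK or apify-client; the store is just a toy that hides a record for the first few reads.

```javascript
// Hypothetical helper, not part of the Apify SDK: retries a read that may
// return null because a preceding write has not become visible yet.
async function getWithRetry(fetchFn, { maxAttempts = 5, baseDelayMs = 50 } = {}) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const result = await fetchFn();
        if (result !== null && result !== undefined) return result;
        // Exponential backoff before the next attempt (skipped after the last one).
        if (attempt < maxAttempts - 1) {
            await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
        }
    }
    return null; // Still not visible; the caller decides how to handle it.
}

// Toy stand-in for an eventually consistent store (an assumption for this
// example): records become readable only after `readsUntilVisible` reads.
function createLaggingStore(readsUntilVisible) {
    const records = new Map();
    let reads = 0;
    return {
        async add(id, value) {
            records.set(id, value);
        },
        async get(id) {
            reads += 1;
            if (reads <= readsUntilVisible) return null;
            return records.has(id) ? records.get(id) : null;
        },
    };
}
```

A caller would wrap the read, for example `await getWithRetry(() => store.get(id))`. Retrying only masks the inconsistency; it does not remove it, so a bounded number of attempts and an explicit `null` result are kept on purpose.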
Just a note that the `RequestQueue` should support the use case where one actor writes to the queue and another one reads from it. Perhaps the cache should be used only if it's less than N seconds old, and afterwards we can just use the underlying storage.
This shouldn't cause any problem and can greatly improve performance. See TODO at https://github.com/apifytech/apify-js/blob/master/src/request_queue.js#L276
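A rough sketch of the "use the cache only if it's less than N seconds old" idea might look like the following. `TtlCache` and `getRequestCached` are hypothetical names made up for this example, and `storage` is assumed to be any object with an async `get(id)` method; this is not how `request_queue.js` is actually implemented.

```javascript
// Hypothetical sketch, not the SDK's implementation: a cache whose entries
// are trusted only while they are younger than maxAgeMs.
class TtlCache {
    constructor(maxAgeMs, now = Date.now) {
        this.maxAgeMs = maxAgeMs;
        this.now = now; // Injectable clock, which makes expiry easy to test.
        this.entries = new Map();
    }

    set(key, value) {
        this.entries.set(key, { value, storedAt: this.now() });
    }

    // Returns the cached value, or undefined if it is missing or too old.
    get(key) {
        const entry = this.entries.get(key);
        if (!entry) return undefined;
        if (this.now() - entry.storedAt >= this.maxAgeMs) {
            this.entries.delete(key);
            return undefined;
        }
        return entry.value;
    }
}

// Reads through the cache and falls back to the underlying storage when
// the entry is absent or stale. `storage.get` is an assumed interface.
async function getRequestCached(cache, storage, id) {
    const cached = cache.get(id);
    if (cached !== undefined) return cached;
    const fresh = await storage.get(id);
    if (fresh !== null && fresh !== undefined) cache.set(id, fresh);
    return fresh;
}
```

With a short maximum age, a reader in another actor would see a writer's updates after at most that delay, at the cost of more calls to the underlying storage.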