dtrungtin / actor-airbnb-scraper

Airbnb Scraper actor is designed to extract most of the publicly available data for home listings
https://apify.com/dtrungtin/airbnb-scraper
28 stars, 22 forks

Actor reports "succeeded" on the first run even though there was an error, it delivered 0 results and it took a few minutes; subsequent runs usually succeed and return results quickly #30

Closed: herewegoletsgo closed this issue 3 years ago

herewegoletsgo commented 3 years ago

I have a customer-facing application that needs the actor to succeed on the first run, because I am not the one running the scraper. However, I've noticed a pattern where the first run is often displayed as succeeded even though there was an error. This seems to happen only on the first run; subsequent runs are usually fine as long as there isn't a long break in between. It is almost as if the actor needs to warm up, or needs one attempt before it works.

I'm not sure why. The run below is reported as succeeded, even though it logged an error and returned no data:

2021-07-17T11:29:00.605Z ACTOR: Pulling Docker image from repository.
2021-07-17T11:29:04.541Z ACTOR: Creating Docker container.
2021-07-17T11:29:05.039Z ACTOR: Starting Docker container.
2021-07-17T11:29:06.627Z
2021-07-17T11:29:06.628Z > airbnb-scraper@1.0.1 start /usr/src/app
2021-07-17T11:29:06.630Z > node main.js
2021-07-17T11:29:06.632Z
2021-07-17T11:29:07.581Z INFO System info {"apifyVersion":"0.21.12","apifyClientVersion":"0.6.0","osType":"Linux","nodeVersion":"v12.18.4"}
2021-07-17T11:29:07.583Z WARN You are using an outdated version (0.21.12) of Apify SDK. We recommend you to update to the latest version (1.3.1).
2021-07-17T11:29:07.585Z Read more about Apify SDK versioning at: https://help.apify.com/en/articles/3184510-updates-and-versioning-of-apify-sdk
2021-07-17T11:29:07.814Z INFO "startUrls" is being used, the search will be ignored
2021-07-17T11:29:09.663Z INFO Starting with 1 urls
2021-07-17T11:29:09.999Z INFO BasicCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2021-07-17T11:30:05.071Z ERROR Could not get detail for home {"url":"https://api.airbnb.com/v2/pdp_listing_details/33646460?_format=for_native"}
2021-07-17T11:30:05.073Z Error: Could not get data for: https://api.airbnb.com/v2/pdp_listing_details/33646460?_format=for_native
2021-07-17T11:30:05.075Z at getData (/usr/src/app/src/index.js:93:27)
2021-07-17T11:30:05.076Z at processTicksAndRejections (internal/process/task_queues.js:97:5)
2021-07-17T11:30:05.255Z INFO BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2021-07-17T11:30:05.491Z INFO BasicCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":55139,"requestsFinishedPerMinute":1,"requestsFailedPerMinute":0,"requestTotalDurationMillis":55139,"requestsTotal":1,"crawlerRuntimeMillis":55827,"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1]}
2021-07-17T11:30:05.604Z INFO Crawler finished.
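In the failing log the ERROR line is printed, yet the final statistics show `"requestsFinished":1,"requestsFailed":0` with `"retryHistogram":[1]`, so from the crawler's point of view the single request finished successfully on its first attempt. One plausible explanation, assumed here because the actor's source is not shown, is that the request handler catches the error thrown by `getData` and only logs it. The toy model below is a hypothetical sketch, not the Apify SDK: `runRequests`, `swallowingHandler` and `rethrowingHandler` are made-up names, and `getData` is a stand-in named after the stack-trace frame. It shows how swallowing versus rethrowing changes the counts.

```javascript
// Toy model of a crawler's request bookkeeping. Hypothetical sketch, NOT the
// Apify SDK: a request counts as "finished" when its handler resolves, and as
// "failed" only if the handler still throws after all retries are used up.
async function runRequests(urls, handler, maxRetries = 2) {
  const stats = { requestsFinished: 0, requestsFailed: 0 };
  for (const url of urls) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        await handler(url);
        stats.requestsFinished++;
        break; // success: stop retrying this URL
      } catch (err) {
        // The error escaped the handler: retry, or give up on the last attempt.
        if (attempt === maxRetries) stats.requestsFailed++;
      }
    }
  }
  return stats;
}

// Hypothetical stand-in for a detail download that always fails.
async function getData(url) {
  throw new Error(`Could not get data for: ${url}`);
}

// Logs the error and returns normally, so the request is counted as finished.
async function swallowingHandler(url) {
  try {
    await getData(url);
  } catch (err) {
    console.error("Could not get detail for home", { url });
  }
}

// Lets the error propagate, so the crawler retries and finally counts a failure.
async function rethrowingHandler(url) {
  await getData(url);
}
```

With `swallowingHandler` the result is `{ requestsFinished: 1, requestsFailed: 0 }`, matching the shape of the statistics above; with `rethrowingHandler` it is `{ requestsFinished: 0, requestsFailed: 1 }`. If the actor does swallow the error, rethrowing it would let retries and failure accounting see the problem; this is a suggestion under that assumption, not a confirmed fix.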

Subsequent runs usually come back OK, for example:

2021-07-17T11:31:24.892Z ACTOR: Pulling Docker image from repository.
2021-07-17T11:31:29.615Z ACTOR: Creating Docker container.
2021-07-17T11:31:30.145Z ACTOR: Starting Docker container.
2021-07-17T11:31:31.994Z
2021-07-17T11:31:31.997Z > airbnb-scraper@1.0.1 start /usr/src/app
2021-07-17T11:31:31.999Z > node main.js
2021-07-17T11:31:32.003Z
2021-07-17T11:31:33.176Z INFO System info {"apifyVersion":"0.21.12","apifyClientVersion":"0.6.0","osType":"Linux","nodeVersion":"v12.18.4"}
2021-07-17T11:31:33.179Z WARN You are using an outdated version (0.21.12) of Apify SDK. We recommend you to update to the latest version (1.3.1).
2021-07-17T11:31:33.181Z Read more about Apify SDK versioning at: https://help.apify.com/en/articles/3184510-updates-and-versioning-of-apify-sdk
2021-07-17T11:31:33.432Z INFO "startUrls" is being used, the search will be ignored
2021-07-17T11:31:35.194Z INFO Starting with 1 urls
2021-07-17T11:31:35.265Z INFO BasicCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2021-07-17T11:31:35.945Z INFO Saving home detail - 33646460
2021-07-17T11:31:36.373Z INFO BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2021-07-17T11:31:36.654Z INFO BasicCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":725,"requestsFinishedPerMinute":41,"requestsFailedPerMinute":0,"requestTotalDurationMillis":725,"requestsTotal":1,"crawlerRuntimeMillis":1459,"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1]}
2021-07-17T11:31:36.742Z INFO Crawler finished.
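Both runs end with identical counters (`"requestsFinished":1,"requestsFailed":0`); only the second one logs `Saving home detail - 33646460`. Because the run status alone does not distinguish the two cases, a caller that needs first-run reliability can also require at least one dataset item and re-run otherwise. The sketch below assumes the Apify run status value `SUCCEEDED`; `runUntilResults`, `runActor` and `countItems` are hypothetical names, and the two callbacks are injected so the logic runs without network access.

```javascript
// Treat a run as usable only if it SUCCEEDED *and* produced at least one item.
// runActor() -> Promise<run> and countItems(run) -> Promise<number> are
// hypothetical callbacks supplied by the caller.
async function runUntilResults(runActor, countItems, maxAttempts = 3, delayMs = 5000) {
  let lastRun = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    lastRun = await runActor();
    // Only count items for runs the platform reports as successful.
    const items = lastRun.status === "SUCCEEDED" ? await countItems(lastRun) : 0;
    if (items > 0) return { run: lastRun, items, attempts: attempt };
    // Wait a little before trying again (skipped after the last attempt).
    if (attempt < maxAttempts) await new Promise((r) => setTimeout(r, delayMs));
  }
  throw new Error(
    `No results after ${maxAttempts} attempts (last status: ${lastRun && lastRun.status})`
  );
}
```

With the `apify-client` package, `runActor` could plausibly wrap `client.actor("dtrungtin/airbnb-scraper").call(input)` and `countItems` could return `items.length` from `client.dataset(run.defaultDatasetId).listItems()`; treat that wiring as an assumption and check it against the client version you use.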

dtrungtin commented 3 years ago

It is probably related to a network error.
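If the cause is a transient network error, one common mitigation is to bound each attempt with a timeout and retry with exponential backoff. In the failing run the single request took about 55 s (`"requestAvgFinishedDurationMillis":55139`) versus 725 ms in the successful one, so a per-attempt timeout would stop a stalled attempt early and leave room for a retry. The helpers `withTimeout` and `withRetries` below are hypothetical and not part of the actor or the Apify SDK; the default values are illustrative assumptions.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Call fn() up to `retries + 1` times, bounding each attempt with a timeout and
// waiting baseDelayMs * 2^attempt between attempts (exponential backoff).
async function withRetries(fn, { retries = 3, baseDelayMs = 1000, timeoutMs = 15000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError; // every attempt failed or timed out
}
```

For HTTP calls, pairing the timeout with an `AbortController` signal would also cancel the stalled request instead of just abandoning it.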