cermak-petr / actor-booking-scraper

Apify actor for extracting data about hotels from Booking.com.
Apache License 2.0
11 stars 13 forks

Some URLs don't return all results #1

Closed VaclavRut closed 5 years ago

VaclavRut commented 5 years ago

For example, this input:

{
  "simple": false,
  "useFilters": false,
  "search": "",
  "sortBy": "bayesian_review_score",
  "maxPages": 150,
  "checkIn": "",
  "checkOut": "",
  "rooms": 1,
  "adults": 2,
  "children": 0,
  "currency": "EUR",
  "language": "en-gb",
  "proxyConfig": {
    "useApifyProxy": true
  },
  "startUrls": [
    "https://www.booking.com/searchresults.en-gb.html?aid=304142&label=gen173bo-1FCAQoggI4kgRICVgDaA-IAQGYAQm4AQfIAQzYAQHoAQH4AQOIAgGYAgKoAgM&sid=3c52be4f8f937b70a077fb8543cd6222&tmpl=searchresults&ac_click_type=b&ac_position=0&class_interval=1&dest_id=-1456928&dest_type=city&from_sf=1&group_adults=2&group_children=0&iata=PAR&label_click=undef&nflt=ht_id%3D204%3Bht_id%3D208%3Bht_id%3D203%3Bht_id%3D216%3Bht_id%3D206%3B&no_rooms=1&percent_htype_hotel=1&raw_dest_type=city&room1=A%2CA&sb_price_type=total&search_selected=1&shw_aparth=1&slp_r_match=0&src=index&srpvid=a11630f84a00006b&ss=Paris%2C%20Ile%20de%20France%2C%20France&ss_raw=Paris&ssb=empty&rows=50"
  ]
}

Expected number of results: 1800. Actually scraped: 800.

A possible cause is that the log contains links like this one:

https://www.booking.com/searchresults.en-gb.html?aid=304142&label=gen173bo-1FCAQoggI4kgRICVgDaA-IAQGYAQm4AQfIAQzYAQHoAQH4AQOIAgGYAgKoAgM&sid=3c52be4f8f937b70a077fb8543cd6222&tmpl=searchresults&ac_click_type=b&ac_position=0&class_interval=1&dest_id=-1456928&dest_type=city&from_sf=1&group_adults=2&group_children=0&iata=PAR&label_click=undef&nflt=ht_id%3D204%3Bht_id%3D208%3Bht_id%3D203%3Bht_id%3D216%3Bht_id%3D206%3B&no_rooms=1&percent_htype_hotel=1&raw_dest_type=city&room1=A%2CA&sb_price_type=total&search_selected=1&shw_aparth=1&slp_r_match=0&src=index&srpvid=a11630f84a00006b&ss=Paris%2C%20Ile%20de%20France%2C%20France&ss_raw=Paris&ssb=empty&rows=50&selected_currency=EUR&changed_currency=1&top_currency=1&lang=en-gb&group_adults=2&no_rooms=1&rows=20&offset=1820

which return no results.

Or is it caused by proxies?
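
To make the paging values in such a link easier to inspect, here is a minimal Python sketch that pulls every `rows` and `offset` value out of a URL. The helper name `paging_params` is made up for illustration and is not part of the actor. The sample URL is a shortened version of the logged link that keeps only the parameters relevant to paging.

```python
from urllib.parse import urlsplit, parse_qs


def paging_params(url):
    """Return every `rows` and `offset` value found in the URL's query string."""
    query = parse_qs(urlsplit(url).query)
    return {
        "rows": [int(v) for v in query.get("rows", [])],
        "offset": [int(v) for v in query.get("offset", [])],
    }


# Shortened version of the logged link (hypothetical trimming, same paging values).
logged = (
    "https://www.booking.com/searchresults.en-gb.html"
    "?dest_id=-1456928&dest_type=city&rows=50&rows=20&offset=1820"
)
print(paging_params(logged))  # {'rows': [50, 20], 'offset': [1820]}
```

The output shows that `rows` appears twice (50 from the start URL, 20 appended later) and that the requested offset is 1820.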

Log

https://api.apify.com/v2/logs/diAr3TptzShXvA9wN

cermak-petr commented 5 years ago

If you need more than 1000 results, you have to enable the "useFilters" INPUT attribute. The actor will then use the various criteria filters on the left side of the page (3 stars, hotels only, free Wi-Fi, etc.) to overcome Booking's limit of 1000 results. However, in that case any filters that are set in the startUrls will not be applied. If you need to use your own filters and need more than 1000 results, the solution is to delete the filters from the URL, download all results with "useFilters" enabled, and then filter out the results you need in post-processing.
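
The workaround described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the actor's own code: it assumes the URL filters live in the `nflt` query parameter (as in the start URL from this issue), the helper names `strip_filters`, `build_input` and `post_filter` are hypothetical, and the `type` field on the result items is a made-up example, since the actor's output schema is not shown in this thread.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_filters(url, filter_param="nflt"):
    """Drop the filter parameter from a search URL (assumed to be `nflt`).

    The remaining parameters are re-encoded, so e.g. %20 may become '+'.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != filter_param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def build_input(actor_input):
    """Return a copy of the input with useFilters enabled and URL filters removed."""
    updated = dict(actor_input)
    updated["useFilters"] = True
    updated["startUrls"] = [strip_filters(u) for u in actor_input.get("startUrls", [])]
    return updated


def post_filter(items, predicate):
    """Keep only the downloaded items that satisfy the caller's own criteria."""
    return [item for item in items if predicate(item)]


original = {
    "useFilters": False,
    "startUrls": [
        "https://www.booking.com/searchresults.en-gb.html"
        "?dest_id=-1456928&dest_type=city&nflt=ht_id%3D204%3B&rows=50"
    ],
}
print(build_input(original)["startUrls"][0])

# Hypothetical result items; the real field names may differ.
items = [{"name": "A", "type": "Hotel"}, {"name": "B", "type": "Apartment"}]
hotels = post_filter(items, lambda item: item.get("type") == "Hotel")
```

The modified input is what would be passed to the actor, and `post_filter` is applied to the downloaded results afterwards to re-apply the criteria that were removed from the URL.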