Fix/scraped details

Resolves #44, resolves #34

Circumvent results

With useFilters set to true, the useFilters option currently works as follows:

New start pages are enqueued in the handleListPage function. Pagination pages are enqueued only if the total number of results for the current start URL is <= 1000; if it is > 1000, filtered pages are enqueued instead.
New filters are detected from unchecked checkboxes and mapped onto the current URL as new query parameters.
In each enqueuing phase triggered in handleListPage, the unchecked filters are iterated and each filter is interpreted as a query parameter name. If a filter has multiple possible values, all of them are iterated and a new URL is enqueued for each value.
Before a new filtered page is enqueued, it is checked for duplicates. All URLs enqueued through filters are stored in the state object, and each newly built URL is compared against them. If a stored URL has exactly the same query parameter names, the new URL is not enqueued. Parameter values do not have to match in this comparison, because all values of a given parameter are processed within a single filter enqueuing phase.
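
The logic above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the helper names (paramNamesKey, buildFilteredUrls, planEnqueue), the shape of the state object and the example.com URL are all assumptions made for the example. It also stores parameter-name keys instead of full URLs, which is a simplification of the comparison described above.

```javascript
// Hypothetical sketch; names and data shapes are assumptions, not the actor's API.
const MAX_RESULTS_PER_START_URL = 1000;

// Reduces a URL to its sorted, de-duplicated query parameter names.
// Values are ignored on purpose, matching the comparison described above.
function paramNamesKey(url) {
    const names = [...new URL(url).searchParams.keys()];
    return [...new Set(names)].sort().join('&');
}

// Builds one URL per filter value, with the filter name used as the parameter name.
function buildFilteredUrls(currentUrl, filterName, values) {
    return values.map((value) => {
        const url = new URL(currentUrl);
        url.searchParams.set(filterName, value);
        return url.toString();
    });
}

// Decides what a list page should enqueue: pagination or filtered pages.
// uncheckedFilters maps a filter name to its array of possible values.
function planEnqueue(currentUrl, totalResults, uncheckedFilters, state) {
    if (totalResults <= MAX_RESULTS_PER_START_URL) {
        return { type: 'pagination', urls: [] };
    }
    const urls = [];
    for (const [filterName, values] of Object.entries(uncheckedFilters)) {
        const candidates = buildFilteredUrls(currentUrl, filterName, values);
        if (candidates.length === 0) continue;
        // All values of one filter share the same parameter names,
        // so a single check per filter is enough.
        const key = paramNamesKey(candidates[0]);
        if (state.enqueuedKeys.has(key)) continue;
        state.enqueuedKeys.add(key);
        urls.push(...candidates);
    }
    return { type: 'filters', urls };
}
```

In this sketch, a second call with the same start URL and the same filter name returns no URLs, even if the filter value differs, because only parameter names are compared.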

Room info

Extraction of room info from the detail page was added for the case where the checkIn and checkOut input attributes are unset. Without checkIn and checkOut set, Booking.com doesn't show room features directly inside the rooms table, so they cannot be scraped effectively. I tried expanding the room info using page.click('.room-info [href]') combined with page.waitForSelector('.hprt-facilities-facility') (and a few other options), but the overhead was too big and a lot of timeouts were triggered. I added the room info URL to the output so it can be inspected if needed.

Example room info:

{
      "url": "https://www.booking.com/hotel/us/zaza-dallas.cs.html?aid=304142;label=gen173nr-1FCAso7AFCC3phemEtZGFsbGFzSDNYBGhniAEBmAEFuAEYyAEM2AEB6AEB-AEGiAIBqAIEuAKYq_eNBsACAdICJGEyOWQzZmMwLTdmOTAtNDcxMS1iMTFiLTQyN2I0YjIxNjZiYdgCBeACAQ;sid=e3e5ca388d08ffa3d72b88094262cc35;dist=0&group_adults=2&group_children=0&hapos=22&hpos=22&keep_landing=1&nflt=review_score%3D84%3Bprice%3DUSD-150-200-1&no_rooms=1&req_adults=2&req_children=0&sb_price_type=total&sr_order=popularity&srepoch=1639830929&srpvid=da7758884dba0107&type=total&ucfs=1&#room_103228202",
      "roomType": "Deluxe Parlor Double",
      "bedType": "2 manželské postele",
      "persons": 2
}
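
For reference, the abandoned expansion attempt mentioned in this section might look roughly like the sketch below. The function name tryExpandRoomInfo and the timeout value are assumptions for illustration; only the two selectors and the page.click / page.waitForSelector calls come from the description above. It expects a Puppeteer-like page object.

```javascript
// Hypothetical sketch of the abandoned approach; not part of the actor.
// Clicks the room info link and waits for the facilities list to render.
// Returns true on success and false if the click or the wait fails or times out.
async function tryExpandRoomInfo(page, timeoutMs = 5000) {
    try {
        await page.click('.room-info [href]');
        await page.waitForSelector('.hprt-facilities-facility', { timeout: timeoutMs });
        return true;
    } catch (err) {
        // Timeouts end up here; the caller can fall back to outputting the room URL.
        return false;
    }
}
```

Every room needs its own click and wait, which fits the overhead and timeouts reported above.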

Output schema

Output properties were updated for the case where the checkIn and checkOut input attributes are unset. The price, currency and persons properties were excluded because null was stored for each of them, which made the resulting dataset unnecessarily large.
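
A minimal sketch of this exclusion, assuming a flat detail object and an input object with checkIn and checkOut fields. The function name buildOutputItem, the constant and the sample property names other than price, currency and persons are hypothetical:

```javascript
// Hypothetical sketch; the actor's real output code may be structured differently.
const STAY_DEPENDENT_FIELDS = ['price', 'currency', 'persons'];

// Returns a copy of the scraped detail. When checkIn or checkOut is unset,
// the stay-dependent properties are dropped instead of being stored as null.
function buildOutputItem(detail, input) {
    const hasStayDates = Boolean(input.checkIn && input.checkOut);
    if (hasStayDates) return { ...detail };
    const item = {};
    for (const [key, value] of Object.entries(detail)) {
        if (!STAY_DEPENDENT_FIELDS.includes(key)) item[key] = value;
    }
    return item;
}
```

With unset dates, an item such as { name, price: null, currency: null, persons: null } would be reduced to { name } in this sketch.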

dtrungtin / actor-booking-scraper

Fix/scraped details #51
