HexNio / booking_scraper

A booking.com Web Scraper for Data Mining/Harvesting and Automation
GNU General Public License v3.0

It does not work but... #4

Open jumpjack opened 2 years ago

jumpjack commented 2 years ago

It does not work (anymore?), but this script does extract the data from the HTML of a search results page:

resultsCount = document.querySelector("#right").children[0].children[0].children[0].children[1].children[0].innerHTML;
mainResult = document.querySelector("#search_results_table");
actualResults = mainResult.children[1].children[0].children[0].children[0].children[2];
resultsArr = [...actualResults.children];
res = []; // one entry per property card
resultsArr.forEach((result) => {
    if (result.getAttribute("data-testid")) {
        if (result.getAttribute("data-testid") == "property-card") {
            try {
                name=result.children[0].children[1].children[0].children[0].children[0].children[0].children[0].children[0].children[0].children[0].children[0].innerHTML
            } catch(e) {
                name = "n/a";
            }

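            // Resolve the price container inside a try, so a card with a different layout cannot abort the whole loop.
            try {
                priceBase=result.children[0].children[1].children[0].children[1].children[1].children[0].children[0].children[0].children[0].children[1].children[0];
            } catch(e) {
                priceBase = null;
            }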
            try {
                price=priceBase.children[0].children[0].innerHTML;
            } catch(e) {
                try {
                    price=priceBase.children[0].innerHTML;
                } catch(e) {
                    try {
                        price=priceBase.innerHTML;
                    } catch(e) {
                        price = "n/a";
                    }
                }
            }

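            // Remove every space, not just the first one; innerHTML serializes non-breaking spaces as "&nbsp;".
            price = price.replace(/\s|&nbsp;/g, "");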
            try {
                type=result.children[0].children[1].children[0].children[1].children[0].children[0].children[1].children[0].children[0].children[0].innerHTML;
            } catch(e) {
                type = "n/a";
            }

            try {
                rank=result.children[0].children[1].children[0].children[0].children[1].children[0].children[0].children[0].children[0].children[0].innerHTML;
            } catch(e) {
                rank = "n/a";
            }

            res.push({name: name, price: price, type: type, rank:rank})
        }
    }
});
console.log(res);
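
The repeated try/catch blocks above could probably be folded into a small helper that walks a list of child indexes and falls back to "n/a" when any step is missing. This is only a sketch: `childAt` and `htmlAt` are hypothetical names, not part of the original script, and the index paths still have to be taken from the page as it looks today.

```javascript
// Follow a list of child indexes starting at `node`.
// Returns undefined as soon as one step of the path does not exist.
function childAt(node, path) {
    let current = node;
    for (const index of path) {
        if (!current || !current.children) return undefined;
        current = current.children[index];
    }
    return current;
}

// Return the innerHTML found at the end of the path, or the fallback value.
function htmlAt(node, path, fallback = "n/a") {
    const target = childAt(node, path);
    return target ? target.innerHTML : fallback;
}
```

With it, a lookup such as `rank = htmlAt(result, [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]);` needs no try/catch of its own.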

https://www.booking.com/searchresults.it.html?checkin_month=11&checkin_monthday=9&checkin_year=2022&checkout_month=11&checkout_monthday=11&checkout_year=2022&group_adults=3&group_children=0&order=price&ss=Rome%2C%20Italy&offset=26&nflt=mealplan%3D1%3Boos%3D1%3Breview_score%3D80

Additional filters: append &nflt= followed by a URL-encoded string of "parameter=value" pairs separated by ";", so that "=" becomes %3D and ";" becomes %3B. Example: &nflt=mealplan%3D1%3Boos%3D1%3Breview_score%3D80
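
As a small sketch of that encoding, the filter string can be built from a plain object and passed through the standard `encodeURIComponent`. The function name `buildNflt` is made up for this example; the filter names are the ones from the URL above.

```javascript
// Join the filters as "parameter=value" pairs separated by ";",
// then URL-encode the whole string ("=" -> %3D, ";" -> %3B).
function buildNflt(filters) {
    const raw = Object.entries(filters)
        .map(([key, value]) => `${key}=${value}`)
        .join(";");
    return encodeURIComponent(raw);
}

const nflt = buildNflt({ mealplan: 1, oos: 1, review_score: 80 });
console.log("&nflt=" + nflt);
// prints "&nflt=mealplan%3D1%3Boos%3D1%3Breview_score%3D80"
```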

jumpjack commented 2 years ago

JSON result on a map: https://www.booking.com/markers_on_map?aid=304142&aid=304142&dest_id=-130358&dest_type=&sr_id=&ref=searchresults&limit=100&stype=1&lang=it&ssm=1&checkin=2022-12-29&checkout=2023-01-01&sech=1&ngp=1&room1=A%2CA%2CA&ugr=1&maps_opened=1&nsopf=1&nsobf=1&esf=1&nflt=mealplan%3D1%3Breview_score%3D80%3Bfc%3D2&sr_countrycode=it&sr_lat=&sr_long=&sgh=1&dba=1&dbc=1&spr=1&currency=EUR&&shws=1%20&huks=1&somp=1&mdimb=1%20&tp=1%20&img_size=270x200%20&avl=1%20&nor=1%20&spc=1%20&rmd=1%20&slpnd=1%20&sbr=1&at=1%20&sat=1%20&ssu=1&srocc=1&order=price;BBOX=13.673105411340503,41.75777869190572,14.244394473840503,42.36949079906152&_=1667554522637

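Assign the parsed JSON result to "h" to print a simplified list (centerlat and centerlon must be defined first; they are computed from the bounding box further down):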

h.b_hotels.forEach((hotel) => {
    // Rough distance from the bounding-box center, with 1 degree taken as 111 km.
    const dist = (Math.sqrt(Math.pow(centerlat - hotel.b_latitude, 2) + Math.pow(centerlon - hotel.b_longitude, 2)) * 111).toFixed(0);
    console.log(hotel.b_hotel_title, hotel.b_accommodation_type, hotel.b_review_score + "(" + hotel.b_review_nr + ")", hotel.b_marker_type, hotel.b_u_total_price, "dist=" + dist);
});

hotel.b_marker_type tells you whether the property is available; if it is, its price (hotel.b_u_total_price) is available too.

Given this value for the bounding box (BBOX), in the order lon1,lat1,lon2,lat2:

bbox="13.673105411340503,41.75777869190572,14.244394473840503,42.36949079906152"

You can compute its center with:

coords=bbox.split(",");
lat1=coords[1]*1;
lon1=coords[0]*1;
lat2=coords[3]*1;
lon2=coords[2]*1;
centerlat = lat1 + (lat2-lat1)/2;
centerlon = lon1 + (lon2-lon1)/2;

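And the approximate distance of a hotel from the center, in km, with the expression below. It takes one degree as about 111 km in both directions, so east-west distances are overestimated away from the equator: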

    Math.sqrt(Math.pow(centerlat*1-h.b_hotels[HOTEL_NUM].b_latitude,2) + Math.pow(centerlon*1-h.b_hotels[HOTEL_NUM].b_longitude,2))*111
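
A slightly more careful variant, offered only as a sketch, scales the longitude difference by the cosine of the latitude (the equirectangular approximation) and wraps the center computation in a function. `bboxCenter`, `distanceKm` and `KM_PER_DEGREE` are names introduced for this example; the 111 km per degree constant is the same rough value used above.

```javascript
const KM_PER_DEGREE = 111; // rough length of one degree of latitude

// Parse a "lon1,lat1,lon2,lat2" BBOX string and return its center point.
function bboxCenter(bbox) {
    const [lon1, lat1, lon2, lat2] = bbox.split(",").map(Number);
    return { lat: (lat1 + lat2) / 2, lon: (lon1 + lon2) / 2 };
}

// Approximate distance in km between two points given in degrees.
// The longitude difference is shrunk by cos(mean latitude).
function distanceKm(lat1, lon1, lat2, lon2) {
    const meanLatRad = ((lat1 + lat2) / 2) * Math.PI / 180;
    const dLat = lat2 - lat1;
    const dLon = (lon2 - lon1) * Math.cos(meanLatRad);
    return Math.sqrt(dLat * dLat + dLon * dLon) * KM_PER_DEGREE;
}

const center = bboxCenter("13.673105411340503,41.75777869190572,14.244394473840503,42.36949079906152");
console.log(center.lat, center.lon);
```

For a hotel from the JSON result, the call would be `distanceKm(center.lat, center.lon, Number(hotel.b_latitude), Number(hotel.b_longitude))`.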