bkrumnow opened this issue 3 years ago.
For booking.com, only hotel reservations are supported, so the input needs a location and a check-in date.
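Hotel inputs and flight inputs have different shapes: a hotel search has a location and a check-in date, while a flight search has an origin, a destination and a departure date. Below is a minimal sketch of the two shapes plus a guard that rejects anything booking.com's hotel mode cannot use. The flight fields mirror the JSON used later in this thread. HotelInput, its field names, isFlightInput and toHotelInput are hypothetical.

```typescript
// Hypothetical input shapes. The flight fields mirror the JSON used later in
// this thread; the hotel field names are assumptions.
interface FlightInput {
  origin: string;        // IATA code, e.g. "BER"
  destination: string;   // IATA code, e.g. "BCN"
  departureDate: string; // ISO date, e.g. "2021-08-13"
}

interface HotelInput {
  location: string;    // city name, e.g. "Porto"
  checkInDate: string; // ISO date, e.g. "2021-08-25"
}

// Tells the two shapes apart: only flight inputs carry an origin.
function isFlightInput(input: FlightInput | HotelInput): input is FlightInput {
  return "origin" in input;
}

// booking.com (hotel mode) needs a location plus a check-in date, so reject
// anything that lacks either one.
function toHotelInput(raw: Record<string, unknown>): HotelInput {
  const location = raw["location"];
  const checkInDate = raw["checkInDate"];
  if (typeof location !== "string" || location.length === 0) {
    throw new Error("booking.com hotel input needs a location");
  }
  if (typeof checkInDate !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(checkInDate)) {
    throw new Error("booking.com hotel input needs an ISO check-in date");
  }
  return { location, checkInDate };
}
```

With this guard, passing a flight-style object such as { origin, destination, departureDate } to the hotel scraper fails early with a clear message instead of producing an empty search.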
It seems these data points have not been checked manually. For example, there are no Eurowings flights from PAR to HAM in May.
With the new coronavirus wave, more flights may be canceled. To be safe, we should probably focus on dates from June onward.
Added comparisons with new data points.
If you want to add more, first check that flights are available, then run ts-node cli.ts scrape to make sure the scraper doesn't crash for the given date.
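To follow the suggestion of using only dates from June onward, candidate inputs could be pre-filtered before running the scrape command on them. This is a minimal sketch that assumes inputs are stored with ISO dates, as in the JSON later in this thread. fromCutoff is a hypothetical helper, and the filter does not replace checking availability manually.

```typescript
// Hypothetical flight input shape, mirroring the JSON objects in this thread.
interface FlightInput {
  origin: string;
  destination: string;
  departureDate: string; // ISO "YYYY-MM-DD"
}

// Keep only inputs departing on or after the cutoff. ISO dates compare
// correctly as plain strings because every field is zero-padded and the
// fields run from year to day.
function fromCutoff(inputs: FlightInput[], cutoff: string): FlightInput[] {
  return inputs.filter((input) => input.departureDate >= cutoff);
}
```

For example, fromCutoff(inputs, "2021-06-01") drops the May PAR to HAM search and keeps anything from June 1st onward.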
Regarding the data points not having been checked manually: actually, I did. Let's divide these between us and make sure the list provides valid input data. More on that tomorrow.
I already added more input data, see pgAdmin.
For EuroWings, only flight dates that return one result are allowed.
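If the EuroWings rule is read as "keep a date only if the search returns exactly one result," candidate dates could be screened automatically. This is a minimal sketch. SearchFn stands in for whatever the EuroWings scraper actually exposes, and both SearchFn and singleResultDates are hypothetical. Dates are checked one after another rather than in parallel to keep the load on the site low.

```typescript
// Hypothetical search function: returns the offers found for one route and date.
type SearchFn = (origin: string, destination: string, date: string) => Promise<unknown[]>;

// Keep only the dates for which the search returns exactly one result.
// Dates are checked sequentially so the site is not hit with parallel requests.
async function singleResultDates(
  origin: string,
  destination: string,
  dates: string[],
  search: SearchFn,
): Promise<string[]> {
  const accepted: string[] = [];
  for (const date of dates) {
    const offers = await search(origin, destination, date);
    if (offers.length === 1) {
      accepted.push(date);
    }
  }
  return accepted;
}
```

Because flights disappear between checks, the result is only a snapshot. It would need to be re-run shortly before data collection.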
I wanted to check the data online, but it seems I don't have the rights to execute queries or see any databases. Could you check?
I just checked with your username: I can run SELECT on every table, and UPDATE as well.
I updated the input data for EuroWings, and suddenly there were no more flights.
It's really terrible: flights keep disappearing randomly. https://scraperbox.be/screenshots/EuroWingsWebScraper-1617692472681.png
Let them expire. We will update them shortly before we start the data collection.
I'll check.
The list online is still pretty incomplete. Let's take 3 different searches for each comparison. For example, for booking.com, each comparison (1. mobile vs. desktop, 2. French vs. German site, ...) should use the same input data:
Madrid (MAD) - Warsaw (WAW), June 14th, 2021
Bordeaux (BOD) - Rome (FCO), May 6th, 2021
Porto (OPO) - Berlin (BER), May 13th, 2021
That way, we do not rely on a single data point that may be flawed.
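Reusing the same three searches for every comparison type amounts to a cross product of comparison kinds and inputs. This is a minimal sketch. SearchInput, Comparison and buildComparisons are hypothetical names, and the three inputs are the booking.com routes listed above, with their dates written in ISO form.

```typescript
// Hypothetical types: a comparison kind (e.g. "mobile vs. desktop") and a
// search input shared by all kinds.
interface SearchInput {
  origin: string;
  destination: string;
  departureDate: string;
}

interface Comparison {
  kind: string;
  input: SearchInput;
}

// Pair every comparison kind with every shared input, so no comparison
// depends on a single, possibly flawed, data point.
function buildComparisons(kinds: string[], inputs: SearchInput[]): Comparison[] {
  const comparisons: Comparison[] = [];
  for (const kind of kinds) {
    for (const input of inputs) {
      comparisons.push({ kind, input });
    }
  }
  return comparisons;
}

// The three booking.com searches proposed above, with ISO dates.
const sharedInputs: SearchInput[] = [
  { origin: "MAD", destination: "WAW", departureDate: "2021-06-14" },
  { origin: "BOD", destination: "FCO", departureDate: "2021-05-06" },
  { origin: "OPO", destination: "BER", departureDate: "2021-05-13" },
];
```

With two comparison kinds, this yields six rows: each kind runs the identical three searches, so their results can be compared pairwise.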
Let's add the following connections:
Kayak:
Berlin (BER) - Barcelona (BCN), 13.08.2021
Madrid (MAD) - Rome (FCO), 07.08.2021
{ "origin": "BER", "destination": "BCN", "departureDate": "2021-08-13" }
{ "origin": "MAD", "destination": "FCO", "departureDate": "2021-08-07" }
Booking:
Bordeaux (BOD) - Rome (FCO), 13.08.2021
Porto (OPO) - Berlin (BER), 25.08.2021
Opodo:
Cologne (CGN) - Prague (PRG), 23.08.2021
Porto (OPO) - Brussels (BRU), 18.08.2021
{ "origin": "CGN", "destination": "PRG", "departureDate": "2021-08-23" }
{ "origin": "OPO", "destination": "BRU", "departureDate": "2021-08-18" }
Expedia:
Stockholm (ARN) - Amsterdam (AMS), 10.08.2021
Porto (OPO) - Brussels (BRU), 25.08.2021
{ "origin": "ARN", "destination": "AMS", "departureDate": "2021-08-10" }
{ "origin": "OPO", "destination": "BRU", "departureDate": "2021-08-25" }
Airfrance:
Madrid (MAD) - Paris (PAR), 25.08.2021
Vienna (VIE) - Amsterdam (AMS), 09.08.2021
{ "origin": "MAD", "destination": "PAR", "departureDate": "2021-08-25" }
{ "origin": "VIE", "destination": "AMS", "departureDate": "2021-08-09" }
EuroWings:
Cologne (CGN) - London (LON), 12.08.2021
Berlin (BER) - Rome (FCO), 23.08.2021
{ "origin": "CGN", "destination": "LON", "departureDate": "2021-08-12" }
{ "origin": "BER", "destination": "FCO", "departureDate": "2021-08-23" }
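The connections above are written as DD.MM.YYYY in prose, while the scrapers take ISO dates in JSON, so mixing the two formats is an easy mistake. A small normalisation step before saving inputs could catch it. This is a minimal sketch. toIsoDate and normalizeInput are hypothetical helpers, and the check only verifies the format of the codes (three upper-case letters, which also admits city codes such as LON and PAR), not that the airports exist.

```typescript
// Hypothetical flight input shape, mirroring the JSON objects above.
interface FlightInput {
  origin: string;
  destination: string;
  departureDate: string;
}

// Accept either ISO "YYYY-MM-DD" or "DD.MM.YYYY" and always return ISO,
// so every scraper receives the same format.
function toIsoDate(date: string): string {
  if (/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    return date;
  }
  const match = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(date);
  if (match === null) {
    throw new Error(`Unrecognised date: ${date}`);
  }
  const [, day, month, year] = match;
  return `${year}-${month}-${day}`;
}

// Reject codes that are not three upper-case letters and normalise the date.
function normalizeInput(input: FlightInput): FlightInput {
  for (const code of [input.origin, input.destination]) {
    if (!/^[A-Z]{3}$/.test(code)) {
      throw new Error(`Not a three-letter code: ${code}`);
    }
  }
  return { ...input, departureDate: toIsoDate(input.departureDate) };
}
```

Running every JSON object through normalizeInput before inserting it into the database would turn "25.08.2021" into "2021-08-25" and flag typos such as "ROM " or lower-case codes.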
We need a second version of the booking.com scraper that provides flights.
See the comparison table for the new additions.
@godfriedmeesters Last things needed:
If you want to do the scraper for Booking flights, it's probably best to start from BookingWebScraper.ts, which scrapes only hotel rooms.
About the dataset: I corrected a bug where the apps returned duplicate offers. Hopefully we will have a good dataset next week.
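The duplicate-offer fix itself is not shown in this thread. As a generic safeguard, a post-processing step could drop offers that are identical on their key fields before they enter the dataset. This is a minimal sketch. The Offer fields and dedupeOffers are hypothetical, and which fields define a duplicate is an assumption that would need to match the real schema.

```typescript
// Hypothetical offer shape; the real fields returned by the apps may differ.
interface Offer {
  site: string;
  title: string;
  price: number;
  currency: string;
}

// Drop offers that are identical on every key field, keeping the first
// occurrence so the original order of the scrape is preserved.
function dedupeOffers(offers: Offer[]): Offer[] {
  const seen = new Set<string>();
  const unique: Offer[] = [];
  for (const offer of offers) {
    // JSON-encoding the tuple avoids accidental collisions from plain concatenation.
    const key = JSON.stringify([offer.site, offer.title, offer.price, offer.currency]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(offer);
    }
  }
  return unique;
}
```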
Booking flights are much more difficult to scrape than hotel offers: elements can only be queried by XPath.
The XPath selector works in Chrome but not in Puppeteer. In Chrome,
//div[@data-testid='searchresults_card']//*[contains(text(),'€')]
gives the prices.
However, in Puppeteer this gives wrong prices:
let elements = await this.page.$x("//div[@data-testid='searchresults_card']//*[contains(text(),'€')]");
var txts = [];
for (var elem of elements) {
    await this.page.waitFor(100);
    const price = await this.page.evaluate(el => el.textContent, elem);
    console.log(price);
}
Output: 136,07 € 136,07 € 136,07 € [] []
I give up on Booking flights; it's too difficult.
In Chrome DevTools, the query //div[@data-testid='searchresults_card']//*[contains(text(),'€')] works well. However, in Puppeteer it seems very tricky.
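One way to avoid handling per-element XPath results is to let Puppeteer read the full text of every result card in a single page.$$eval call and extract the euro amount in Node. This is a sketch under assumptions, not a diagnosis of the wrong prices above. It assumes that the CSS selector div[data-testid='searchresults_card'] matches the same cards as the XPath and that prices are formatted like "136,07 €". It takes the first euro amount in each card, which may not be the total price. PageLike, parseEuroPrice and scrapeCardPrices are hypothetical names. PageLike exists only so the sketch compiles without puppeteer installed.

```typescript
// Minimal structural stand-in for the two Puppeteer Page methods used below,
// so the sketch compiles without puppeteer installed. A real Page has both
// methods, but its typings are more elaborate than this interface.
interface PageLike {
  waitForSelector(selector: string): Promise<unknown>;
  $$eval<R>(
    selector: string,
    fn: (elements: Array<{ textContent: string | null }>) => R,
  ): Promise<R>;
}

// Extract the first euro amount from a card's text, e.g. "136,07 €" -> 136.07.
// Assumes German-style formatting: "." groups thousands, "," marks decimals,
// and the € sign follows the amount.
function parseEuroPrice(text: string): number | null {
  const match = /(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d{2}))?\s*€/.exec(text);
  if (match === null) {
    return null;
  }
  const whole = match[1].replace(/\./g, "");
  const cents = match[2] ?? "00";
  return Number(`${whole}.${cents}`);
}

// Wait for the result cards, read all their texts in one round trip, then parse.
async function scrapeCardPrices(page: PageLike): Promise<Array<number | null>> {
  const selector = "div[data-testid='searchresults_card']";
  await page.waitForSelector(selector);
  // This callback runs inside the browser, so it must not use outer variables.
  const texts = await page.$$eval(selector, (cards) => cards.map((card) => card.textContent ?? ""));
  return texts.map(parseEuroPrice);
}
```

Reading card texts in bulk also makes it easy to log the raw strings next to the parsed numbers, which helps when comparing against what DevTools shows.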
@godfriedmeesters As discussed, we do not want to delay progress much further. Could you please add two more distinct cities and dates, so that we end up with three comparisons of each kind for booking.com (a. 3x mobile vs. web, b. 3x web vs. web, using the same input data for a and b)?
Added new comparisons with different data.
I also changed the order in the comparisons table so that the same company is not scraped consecutively.
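This reordering can be automated. The following is a minimal sketch that assumes each row of the comparisons table has a company field. ComparisonRow and interleaveByCompany are hypothetical names. The sketch uses a greedy rule, "take a row from the company with the most rows left, but not the company just scraped." That is one standard way to interleave items and not necessarily how the table was actually reordered.

```typescript
// Hypothetical row of the comparisons table; only the company matters here.
interface ComparisonRow {
  id: number;
  company: string;
}

// Greedy interleaving: at each step take the next row from the company with
// the most rows left, skipping the company scraped just before. This avoids
// back-to-back runs whenever no company has more than half of the rows
// (rounded up); otherwise some repeats are unavoidable.
function interleaveByCompany(rows: ComparisonRow[]): ComparisonRow[] {
  const queues = new Map<string, ComparisonRow[]>();
  for (const row of rows) {
    const queue = queues.get(row.company) ?? [];
    queue.push(row);
    queues.set(row.company, queue);
  }

  const ordered: ComparisonRow[] = [];
  let previous: string | null = null;
  while (ordered.length < rows.length) {
    let best: string | null = null;
    for (const [company, queue] of queues) {
      if (queue.length === 0 || company === previous) continue;
      if (best === null || queue.length > queues.get(best)!.length) best = company;
    }
    if (best === null) {
      // Only the previous company has rows left, so a repeat cannot be avoided.
      best = previous!;
    }
    ordered.push(queues.get(best)!.shift()!);
    previous = best;
  }
  return ordered;
}
```

Within each company, rows keep their original relative order, so any priority already encoded in the table survives the reordering.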
Good move, I like it.
Here is some input data that we would like to add (keep in mind that some of these may show prices in different currencies):
Booking:
Kayak:
Opodo:
Expedia:
Airfrance:
EuroWings:
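Some of these inputs may show prices in different currencies, so one option is to tag each scraped price with a currency before comparing. Prices in different currencies then never end up side by side. This is a minimal sketch. RawPrice, CURRENCY_MARKERS, detectCurrency and groupByCurrency are hypothetical. The marker list is an assumption that covers only a few currencies, and "$" is naively mapped to USD.

```typescript
// Hypothetical scraped price: the raw text as shown on the site.
interface RawPrice {
  site: string;
  text: string; // e.g. "136,07 €" or "£120.50"
}

// Assumed mapping from symbols/codes seen in price strings to ISO 4217 codes.
const CURRENCY_MARKERS: Array<[string, string]> = [
  ["€", "EUR"],
  ["EUR", "EUR"],
  ["£", "GBP"],
  ["GBP", "GBP"],
  ["zł", "PLN"],
  ["PLN", "PLN"],
  ["CHF", "CHF"],
  ["$", "USD"],
];

// Return the ISO code for the first marker in CURRENCY_MARKERS that appears
// in the text, or "UNKNOWN" if none does.
function detectCurrency(text: string): string {
  for (const [marker, code] of CURRENCY_MARKERS) {
    if (text.indexOf(marker) !== -1) return code;
  }
  return "UNKNOWN";
}

// Group prices by detected currency so that, for example, a price in EUR is
// never compared directly with one in GBP.
function groupByCurrency(prices: RawPrice[]): Map<string, RawPrice[]> {
  const groups = new Map<string, RawPrice[]>();
  for (const price of prices) {
    const code = detectCurrency(price.text);
    const group = groups.get(code) ?? [];
    group.push(price);
    groups.set(code, group);
  }
  return groups;
}
```

Anything that lands in the "UNKNOWN" group is a hint that the marker list needs extending before those results are compared.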