Open godfriedmeesters opened 3 years ago
Can we collect one batch of reliable data here, or is it a serious effort to get this running?
It's a serious effort to get this running. In addition, the SSD in my development laptop crashed this week, and I replaced it with an old and very slow disk.
If you want to look at it, here are some ideas:
For Kayak, I don't know how to select the right price.
Right now I select the price in the offer list with the CSS selector .price-text
However, sometimes more than one price is shown in a Kayak offer, and then the wrong price is selected.
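One way to avoid silently recording the wrong price could be to collect every price text found inside an offer and flag the offer when they disagree. This is only a sketch: `parsePrice`, `resolveOfferPrice`, and the `PriceResult` type are hypothetical names, not part of the scraper, and the parser assumes prices like "$1,234" where the comma is a thousands separator.

```typescript
// Hypothetical helper types and functions; not the scraper's actual code.
type PriceResult =
  | { kind: "ok"; price: number }
  | { kind: "ambiguous"; candidates: number[] }
  | { kind: "missing" };

// Parse a price label such as "$1,234" or "€ 89" into a number.
// Assumption: "," is a thousands separator and "." is the decimal point.
function parsePrice(text: string): number | null {
  const digits = text.replace(/[^0-9.]/g, "");
  if (digits === "") return null;
  const value = Number(digits);
  return Number.isFinite(value) ? value : null;
}

// Given all price texts found inside one offer, return the price only when
// it is unambiguous; otherwise report the distinct candidates for review.
function resolveOfferPrice(priceTexts: string[]): PriceResult {
  const candidates = priceTexts
    .map(parsePrice)
    .filter((p): p is number => p !== null);
  const distinct = Array.from(new Set(candidates));
  if (distinct.length === 0) return { kind: "missing" };
  if (distinct.length === 1) return { kind: "ok", price: distinct[0] };
  return { kind: "ambiguous", candidates: distinct };
}
```

Offers that come back as "ambiguous" could be logged and checked by hand instead of ending up in the data set with a possibly wrong price.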
For Booking, I sort by cheapest by doing this.clickElementByXpath('/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.ScrollView/android.widget.LinearLayout/android.widget.CheckedTextView[6]');
However, depending on unpredictable circumstances, this selector might not work every time.
Also, for some offers the selector might be different than for other offers.
For the Kayak website, I updated the CSS selector for the price, so we should now normally get the correct price every time.
For Booking.com, my guess is that most offer lists are extracted correctly. However, if you find different lists on app and web, please double-check that one or both scrapers did not skip any offers (which might explain a different total).
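The double check on app versus web lists could be partly automated. The sketch below is an illustration under assumptions: `ScrapedOffer` and `compareOfferLists` are hypothetical names, and it assumes an offer can be identified by its name on both the app and the website, which may not hold if the two show names differently.

```typescript
// Hypothetical shape of one extracted offer; the real scraper output may differ.
interface ScrapedOffer {
  name: string;
  price: number;
}

// Normalize names so that case and surrounding whitespace do not cause false mismatches.
function normalizeName(name: string): string {
  return name.trim().toLowerCase();
}

// Report which offer names appear in only one of the two lists.
function compareOfferLists(
  appOffers: ScrapedOffer[],
  webOffers: ScrapedOffer[]
): { onlyInApp: string[]; onlyInWeb: string[] } {
  const appNames = new Set(appOffers.map((o) => normalizeName(o.name)));
  const webNames = new Set(webOffers.map((o) => normalizeName(o.name)));
  return {
    onlyInApp: Array.from(appNames).filter((n) => !webNames.has(n)),
    onlyInWeb: Array.from(webNames).filter((n) => !appNames.has(n)),
  };
}
```

A non-empty `onlyInApp` or `onlyInWeb` does not prove that a scraper skipped something, since the two channels may genuinely show different offers, but it points to the entries worth checking by hand.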
I see two ways out.
Data from the Booking app "sorted by cheapest" was not always actually sorted by cheapest, maybe because of performance problems on the phone. That's why I made a new commit: https://github.com/godfriedmeesters/scraper/commit/7b0be52ba4d6da029a747927e101366f882a06a9
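A simple sanity check on each recorded batch could catch cases where the sort did not take effect. This is a minimal sketch, not what the commit above does: `firstOutOfOrderIndex` and `isSortedByCheapest` are hypothetical helpers that only look at the extracted prices in list order.

```typescript
// Hypothetical helper: returns the index of the first price that is lower than
// the one before it, or -1 if the list is in ascending (cheapest-first) order.
function firstOutOfOrderIndex(prices: number[]): number {
  for (let i = 1; i < prices.length; i++) {
    if (prices[i] < prices[i - 1]) return i;
  }
  return -1;
}

// True when every price is greater than or equal to the previous one.
function isSortedByCheapest(prices: number[]): boolean {
  return firstOutOfOrderIndex(prices) === -1;
}
```

A batch for which `isSortedByCheapest` returns false could be discarded or re-scraped rather than stored as "sorted by cheapest".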
The data recorded up to now for the Booking app "sorted by cheapest" is unusable.
The data recorded up to now for Kayak Desktop Web is also unusable.
Hopefully with the corrections you will have good data; however, for my thesis I think it's too late.
Best vs cheapest was implemented for Kayak and Booking.com
However:
Conclusion: if you compare best with cheapest, your conclusions will likely be wrong.