godfriedmeesters / scraper

As part of DiffScraper, one or more bots can be deployed. Ready-to-use bots are provided that can extract offers from mobile applications, mobile websites and desktop websites.
GNU General Public License v3.0

Careful when comparing best vs cheapest #26

Open godfriedmeesters opened 3 years ago

godfriedmeesters commented 3 years ago

Best vs cheapest was implemented for Kayak and Booking.com

However:

Conclusion: if you compare best with cheapest, your conclusions will likely be wrong

bkrumnow commented 3 years ago

Can we collect one batch of reliable data here, or is it a serious effort to get this running?

godfriedmeesters commented 3 years ago

It's a serious effort to get this running. In addition, the SSD in the laptop I used for development crashed this week, and I replaced it with an old and very slow disk.

If you want to look at it, here are some ideas:

For Kayak, I don't know how best to select the price.

Right now I select the price in the offer list with the CSS selector .price-text

However, sometimes more than one price is shown in a Kayak offer, and then the wrong price is selected.
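One way to avoid silently recording the wrong price would be to collect the text of every .price-text match inside a single offer and flag the offer when more than one price comes back. The sketch below only illustrates that idea: `parsePrice`, `pickOfferPrice` and the `PriceResult` type are hypothetical names, not code from the scraper, and the parsing assumes "," is a thousands separator.

```typescript
// Hypothetical helper: the names and the "ambiguous" policy are assumptions,
// not code from the scraper repository.
type PriceResult =
  | { kind: "ok"; price: number }
  | { kind: "ambiguous"; candidates: number[] }
  | { kind: "missing" };

// Parse a price text such as "$1,234" or "€ 89" into a number.
// Assumes "," is a thousands separator and "." a decimal point.
function parsePrice(text: string): number | null {
  const cleaned = text.replace(/[^0-9.]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

// texts: the text content of every .price-text element found inside ONE offer.
function pickOfferPrice(texts: string[]): PriceResult {
  const candidates = texts
    .map(parsePrice)
    .filter((p): p is number => p !== null);
  if (candidates.length === 0) return { kind: "missing" };
  if (candidates.length === 1) return { kind: "ok", price: candidates[0] };
  // More than one price in the same offer: report it instead of guessing.
  return { kind: "ambiguous", candidates };
}
```

Offers that come back as "ambiguous" could then be checked against the screenshot rather than ending up in the data with a guessed price.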

For Booking, I sort by cheapest by doing this.clickElementByXpath('/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.ScrollView/android.widget.LinearLayout/android.widget.CheckedTextView[6]');

However, depending on unpredictable circumstances, this selector might not work every time.

Also, for some offers the selector might be different than for other offers.

godfriedmeesters commented 3 years ago

For the Kayak website, I updated the CSS selector for the price; we should now normally get the correct price every time.

godfriedmeesters commented 3 years ago

For Booking.com, my guess is that most offer lists are extracted correctly. However, if you find different lists on app and web, please double-check that neither scraper skipped any offers (which might explain a different total).
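The double check could be partly automated by listing which offers appear in only one of the two lists. This is a rough sketch under assumptions: the `Offer` shape and the function names are made up for illustration, and matching by normalised name is only one possible way to pair app and web offers.

```typescript
// Hypothetical offer shape; the real scraper output may use other fields.
interface Offer {
  name: string;
  price: number;
}

// Normalise names so that "Hotel  Astoria " and "hotel astoria" match.
function offerKey(offer: Offer): string {
  return offer.name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Returns the offers that are present in one list but not in the other.
function diffOfferLists(app: Offer[], web: Offer[]) {
  const appKeys = new Set(app.map(offerKey));
  const webKeys = new Set(web.map(offerKey));
  return {
    onlyInApp: app.filter((o) => !webKeys.has(offerKey(o))),
    onlyInWeb: web.filter((o) => !appKeys.has(offerKey(o))),
  };
}
```

A non-empty onlyInApp or onlyInWeb does not prove that a scraper skipped something, but it narrows down which screenshots to look at.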

bkrumnow commented 3 years ago

I see two ways out.

  1. Fix the scraper and do another run.
  2. Have one manual run over all screenshots and fix the data.

godfriedmeesters commented 3 years ago

Data from the Booking app that was supposed to be sorted by cheapest was not always sorted that way, possibly because of performance problems on the phone. That's why I made a new commit: https://github.com/godfriedmeesters/scraper/commit/7b0be52ba4d6da029a747927e101366f882a06a9
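A small sanity check on each recorded list could flag runs where "sorted by cheapest" did not actually take effect. The function below is a hypothetical sketch and not part of the commit above; it assumes the prices have already been extracted as numbers, in the order shown on screen.

```typescript
// Returns the index of the first offer that is cheaper than its predecessor,
// or -1 if the list is in non-decreasing price order.
function firstUnsortedIndex(prices: number[]): number {
  for (let i = 1; i < prices.length; i++) {
    if (prices[i] < prices[i - 1]) return i;
  }
  return -1;
}
```

Any run where this returns something other than -1 was probably not sorted by cheapest and could be excluded or repeated.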

The data recorded up to now for the Booking app "sorted by cheapest" is unusable.

The data recorded up to now for Kayak Desktop Web is also unusable.

Hopefully, with these corrections you will have good data. For my thesis, however, I think it's too late.