celebi-pkg / flight-analysis

Python package that scrapes flight data from Google Flights and analyzes prices. It can determine the optimal flight by date, place, and price.
https://kcelebi.github.io/flight-analysis/
MIT License
114 stars 36 forks

num_flights #7

Open whatevernevermindbro opened 1 year ago

whatevernevermindbro commented 1 year ago

Hi Kaya! Version 1.1 looks great!

I want to suggest an option to parse a fixed number of flight options at once. Currently, the received number of flights is limited.

Here is an example: take SFO - IST, from Aug 1 to Aug 17. If you search for it, you can see that there are more than 120 flights for a one-way ticket and more than 70 for a round trip, while Scrape returns only 21 flights in total.

I am guessing that the current version of the scraper returns only the flights from the front page. In that case, I would suggest making it possible to push this "X more flights" button with Selenium in order to get the needed number of flights in the same window.

image

On the other hand, a configurable number of flights might matter less than a useful set of filters, such as date/time of arrival, flight duration, etc.

Hope that is interesting. Have an amazing week!

smyja commented 1 year ago

+1

kevinsimard commented 4 months ago

Any update on this? Are all flights now queried correctly, or does it still return only the first 21?

haciMMicah commented 2 months ago

If anyone is still interested in this: the class name of the div containing the button is ZVk93d, and that div contains only a single button, so finding the button element and clicking it is pretty simple, with something like

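from selenium.webdriver.common.by import By

# `driver` is an existing Selenium WebDriver on the results page.
# The div with class ZVk93d wraps the single "more flights" button.
more_flights_div = driver.find_element(
    by=By.CLASS_NAME,
    value="ZVk93d",
)
more_flights_button = more_flights_div.find_element(
    by=By.CSS_SELECTOR, value="button"
)
more_flights_button.click()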

From there, since you know how many elements to expect, you just need to wait for them to load. Each flight's info is in a list item with class name pIav2d, so you can find all flight info elements after clicking the button via

driver.find_elements(by=By.CLASS_NAME, value="pIav2d")
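
The click-then-wait step described above could look roughly like the sketch below. It is only an illustration: load_more_flights, its timeout and poll parameters, and the "wait until the list grows" condition are my own assumptions, not part of the package. Selenium's WebDriverWait would be the usual tool for the wait; a plain polling loop is swapped in here so the sketch has no dependencies beyond a driver object. The class names are the ones quoted in this thread and may change whenever Google updates the page.

```python
import time

# Locator strategy strings; these are the values of selenium's
# By.CLASS_NAME and By.CSS_SELECTOR.
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"


def load_more_flights(driver, timeout=10.0, poll=0.5):
    """Click the "more flights" button, then wait for the flight list to grow.

    Hypothetical helper. Returns the flight elements found once the list
    has grown, or whatever is present when the timeout expires.
    """
    # Count the flights already on the page so growth can be detected.
    before = len(driver.find_elements(by=CLASS_NAME, value="pIav2d"))

    # The div with class ZVk93d wraps the single "more flights" button.
    more_div = driver.find_element(by=CLASS_NAME, value="ZVk93d")
    more_div.find_element(by=CSS_SELECTOR, value="button").click()

    # Poll until more flight items appear or the deadline passes.
    deadline = time.monotonic() + timeout
    while True:
        flights = driver.find_elements(by=CLASS_NAME, value="pIav2d")
        if len(flights) > before or time.monotonic() >= deadline:
            return flights
        time.sleep(poll)
```

With a real WebDriver already on the results page, calling load_more_flights(driver) would return the expanded list of flight elements, whose .text could then go through the existing parsing code.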

It might take some changes to the parsing code, but you can probably just drop this into _get_flight_elements and have it work, since the .text items should be the same.

Hope this helps.