alxcnwy commented 6 months ago

Hi,

First of all well done - awesome project!

I get the following error when I try to run the sample code:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[2], line 31
     28 print("The price is currently", result.current_price)
     30 # Display the first flight
---> 31 print(result.flights[0])

IndexError: list index out of range

result is Result(current_price='', flights=[])

plz halp :)

P.S. tried adding you on discord but your id isn't working. DM me on twitter (just followed you) - I'm looking to use this API for something fun :)

AWeirdDev commented 6 months ago

Hi there,

Were you trying to run the example code? If so, I've run the code on different platforms and everything seems to be working as intended.

Could you share which airport filters you used? They're inside FlightData and are named from_airport and to_airport.

Given the result dataclass provided (Result(current_price='', flights=[])), it is possible that no flights were found for the current filter. You can run the same search on Google Flights to see whether it also returns no results.
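Until the cause is found, the IndexError itself can be avoided by checking the list before indexing it. Here is a minimal sketch; the Result class below is only a hypothetical stand-in that mirrors the repr shown above (not the library's real class), and first_flight is a made-up helper name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    # Hypothetical stand-in mirroring Result(current_price='', flights=[])
    current_price: str = ""
    flights: List[str] = field(default_factory=list)


def first_flight(result: Result) -> Optional[str]:
    """Return the first flight, or None when the search came back empty."""
    return result.flights[0] if result.flights else None


empty = Result(current_price="", flights=[])
if first_flight(empty) is None:
    print("No flights were returned for this filter")
```

This only hides the crash, of course; an empty list still means the search or the parsing returned nothing.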

P.S. are you making an ai integration? this project was made because i wanted to see if ai can search for flights, so here we are ;)

Cheers, AWeirdDev

alxcnwy commented 6 months ago

Hey I just ran the sample code from the readme with no changes which is what made me raise the issue 😅

AWeirdDev commented 6 months ago

Hi again,

That's weird, since on most clients the code works perfectly fine. We can do a little troubleshooting to better understand what's going on here.

Re-install dependencies. Sometimes this kind of issue is caused by selectolax. We can try re-installing and see if the error still persists.
```
$ pip uninstall selectolax -y
$ pip install -U selectolax
```

Generate a search visualization URL. We can use TFSData (from create_filter()) to get the Base64 string and print the URL.

filter = create_filter(...) # Your filter
b64 = filter.as_b64().decode('utf-8')
print(
"https://www.google.com/travel/flights?tfs=%s" % b64
)

Copy the URL and open it in a web browser. If there's no flight data there either, the problem is most likely the "region," "date," or "airport" parameters. Try changing them.
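The URL-building step can also be sketched with the standard library only. This is an illustration, not part of fast-flights: build_search_url is a hypothetical helper, and the tfs value is one that was posted later in this thread. Using urlencode instead of "%s" also percent-encodes any trailing "=" padding in the Base64 string.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/travel/flights"


def build_search_url(tfs_b64: str) -> str:
    """Build a Google Flights URL from an already Base64-encoded tfs string."""
    return BASE_URL + "?" + urlencode({"tfs": tfs_b64})


# tfs value taken from a URL shared in this thread (KRK to SZY, 2024-06-03)
url = build_search_url("GhoSCjIwMjQtMDYtMDNqBRIDS1JLcgUSA1NaWUIBAUgBmAEC")
print(url)
```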

API. I can host a dedicated API if none of the above works. That way, no errors should persist on your side (we'll keep it to Vercel, though).

Best, AWeirdDev

alxcnwy commented 6 months ago

Working now, thanks!

IHannes commented 6 months ago

Hi,

I have the same issue, but unfortunately none of the fixes mentioned above work. The link generated from the filter is correct and works in a web browser, but I don't get any results when trying to scrape.

Any help would be much appreciated!

Greetings from Germany

AWeirdDev commented 6 months ago

Hi there,

During the development of this project, I did get some uncaught errors when using selectolax to parse the HTML contents, even though the selectors and responses were all functional. I'll inspect the code now and keep you updated.

Best, AWeirdDev

IHannes commented 6 months ago

Hey AWeirdDev,

thank you so much for looking into it, i really appreciate all your effort! This is the output of the request_flights function, maybe it will help you.

Best regards

Hannes

[Attachment: output.txt]

AWeirdDev commented 6 months ago

Hey there,

Thanks for providing the HTML output! I did some digging based on it, and it seems that this line (from the source):

https://github.com/AWeirdDev/flights/blob/978c70b2ec03307aef459ccd90b4e092510e4b43/fast_flights/core.py#L45

is not working. The main issue is that the parser cannot select div[jsname="IWWDBc"] (contains "best flights") or div[jsname="YdtKid"] (contains "other flights").

I copied the HTML output to an online HTML viewer, and this was what I got.

[Screenshot: Google-Flights-Very-Googlish]

And to prove that it's not injected by Google at runtime:

This is indeed the main reason!

A quick recap on what happened:

You hadn't accepted Google's Terms of Service.
The Google Flights page was replaced by a ToS consent form, so there was nothing for us to scrape.
Call stack: line 45 failed to select any items.
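To check a saved response for this condition, one could look for the two containers named above. A rough sketch using only the standard library's html.parser (the project itself uses selectolax; JsnameCollector and has_flight_containers are hypothetical names for illustration):

```python
from html.parser import HTMLParser

# jsname values of the "best flights" and "other flights" containers
FLIGHT_CONTAINERS = {"IWWDBc", "YdtKid"}


class JsnameCollector(HTMLParser):
    """Collects which of the expected flight containers appear in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            if name == "jsname" and value in FLIGHT_CONTAINERS:
                self.found.add(value)


def has_flight_containers(html: str) -> bool:
    """True if at least one flight container div is present in the HTML."""
    collector = JsnameCollector()
    collector.feed(html)
    return bool(collector.found)


print(has_flight_containers('<div jsname="IWWDBc"></div>'))
print(has_flight_containers("<form>Terms of Service</form>"))
```

If this returns False for a response that came back with status 200, the page is probably something other than the results page, such as the ToS form.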

Cheers, AWeirdDev

IHannes commented 6 months ago

Thank you so much for your help!

I added

cookies = {
    "CONSENT": "PENDING+987",
    "SOCS": "CAESHAgBEhJnd3NfMjAyMzA4MTAtMF9SQzIaAmRlIAEaBgiAo_CmBg",
}

and

def request_flights(tfs: TFSData) -> requests.Response:
    r = requests.get(
        "https://www.google.com/travel/flights",
        params={
            "tfs": tfs.as_b64(),
            "hl": "en",
            "tfu": "EgQIABABIgA",  # show all flights and prices condition
        },
        headers={"user-agent": ua, "accept-language": "en"},
        cookies=cookies,
    )

and it works now!!

AWeirdDev commented 6 months ago

Hi again,

That's great news. I've updated the project (v0.3) and now you can add custom **kwargs such as cookies to requests.get so there's no need to clone the source.

# tag: v0.3
get_flights(filter, cookies={…}, proxies={…}, ...)

Details

Commit: https://github.com/AWeirdDev/flights/commit/696885c9364d68f38b6563b1e6d61a1f4fd33705

Install v0.3: `pip install fast-flights==0.3`

Additionally, the cookie provided (CAESHAgBEhJnd3NfMjAyMzA4MTAtMF9SQzIaAmRlIAEaBgiAo_CmBg) is also a Protobuf string, so I plan to support bypassing this screen (ToS) when I have time.
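For the curious, the raw bytes of that cookie can be inspected with the standard library. This is only a peek at the bytes, not a Protobuf parser; the URL-safe Base64 alphabet and the missing padding are assumptions based on the "_" in the value and its length, and decode_urlsafe_b64 is a hypothetical helper name.

```python
import base64


def decode_urlsafe_b64(value: str) -> bytes:
    """Decode URL-safe Base64 after restoring any stripped '=' padding."""
    padding = "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(value + padding)


socs = "CAESHAgBEhJnd3NfMjAyMzA4MTAtMF9SQzIaAmRlIAEaBgiAo_CmBg"
raw = decode_urlsafe_b64(socs)
print(raw)
```

The decoded bytes contain the readable fragments gws_20230810-0_RC2 and de, which look like a build tag and a language or region code, surrounded by Protobuf field headers.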

Cheers, mate!

witoldprzygoda commented 6 months ago

First of all, thanks for sharing your work! Second, I tried all of the above and I get exactly the same error:

1) tried both with installed fast-flights and cloned repo
2) reinstalled selectolax
3) set cookies = { "CONSENT": "YES+" } and called result = get_flights(filter, cookies=cookies)
4) the example I try is present on GF: https://www.google.com/travel/flights?tfs=GhoSCjIwMjQtMDYtMDNqBRIDS1JLcgUSA1NaWUIBAUgBmAEC
5) the same with your genuine example: https://www.google.com/travel/flights?tfs=GhoSCjIwMjQtMDctMDJqBRIDVFBFcgUSA01ZSkIDAQECSAGYAQI=

No prices; result.flights is just an empty list.

A bit of debugging:

def get_flights(tfs: TFSData, **kwargs: Any) -> Result:
    print(tfs, kwargs)
    rs = request_flights(tfs, **kwargs)
    results = parse_response(rs)
    return rs
    # return results

I get TFSData('hello', flight_data=[FlightData(date='2024-06-03', from_airport=KRK, to_airport=SZY)]) {'cookies': {'CONSENT': 'YES+'}}, and rs is just <Response [200]>.

AWeirdDev commented 6 months ago

Hey there,

Could you provide me with what requests returned? I'll need to inspect the HTML on your end:

from fast_flights.core import request_flights

r = request_flights(tfs, cookies=cookies)
with open(".html", "wb") as f:
  f.write(r.content)

...and please provide me with the created .html file.

Best, AWeirdDev

witoldprzygoda commented 6 months ago

Interesting... is it again about consent to the Google Terms, even though the cookies are approved? Inside the HTML I see "Before you continue".
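A quick way to test for this automatically might be a plain text check on the returned HTML. A minimal sketch, assuming the consent page is served in English (the request sets hl=en) and that the phrase above is a reliable marker; looks_like_consent_page is a hypothetical helper, not part of fast-flights.

```python
CONSENT_MARKER = "Before you continue"


def looks_like_consent_page(html: str) -> bool:
    """Heuristic: True if the page contains the consent prompt text."""
    return CONSENT_MARKER.lower() in html.lower()


sample = "<html><body><h1>Before you continue</h1></body></html>"
if looks_like_consent_page(sample):
    print("Got the consent page instead of flight results")
```

The same function can be applied to the contents of the .html file saved by the snippet above, read back as text.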

AWeirdDev / flights

No results in sample code #1