AWeirdDev / flights

Fast, robust Google Flights scraper (API) for Python. (Probably)
https://pypi.org/project/fast-flights

Hitting the cookie wall #15

Open aolieman opened 1 month ago

aolieman commented 1 month ago

The cookie consent mechanism may not work in all regions. I'm in the EU, and even the first request made by request_flights hits the Google cookie wall.

I've dumped the HTML to a file and it's the "Before you continue to Google" page. This also happens when I add the headers generated by the Cookies class, and also when I replace the SOCS cookie value with the one taken from my browser.
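For anyone else who wants to check whether they are hitting the same page, here is a minimal sketch of a detector. It is not part of fast-flights; the function name is_consent_wall is made up for illustration. It assumes the English page title quoted above (localized pages will differ) and, as a second assumption, that the wall may be served from the consent.google.com host.

```python
from urllib.parse import urlparse

# Assumption: host that serves the consent page (mentioned later in this thread).
CONSENT_HOST = "consent.google.com"
# English title quoted in this issue; other locales use different text.
CONSENT_MARKER = "Before you continue to Google"


def is_consent_wall(html: str, final_url: str = "") -> bool:
    """Return True if the response looks like the Google cookie wall."""
    host = urlparse(final_url).hostname or ""
    return host == CONSENT_HOST or CONSENT_MARKER in html
```

Passing the dumped HTML (and, if available, the final URL after redirects) should be enough to tell the cookie wall apart from real search results.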

Funnily enough, when I open the dumped HTML in my browser and click either "accept all" or "reject all", the desired search results are subsequently shown. I've been looking through other codebases that use the SOCS cookie to bypass the cookie wall, but haven't found any new approaches. Most still use the same mechanism that is used here, implying that it could still work from some IP address ranges.

I'm not the only one who can reproduce this. I've asked a team member in the same country and the results are exactly the same for him. Leaving this issue here in the hope that others from the EU can report whether the CONSENT and SOCS cookies still work for them.

AWeirdDev commented 1 month ago

Hi,

This has been a known issue for a while now, but since none of my servers are in the EU, I cannot directly test/fix the problem myself.

My guess is that the SOCS cookie isn't the only thing carrying the consent info here, and that there may also be session data or identifiers involved, similar to what Cloudflare uses.

When you click the "accept" button, it sends a POST request to consent.google.com/save, and some things are (sneakily) set. Under the hood, it uses an HTML <form>, which means bypassing the EU consent wall could technically be easier for us.
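As a rough sketch of that idea, the dumped page could be parsed with the standard library to recover each POST form's target and hidden fields, which could then be replayed. Everything below is hypothetical: FormCollector and consent_forms are illustrative names, the sample markup and the field names in it are invented, and the real consent page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormCollector(HTMLParser):
    """Collect each <form>'s action, method, and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def consent_forms(html, base_url="https://consent.google.com/"):
    """Return the POST forms found in html, with absolute action URLs."""
    parser = FormCollector()
    parser.feed(html)
    return [
        {**form, "action": urljoin(base_url, form["action"])}
        for form in parser.forms
        if form["method"] == "post"
    ]


# Invented sample markup; the field names are placeholders, not Google's.
SAMPLE = """
<form action="https://consent.google.com/save" method="POST">
  <input type="hidden" name="example_token" value="abc123">
  <input type="hidden" name="continue" value="https://www.google.com/travel/flights">
  <button>Reject all</button>
</form>
"""
forms = consent_forms(SAMPLE)
```

Each returned entry holds the action URL and a fields dict that could be sent as the POST body with any HTTP client, keeping the same session so whatever the response sets is reused on the next request. Whether that is enough to get past the wall from an EU IP is untested.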
