Closed klausi closed 2 years ago
The Python HTTP client should be able to do whatever cURL is doing.
I have tried using the project's example.ods file while VPNed to DE and AT, and Yahoo seems to be working fine. So even if Yahoo fiddled with its EU cookie handling (quite likely) - how is the problem manifesting itself? What would I need to reproduce it?
The problem manifests in LibreOffice by not being able to calculate anything, for example for =GETREALTIME("EUR=X",21,"YAHOO"). When executing the tests with python3 -m unittest discover src I see test failures like this:
FAIL: test_realtime_UK_ETF (test_yahoo.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Financials-Extension/src/test_yahoo.py", line 175, in test_realtime_UK_ETF
    self.assertEqual(float, type(s), 'test_realtime_UK_ETF LAST_PRICE {}'.format(s))
AssertionError: <class 'float'> != <class 'NoneType'> : test_realtime_UK_ETF LAST_PRICE None
Then I looked into the downloaded HTML in ~/.financials-extension and found that it contains the Yahoo cookie consent page instead of the target page.
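A quick way to confirm this diagnosis on a cached file is to check the HTML for signs of the consent flow. The following is only a rough sketch: the marker strings and the function name are assumptions chosen for illustration, not something the extension is known to use, and Yahoo may change its consent page at any time.

```python
# Hypothetical markers: host names that are assumed to appear on the
# consent page but not on a normal quote page. Adjust after inspecting
# the HTML that was actually downloaded.
CONSENT_MARKERS = ("consent.yahoo.com", "guce.yahoo.com")


def looks_like_consent_page(html: str) -> bool:
    """Return True if the HTML seems to be a consent page, not a quote page."""
    lowered = html.lower()
    return any(marker in lowered for marker in CONSENT_MARKERS)
```

Running this over the files in ~/.financials-extension would show whether the scraper received the consent page or the real quote page.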
When you are using the DE VPN and call the wget command from above - what response do you get?
If you compare the Yahoo page https://finance.yahoo.com/quote/EUR=X?p=EUR=X via a proxy site, you can see that from EU servers you land on the cookie consent page, while from the US you see the target price page:
https://eu7.proxysite.com/process.php?d=Kwg1zjk8tgVKD3k2tvLCCY65GbJH1%2BliRQo8gxe1cPrpZp5yC6YJ8RJyTA5llBkim2zqg2%2F9psjqcUjbcgxdtWze2JVeK5EwHLTdNZiC44T0XsDFfw42OLUA7s5UzBFNMX2x&b=1
https://us1.proxysite.com/process.php?d=Kwg1zjw6thdBAmg2tvLCCY65GbJH1%2B4lBR02wD6FQbPyNoA8K50v8zk%3D&b=1&f=norefer
I think there are 2 things going on at the Yahoo site:
Maybe your VPN test did not work correctly? Did you clear all cookies in your browser before testing from an EU IP address?
Thanks for checking in any case - love your extension. With this hack workaround I got it at least working again now for me :)
Got it! I wasn't running the unit tests from the command line. Once I did, while on VPNs to DE/AT, I could reproduce the problem.
The extension uses a set of hard-coded EU consent cookies that needed updating. It does this instead of making two or three HTTP round trips in order to keep the network overhead to a minimum. It is also better than spawning a separate curl process.
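To illustrate the single-round-trip idea, here is a minimal sketch of attaching preset cookies to a urllib request. It is not the extension's actual code: the function name is hypothetical, and the real cookie names and values are not shown in this thread, so the caller has to supply them.

```python
import urllib.request


def build_request_with_cookies(url: str, cookies: dict) -> urllib.request.Request:
    """Build a request that carries preset cookies in one Cookie header."""
    # Cookies are sent as "name=value" pairs separated by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})
```

The resulting object can be passed to urllib.request.urlopen(), so the consent cookies go out with the first and only request.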
BTW: From AT you may be better off using the FT lookups - they may be closer to you than Yahoo (assuming Yahoo is hosted in the US).
Once I updated the cookies with fresh ones while on VPN, the unit tests passed even when connected through a bunch of EU countries.
You can update your repo and/or use the new 3.0.6 release available from GitHub.
Thank the heavens for NordVPN :-)
3.0.6 works, thanks a lot!
I see you updated a different combination of cookies than the one I tried, glad it works! I was just pissed off that a cURL request from the command line would give me the desired page, so I did not try further and shoved the subprocess call in :-D
FT lookups: Will try them next time when I see problems with Yahoo again!
Problem: Yahoo now displays an EU cookie consent page and the scraping does not work anymore (at least for me in Europe).
The behavior of Yahoo is really strange, because it seems to check the user agent of the request. This works without any cookies:
Then test.html contains the correct price page.
Wget does not work and returns a 404???
But when I fake the user agent to look like cURL, it works:
So it seems like with a user agent that looks like cURL you can bypass the cookie consent wall.
I tried that with the urlopen() call in Python, but setting the cURL user agent did not work for me. As a workaround hack I just called the cURL binary from Python and then the prices work normally again for me.
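The workaround described here could look roughly like the sketch below. It assumes a curl binary is available on the PATH; the function names and the choice of flags are illustrative and not taken from the poster's actual patch.

```python
import subprocess


def build_curl_command(url: str, max_seconds: int = 15) -> list:
    """Assemble the curl command line as a list (no shell involved)."""
    return [
        "curl",
        "--silent",      # no progress meter
        "--location",    # follow redirects
        "--fail",        # non-zero exit status on HTTP errors
        "--max-time", str(max_seconds),
        url,
    ]


def fetch_with_curl(url: str) -> str:
    """Run curl and return the response body as text."""
    result = subprocess.run(
        build_curl_command(url),
        capture_output=True,
        check=True,  # raises CalledProcessError if curl exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Passing the command as a list avoids shell quoting problems with URLs that contain characters such as = or ?. The cost is one extra process per lookup.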
How can you configure the Python HTTP client to behave the same way as cURL here?
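For reference, this is how a custom User-Agent header is normally set for urlopen(). By default urllib identifies itself as Python-urllib/x.y. As noted above, overriding it this way was not enough to get past the consent page in my tests, so treat this only as a sketch of the header mechanics. The curl version string and the function names are assumptions.

```python
import urllib.request

# Assumed value for illustration; a real curl sends "curl/<its version>".
CURL_LIKE_USER_AGENT = "curl/8.0.1"


def build_request_with_user_agent(url: str, user_agent: str = CURL_LIKE_USER_AGENT):
    """Build a request whose User-Agent header replaces urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download the page using the curl-like User-Agent."""
    request = build_request_with_user_agent(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```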