Closed by dae 9 months ago
I have done a little digging using the two sample images from the forum thread. They fail for different reasons, neither of which looks like it would be fixed by a different user agent:
https://o.quizlet.com/LbvqR76u3UVrFNFPpF7gqA.jpg
This image downloads fine using curl as long as any user agent is specified. The Anki user agent works fine. I have tried to replicate the failing (403) Anki GET request using requests, but couldn't get it to fail:
# Python 3.9
import requests
s = requests.Session()
url='https://o.quizlet.com/LbvqR76u3UVrFNFPpF7gqA.jpg'
headers = {'User-Agent': 'Anki 23.12.1'}
response = s.get(url, stream=True, headers=headers, timeout=30, verify=True)
print(response.status_code)
print(response.headers['Content-Type'])
# --> 200
# --> image/jpeg
Replicating Anki's exact request headers in curl didn't fail either.
The second image fails for a different reason: before the image URL is fetched, urllib unquotes it, resulting in a different URL that the server doesn't recognize.
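To illustrate why unquoting can change which resource is requested, here is a minimal sketch using only the standard library. The URL is a made-up example (not the one from the forum thread), and the assumption is that the original URL contains a percent-encoded reserved character such as an encoded slash:

```python
from urllib.parse import quote, unquote

# Hypothetical URL: the last path segment contains an encoded slash (%2F).
original = "https://example.com/images/a%2Fb.jpg"

# Unquoting turns %2F into a literal "/", which changes the path structure.
unquoted = unquote(original)

# Re-quoting cannot restore the original, because "/" is a legal path
# character and is left alone.
requoted = quote(unquoted, safe=":/")

print(unquoted)
print(requoted == original)
```

A server that treats the encoded and decoded forms as different paths would not recognize the second request, which matches the kind of failure described here.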
Thanks for digging into these. For image 1, when I test with the debug console:
Maybe they are filtering based on IP, or some other header?
For image 2, hmm, that's troublesome - it would be nice to fix, if we can do so in a way that doesn't break other links.
The exact same sample script I provided above consistently produces different results when executed from the project pyenv, as opposed to a greenfield venv:
$ venv/bin/python image1.py
200
image/jpeg
$ ~/code/anki/out/pyenv/bin/python image1.py
403
text/html; charset=UTF-8
I couldn't figure out why; with requests, the request headers are identical.
I have sketched a solution in #2943.
I tried setting up mitmproxy to compare the traffic of the built-in and external Python versions, but found that when Anki's built-in Python is pointed at mitmproxy, the issue goes away! It seems to be an issue with the Python version Anki is currently using. Updating to the latest Python 3.9.x seems to fix it, so I'll do that when I have a chance.
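When the same script behaves differently under two interpreters, it can help to print a short fingerprint of each environment and diff the results. The sketch below is only one way to do that; the helper name describe_environment and the choice of packages to report are assumptions, and nothing in this thread confirms which of these components was actually responsible:

```python
import ssl
import sys
from importlib import metadata


def describe_environment(packages=("requests", "urllib3", "certifi")):
    """Collect interpreter, TLS library and package versions for comparison."""
    info = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "openssl": ssl.OPENSSL_VERSION,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once with venv/bin/python and once with ~/code/anki/out/pyenv/bin/python, then comparing the two outputs, shows at a glance whether the interpreters or their bundled libraries differ.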
This will increase compatibility when downloading images, e.g. https://forums.ankiweb.net/t/errors-when-pasting-certain-images/38732/3. Currently we fall back to remote links when a download fails, but they're likely to break in the future, and they are slower.
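The fallback behaviour described above can be sketched roughly as follows. This is not Anki's actual code: local_or_remote, the download callable and the generated file name are all hypothetical, and a real implementation would also write the bytes to the media folder:

```python
import hashlib
from typing import Callable, Optional


def local_or_remote(url: str, download: Callable[[str], Optional[bytes]]) -> str:
    """Return a local file name if the download succeeds, else the remote URL."""
    try:
        data = download(url)
    except Exception:
        # Any failure (for example a 403) is treated as "no data".
        data = None
    if not data:
        # Fall back to the remote link, which is slower and may break later.
        return url
    # Hypothetical naming scheme: derive a stable name from the content hash.
    return "paste-" + hashlib.sha1(data).hexdigest()
```

For example, passing a download function that raises on a 403 would return the original URL, while one that returns image bytes would return a name beginning with paste-.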