Open MattMoony opened 1 year ago
To get better control of lower-level connection parameters (TLS & HTTP/2), it might be a good idea to look at something like PyCurl, especially in combination with curl-impersonate.
Rotating-proxy functionality would also be great.
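A rough idea of what the rotating-proxy part could look like, as a minimal sketch only: `ProxyRotator` and `next_proxies` are hypothetical names (not part of the code base), and the addresses are placeholders from a documentation range. It simply cycles through a fixed pool and hands out a requests-style `proxies` mapping for each request.

```python
import itertools
from typing import Dict, Iterable, Iterator


class ProxyRotator:
    """Cycle through a fixed pool of proxy URLs, one per request (hypothetical helper)."""

    def __init__(self, proxy_urls: Iterable[str]) -> None:
        pool = list(proxy_urls)
        if not pool:
            raise ValueError("at least one proxy URL is required")
        # itertools.cycle repeats the pool endlessly in its original order.
        self._cycle: Iterator[str] = itertools.cycle(pool)

    def next_proxies(self) -> Dict[str, str]:
        """Return a requests-style proxies mapping for the next proxy in the pool."""
        url = next(self._cycle)
        return {"http": url, "https": url}


# Placeholder addresses (TEST-NET range), not real proxies.
rotator = ProxyRotator([
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
])
first = rotator.next_proxies()
second = rotator.next_proxies()
third = rotator.next_proxies()  # wraps around to the first proxy again
```

Plain round-robin is the simplest choice here; picking proxies at random or dropping ones that got rate-limited would be an obvious next step.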
Found curl_cffi in a discussion about PyCurl integration for curl-impersonate; it looks like a rather promising project. Going to try and base a sort of "anonymous session" class upon it.
Edit: Found a blog post (curl_cffi: A Python library that supports natively simulated browser TLS/JA3 fingerprinting) by the author of curl_cffi.
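For illustration, a request with a browser-like TLS/HTTP/2 fingerprint via curl_cffi might look roughly like the sketch below. This is not the AnonSession class from the code base; `build_request_kwargs` and `fetch` are hypothetical helpers, and the `"chrome110"` target and the timeout are assumed defaults. It relies on curl_cffi's requests-like API and its `impersonate` parameter.

```python
from typing import Any, Dict, Optional


def build_request_kwargs(
    impersonate: str = "chrome110",
    proxy_url: Optional[str] = None,
    timeout: float = 15.0,
) -> Dict[str, Any]:
    """Collect keyword arguments for a curl_cffi request (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"impersonate": impersonate, "timeout": timeout}
    if proxy_url is not None:
        # Route both plain and TLS traffic through the same proxy.
        kwargs["proxies"] = {"http": proxy_url, "https": proxy_url}
    return kwargs


def fetch(url: str, **options: Any) -> str:
    """Fetch `url` with an impersonated browser fingerprint and return the body text."""
    # Imported lazily so the helper above is usable without curl_cffi installed.
    from curl_cffi import requests

    response = requests.get(url, **build_request_kwargs(**options))
    response.raise_for_status()
    return response.text
```

Something like `fetch("https://example.com", proxy_url="http://192.0.2.10:8080")` would then send the request through the given (placeholder) proxy while presenting the impersonated fingerprint.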
It's still not enough. More research is needed on how the "anonymous" session can still be identified, as I'm still getting rate-limited using the code base at commit ac0303e3e011db1825aad5b0b018bedf1487a652 with AnonSession, etc.
Recommendation at the moment: use a virtual machine or enforce IPv4. It could very well be that platforms like Instagram are more likely to block IPv6 addresses, since each of those should be assigned to exactly one device, whereas IPv4 addresses are commonly NATed and may therefore have several clients behind them, so platforms are probably a little more reluctant to block those.
Edit: Never mind. Even after being rate-limited on the host, I can still fetch the site from a virtual machine using the exact same IPv6 address as the host machine, so the rate limit apparently isn't tied to the IP address alone...
description
Try to prevent platforms from rate-limiting bots (especially anonymous ones) by all available means. It's probably a good idea to switch up HTTP headers on every other request, but also to do more than that. Client fingerprinting shouldn't be the biggest issue, however, since it basically relies on JavaScript, afaik, and that's not really applicable to how d4v1d bots should normally gather data.
references