MattMoony / d4v1d

Social-Media OSINT tool - gather info on users across multiple platforms; easily extensible by design. 📷
https://m4ttm00ny.xyz/d4v1d
GNU General Public License v3.0
42 stars, 6 forks

circumvent fingerprinting #8

Open MattMoony opened 1 year ago

MattMoony commented 1 year ago

description

Try to keep platforms from rate-limiting bots (especially anonymous ones) by whatever means are available. Switching up HTTP headers on every other request is probably a good start, but more than that will be needed. Client fingerprinting shouldn't be the biggest issue, though, since afaik it mostly relies on JavaScript, which doesn't really apply to how d4v1d bots normally gather data.
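A minimal sketch of the "switch up headers on every other request" idea, using only the standard library. `HeaderRotator`, `HEADER_POOL`, and the header values are hypothetical and only for illustration; none of them come from the d4v1d code base.

```python
import random

# Hypothetical pool of header sets; the values are illustrative only.
HEADER_POOL = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0",
        "Accept-Language": "en-GB,en;q=0.8",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                      "(KHTML, like Gecko) Version/15.5 Safari/605.1.15",
        "Accept-Language": "de-AT,de;q=0.9,en;q=0.7",
    },
]


class HeaderRotator:
    """Hands out a header set and switches to a different one every `every` requests."""

    def __init__(self, pool, every=2, rng=None):
        if len(pool) < 2:
            raise ValueError("need at least two header sets to rotate")
        self.pool = pool
        self.every = every
        self.rng = rng or random.Random()
        self.count = 0
        self.current = self.rng.choice(pool)

    def next_headers(self):
        # After every `every` requests, pick a set that differs from the one in use.
        if self.count and self.count % self.every == 0:
            others = [h for h in self.pool if h is not self.current]
            self.current = self.rng.choice(others)
        self.count += 1
        # Return a copy so callers can't mutate the pool.
        return dict(self.current)
```

Each request would then call `rotator.next_headers()` and pass the result as its headers. As noted above, this alone probably isn't enough, since headers are only one of several signals a platform can look at.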

MattMoony commented 1 year ago

To get better control over lower-level connection parameters (TLS & HTTP/2), it's probably a good idea to take a look at something like PyCurl, especially in combination with curl-impersonate.

8twinni8 commented 1 year ago

Rotating proxy support would also be great.
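A rough sketch of what such a rotation could look like, assuming the HTTP client accepts a requests-style `proxies` mapping. `ProxyRotator` and the proxy URLs in the usage note are hypothetical; nothing like this exists in d4v1d as far as this thread shows.

```python
import itertools


class ProxyRotator:
    """Cycles through a fixed list of proxy URLs in round-robin order."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("need at least one proxy URL")
        self._cycle = itertools.cycle(list(proxy_urls))

    def next_proxies(self):
        # requests-style mapping: use the same proxy for both schemes.
        url = next(self._cycle)
        return {"http": url, "https": url}
```

Usage would be something like `rotator = ProxyRotator(["http://127.0.0.1:8080", "http://127.0.0.1:8081"])` (placeholder addresses) and then passing `rotator.next_proxies()` along with each request.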

MattMoony commented 1 year ago

Found curl_cffi in a discussion about PyCurl integration for curl-impersonate - looks like a rather promising project. Going to try and base a sort of "anonymous session" class upon it.

Edit: Found a blog post (curl_cffi: A Python library that supports natively simulated browser TLS/JA3 fingerprinting) by the author of curl_cffi.
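For context, a small sketch of how curl_cffi's browser impersonation can be used. This is not the actual AnonSession class; `BROWSER_PROFILES`, `pick_profile`, and `fetch` are hypothetical names, and the listed impersonation targets are assumed to be supported by the installed curl_cffi version.

```python
import random

# Impersonation targets; assumed to be available in the installed curl_cffi version.
BROWSER_PROFILES = ["chrome110", "chrome107", "safari15_5"]


def pick_profile(rng=random):
    """Choose one of the browser profiles at random."""
    return rng.choice(BROWSER_PROFILES)


def fetch(url, profile=None, timeout=10):
    """Fetch `url` while mimicking the TLS/HTTP2 fingerprint of a real browser."""
    # Imported lazily so the helpers above work without the third-party package
    # (install with `pip install curl_cffi`).
    from curl_cffi import requests

    return requests.get(url, impersonate=profile or pick_profile(), timeout=timeout)
```

Calling `fetch("https://example.com")` would then send the request with a randomly chosen profile. Picking a fresh profile per request is a design choice of this sketch; keeping one profile per session may look more consistent to the platform.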

MattMoony commented 1 year ago

It's still not enough: I'm still getting rate-limited with the code base at commit ac0303e3e011db1825aad5b0b018bedf1487a652, which uses AnonSession, etc. Need to do more research on how the "anonymous" session can still be identified.

MattMoony commented 1 year ago

Recommendation at the moment: use a virtual machine / enforce IPv4. It could very well be that platforms like Instagram are more likely to block IPv6 addresses, since those are typically assigned to exactly one device. IPv4 addresses, on the other hand, are commonly NATed and might have several clients behind them, so platforms are probably a little more reluctant to block those.
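One way to enforce IPv4 at the socket level, sketched with the standard library only. `ipv4_addresses` and `connect_ipv4` are hypothetical helper names, not part of d4v1d; an HTTP client would need its own equivalent setting.

```python
import socket


def ipv4_addresses(host, port=443):
    """Resolve `host` to IPv4 addresses only by restricting the lookup to AF_INET."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:
        # For AF_INET, sockaddr is an (address, port) tuple.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_ipv4(host, port=443, timeout=10):
    """Open a TCP connection to the first reachable IPv4 address of `host`."""
    last_error = None
    for addr in ipv4_addresses(host, port):
        try:
            return socket.create_connection((addr, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error or OSError(f"no IPv4 address found for {host}")
```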

Edit: Nvm, even after being rate-limited on the host, I can still fetch the site from a virtual machine that uses the exact same IPv6 address as my host machine...