JonasCz / How-To-Prevent-Scraping

The ultimate guide on preventing Website Scraping

How to identify the IP addresses from cloud hosting #5

Open cyflhn opened 5 years ago

cyflhn commented 5 years ago

You said in the article that we can block some requests from cloud hosting. But how should we identify the IP addresses which belong to cloud hosting?

JonasCz commented 5 years ago

Some cloud providers publish the IP ranges they use, e.g. AWS: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
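As a rough sketch of how a published list can be used: AWS serves its ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json, with IPv4 entries under `prefixes` (`ip_prefix`) and IPv6 entries under `ipv6_prefixes` (`ipv6_prefix`). The helper names below (`fetch_aws_ranges`, `parse_networks`, `is_in_networks`) are made up for illustration, and the sample data uses reserved documentation ranges rather than real AWS prefixes.

```python
import ipaddress
import json
import urllib.request

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_aws_ranges(url=AWS_RANGES_URL):
    """Download and decode the published AWS range list (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def parse_networks(data):
    """Turn every IPv4 and IPv6 prefix in the document into a network object."""
    nets = [ipaddress.ip_network(p["ip_prefix"]) for p in data.get("prefixes", [])]
    nets += [ipaddress.ip_network(p["ipv6_prefix"]) for p in data.get("ipv6_prefixes", [])]
    return nets


def is_in_networks(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(net.version == addr.version and addr in net for net in networks)


# Offline sample in the same shape as the AWS file.
# These are reserved documentation ranges, NOT real AWS prefixes.
sample = {
    "prefixes": [{"ip_prefix": "192.0.2.0/24", "region": "example", "service": "EC2"}],
    "ipv6_prefixes": [{"ipv6_prefix": "2001:db8::/32", "region": "example", "service": "EC2"}],
}
sample_networks = parse_networks(sample)
```

In real use you would call `parse_networks(fetch_aws_ranges())` once, cache the result, and refresh it periodically, since the ranges change over time. A linear scan is fine for occasional checks; for per-request filtering a prefix tree or your firewall's own set type would probably be a better fit.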

MaxMind also offers a database that tells you whether an IP address is a VPN / proxy / web host / Tor node, etc.: https://dev.maxmind.com/geoip/geoip2/geoip2-anonymous-ip-csv-database/ (paid)

Otherwise, do a whois lookup on the IP address and extract the owner info. This is manual, but it lets you block networks that only do cloud / hosting.
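The whois step can be partly scripted without a paid API, since WHOIS itself is a plain-text protocol on TCP port 43. The sketch below is only an illustration under some assumptions: it queries `whois.arin.net` by default (other regions use other registries, and output formats differ between them), the field names it looks for are the ones commonly seen in registry output, and the keyword check in `looks_like_hosting` is a crude heuristic of my own, not a reliable classifier. All function names are hypothetical.

```python
import socket

# Field names that commonly carry the owner in registry WHOIS output
# (ARIN-style: OrgName / NetName; RIPE-style: netname / descr / org-name).
OWNER_FIELDS = ("orgname", "org-name", "netname", "descr")


def whois_query(ip, server="whois.arin.net", timeout=10):
    """Send a plain WHOIS query over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(ip.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_owner_fields(whois_text):
    """Pick out the first value of each owner-related field, keyed in lowercase."""
    owners = {}
    for line in whois_text.splitlines():
        # Skip registry comments and lines that are not "key: value" pairs.
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        if key in OWNER_FIELDS and value and key not in owners:
            owners[key] = value
    return owners


def looks_like_hosting(owners, keywords=("hosting", "cloud", "datacenter", "server")):
    """Crude heuristic (an assumption): flag owners whose names mention hosting terms."""
    text = " ".join(owners.values()).lower()
    return any(word in text for word in keywords)
```

A call such as `extract_owner_fields(whois_query("192.0.2.1"))` gives you a small dict to review by hand. Registries rate-limit queries, so cache the results and treat the keyword match as a hint for manual review rather than an automatic block.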

cyflhn commented 5 years ago

The whois API is not free. Is there any free API that can help us identify cloud hosting?

lnfel commented 3 years ago

I am kinda late to this, but I found this great web tool for tracing back IP addresses and doing whois lookups: https://mxtoolbox.com/NetworkTools.aspx

You still need to manually pick out the IPs accessing your site. Once you're done with that, use the tool to figure out which is which.