Fetch Googlebot IP ranges from their published JSON resource

alaz / legitbot

🤔 Is this Web request from a real search engine🕷 or from an impersonating agent 🕵️‍♀️?

Google publishes the current IP ranges for Googlebot: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot#automatic
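As a rough sketch of what consuming that JSON could look like, the Ruby below parses the document into IPAddr networks and checks an address against them. It assumes the file lives at https://developers.google.com/static/search/apis/ipranges/googlebot.json and has a "prefixes" array whose entries carry either an "ipv4Prefix" or an "ipv6Prefix" key in CIDR notation. The method names are hypothetical and are not part of Legitbot, and the sample document is made up for illustration.

```ruby
require 'json'
require 'ipaddr'
require 'net/http'
require 'uri'

# Assumed location of Google's published Googlebot ranges (hypothetical constant).
GOOGLEBOT_RANGES_URL = 'https://developers.google.com/static/search/apis/ipranges/googlebot.json'

# Hypothetical helper: download the raw JSON document (no redirect handling).
def fetch_googlebot_json(url = GOOGLEBOT_RANGES_URL)
  Net::HTTP.get(URI(url))
end

# Hypothetical helper: turn the JSON text into IPAddr network objects,
# assuming each "prefixes" entry has an "ipv4Prefix" or "ipv6Prefix" key.
def parse_googlebot_ranges(json_text)
  JSON.parse(json_text).fetch('prefixes').map do |entry|
    IPAddr.new(entry['ipv4Prefix'] || entry['ipv6Prefix'])
  end
end

# True if the address falls inside any of the ranges (families must match).
def googlebot_ip?(ranges, ip)
  addr = IPAddr.new(ip)
  ranges.any? { |range| range.family == addr.family && range.include?(addr) }
end

# Made-up sample shaped like the assumed published format.
sample = <<~SAMPLE
  {"creationTime": "2024-01-01T00:00:00.000000",
   "prefixes": [{"ipv4Prefix": "66.249.64.0/27"},
                {"ipv6Prefix": "2001:4860:4801:10::/64"}]}
SAMPLE

ranges = parse_googlebot_ranges(sample)
googlebot_ip?(ranges, '66.249.64.5')          # => true
googlebot_ip?(ranges, '66.249.65.1')          # => false
googlebot_ip?(ranges, '2001:4860:4801:10::1') # => true
```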

Legitbot could, of course, fetch them with fetch:url, as it already does for Ahrefs:

https://github.com/alaz/legitbot/blob/e5c8923cc9c00459b426a1c4a1f89da87875b5d3/lib/legitbot/ahrefs.rb#L6-L7

However, we don't know how often this list changes, and fetch:url works by updating the Legitbot sources themselves. So even with automatic change detection in place, an updated list would only reach users with the next release.

To fetch the Googlebot IP ranges from the published JSON at runtime instead, an ip_ranges block can be used, as is already done for Facebook:

https://github.com/alaz/legitbot/blob/e5c8923cc9c00459b426a1c4a1f89da87875b5d3/lib/legitbot/facebook.rb#L10-L19
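The runtime approach can be sketched independently of Legitbot's own ip_ranges DSL, whose exact semantics are not shown here. The hypothetical RefreshingRanges class below holds the CIDR list returned by a loader block and reloads it once the cached copy is older than a given TTL, so a change to Google's list is picked up without waiting for a gem release. The class name, the ttl: and clock: parameters, and the loader contract (a block returning CIDR strings) are all assumptions for illustration.

```ruby
require 'ipaddr'

# Hypothetical helper (not part of Legitbot): caches IP ranges produced by
# a loader block and reloads them once they are older than `ttl` seconds.
class RefreshingRanges
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &loader)
    @ttl = ttl
    @clock = clock   # injectable so tests can control time
    @loader = loader # must return an array of CIDR strings
    @ranges = nil
    @loaded_at = nil
    @mutex = Mutex.new
  end

  # Returns the cached ranges, calling the loader on first use or after expiry.
  def ranges
    @mutex.synchronize do
      now = @clock.call
      if @ranges.nil? || now - @loaded_at >= @ttl
        @ranges = @loader.call.map { |cidr| IPAddr.new(cidr) }
        @loaded_at = now
      end
      @ranges
    end
  end

  # True if the address falls inside any cached range of the same family.
  def include?(ip)
    addr = IPAddr.new(ip)
    ranges.any? { |range| range.family == addr.family && range.include?(addr) }
  end
end

# Example: refresh hourly; the loader here returns a fixed illustrative list,
# where a real one would download and parse the published JSON.
googlebot = RefreshingRanges.new(ttl: 3600) { ['66.249.64.0/27', '2001:4860:4801:10::/64'] }
googlebot.include?('66.249.64.5') # => true
```

One open design question with this shape is what to do when the loader fails (for example, a network error): keeping the previously loaded list is usually safer than failing every verification.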

We would probably need to factor the fetch:url logic out of the RuboCop cop sources, though, so that it is easily accessible from library code.

Fetch Googlebot IP ranges from their published JSON resource #142