BelgianNoise / colruyt-products-scraper

An application written in Go that scrapes Colruyt's API to retrieve all product listings.
https://colruyt-prijzen.nasaj.be/

GitHub Actions headless browser - Bot detection #6

Open BelgianNoise opened 5 months ago

BelgianNoise commented 5 months ago

Issue

The headless Chrome browser doesn't seem to work in GitHub Actions. For now I assume it gets served Colruyt's bot-prevention page instead of the real site.

Notes

Is the GitHub Actions IP address range blacklisted?
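One way to narrow this down is to check whether the runner's outbound IP falls inside the ranges GitHub publishes for Actions (the `actions` list returned by https://api.github.com/meta). The sketch below only does the range check. The function name `inAnyRange` and the sample ranges are made up for illustration, and a match would only show that the runner is identifiable as GitHub infrastructure, not that Colruyt actually blocks it.

```go
package main

import (
	"fmt"
	"net/netip"
)

// inAnyRange reports whether ip lies inside at least one of the given CIDR
// ranges. Hypothetical helper: in practice cidrs would be filled from the
// "actions" list of https://api.github.com/meta.
// Entries that do not parse as a CIDR are skipped.
func inAnyRange(ip string, cidrs []string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, fmt.Errorf("parse ip %q: %w", ip, err)
	}
	for _, c := range cidrs {
		prefix, err := netip.ParsePrefix(c)
		if err != nil {
			continue // ignore malformed entries
		}
		if prefix.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Placeholder ranges from the RFC 5737 documentation blocks,
	// NOT real GitHub Actions ranges.
	ranges := []string{"192.0.2.0/24", "198.51.100.0/24"}

	match, err := inAnyRange("192.0.2.17", ranges)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("runner IP in published ranges:", match)
}
```

Running the same scrape from a non-GitHub IP and comparing the results would be the more direct test of the blacklist theory.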

Possible solution:

Configure go-rod to use SSL proxies as well.

BelgianNoise commented 5 months ago

Could not get go-rod to work with rotating proxies. It may need a static proxy (one external IP) for this to work.
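For the static-proxy route, a rough sketch of the configuration step: as far as I know, Chromium's `--proxy-server` flag takes only `scheme://host:port` and not inline credentials, so a proxy URL has to be split into the flag value and a separate username/password. The names `proxyConfig` and `parseProxy` and the proxy URL are hypothetical. How the values are handed to go-rod (presumably the launcher's proxy option plus the browser's auth handling) is an assumption to check against the go-rod docs.

```go
package main

import (
	"fmt"
	"net/url"
)

// proxyConfig holds what a Chromium launch would need for one static proxy.
// Hypothetical type, not part of go-rod.
type proxyConfig struct {
	Server   string // value for --proxy-server, e.g. "http://host:port"
	Username string // empty if the proxy needs no authentication
	Password string
}

// parseProxy splits a proxy URL into the flag value and the credentials.
func parseProxy(raw string) (proxyConfig, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return proxyConfig{}, fmt.Errorf("parse proxy url: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return proxyConfig{}, fmt.Errorf("proxy url %q needs a scheme and a host", raw)
	}
	cfg := proxyConfig{Server: u.Scheme + "://" + u.Host}
	if u.User != nil {
		cfg.Username = u.User.Username()
		cfg.Password, _ = u.User.Password()
	}
	return cfg, nil
}

func main() {
	// Placeholder proxy URL; in a workflow this would come from a secret.
	cfg, err := parseProxy("http://user:secret@proxy.example.com:8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Assumption: cfg.Server goes to the browser launcher as the proxy
	// server, and the credentials are supplied through the browser's
	// proxy-auth handling rather than in the flag itself.
	fmt.Println("proxy server flag value:", cfg.Server)
	fmt.Println("needs auth:", cfg.Username != "")
}
```

Keeping the credentials out of the flag value also keeps them out of the process arguments that show up in CI logs.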

https://github.com/BelgianNoise/colruyt-products-scraper/commit/2dcff9fefea95ffcf81ffe69453a92c12c45f0a1 circumvents this issue for now as a quick fix. I might look into it later if this workaround stops working.