crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development
https://www.crwlr.software/packages/crawler
MIT License

Enable the use of Proxies #120

Closed otsch closed 10 months ago

otsch commented 10 months ago

Add new methods HttpLoader::useProxy() and HttpLoader::useRotatingProxies([...]) to define the proxies that the loader should use. They work both with a Guzzle HTTP client instance (the default) and when the loader uses the headless Chrome browser. Calling them when the loader was given some other PSR-18 implementation throws an exception. (see https://github.com/crwlrsoft/crawler/issues/99)
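To illustrate what rotating proxies means in practice, here is a minimal, dependency-free sketch. It is not the library's implementation: the class RotatingProxySketch, its next() method, and the proxy URLs are hypothetical, and round-robin order is an assumption, since this description does not say how the loader picks the next proxy.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, NOT part of crwlr/crawler. It hands out proxy URLs
// in round-robin order (an assumed strategy), one per request.
final class RotatingProxySketch
{
    /** @var string[] */
    private array $proxies;

    private int $next = 0;

    /** @param string[] $proxies */
    public function __construct(array $proxies)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('At least one proxy URL is required.');
        }

        // Re-index so the list is addressable by position 0..n-1.
        $this->proxies = array_values($proxies);
    }

    // Returns the proxy for the next request and advances the pointer,
    // wrapping around at the end of the list.
    public function next(): string
    {
        $proxy = $this->proxies[$this->next];
        $this->next = ($this->next + 1) % count($this->proxies);

        return $proxy;
    }
}

// Placeholder proxy URLs, for illustration only.
$rotation = new RotatingProxySketch([
    'http://proxy-a.example:8080',
    'http://proxy-b.example:8080',
]);

foreach (['/one', '/two', '/three'] as $path) {
    echo $path . ' via ' . $rotation->next() . PHP_EOL;
}
```

With the library itself, none of this is needed: the list of proxies is passed to HttpLoader::useRotatingProxies([...]) and the loader handles the rotation.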

Also, fix the HttpLoader::load() implementation so that it no longer throws any exception, because a single failed load shouldn't kill a crawler run. When you want any loading error to end the whole crawler execution, use HttpLoader::loadOrFail() instead. The phpdoc in the LoaderInterface is adapted accordingly.
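The difference between the two methods can be sketched with a small stand-in class. This is not the library's HttpLoader: LoaderSketch, its injected fetch callable, and the errors list are hypothetical, and returning null from load() on failure is an assumption made for this example. The sketch assumes PHP 8.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a loader, NOT the library's HttpLoader.
final class LoaderSketch
{
    /** @var callable(string): string */
    private $fetch;

    /** @var string[] */
    public array $errors = [];

    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
    }

    // Never throws: a failure is recorded and null is returned (assumed
    // return value), so the surrounding crawler run can continue.
    public function load(string $url): ?string
    {
        try {
            return $this->loadOrFail($url);
        } catch (Throwable $e) {
            $this->errors[] = $url . ': ' . $e->getMessage();

            return null;
        }
    }

    // Lets any loading error propagate, which ends the run unless caught.
    public function loadOrFail(string $url): string
    {
        return ($this->fetch)($url);
    }
}

// Fake fetcher: fails for any URL containing "broken".
$loader = new LoaderSketch(function (string $url): string {
    if (str_contains($url, 'broken')) {
        throw new RuntimeException('request failed');
    }

    return 'body of ' . $url;
});

foreach (['https://example.com/ok', 'https://example.com/broken'] as $url) {
    $body = $loader->load($url);
    echo $url . ' => ' . ($body ?? 'skipped') . PHP_EOL;
}
```

Calling $loader->loadOrFail() with the second URL would instead throw the RuntimeException and stop the loop.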