crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development
https://www.crwlr.software/packages/crawler
MIT License

Query params paginator and paginator improvements #122

Closed otsch closed 9 months ago

otsch commented 9 months ago

New QueryParamsPaginator to paginate by increasing and/or decreasing one or more query params, either in the URL or in the body of requests.
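For illustration, a usage sketch of the new paginator. The exact fluent helper names used here (`Paginator::queryParams()`, `inUrl()`, `increase()`) are assumptions based on the description above, not confirmed API:

```php
<?php

use Crwlr\Crawler\Steps\Loading\Http;
use Crwlr\Crawler\Steps\Loading\Http\Paginator;

// Paginate a listing endpoint like /items?page=1&offset=0 by bumping
// query params in the request URL. Method names are assumptions.
$step = Http::get()
    ->paginate(
        Paginator::queryParams()
            ->inUrl()                 // params live in the URL (vs. inBody())
            ->increase('page')        // page=1 → page=2 → …
            ->increase('offset', 25)  // offset grows by 25 per request
    );
```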

New method stopWhen in the new Crwlr\Crawler\Steps\Loading\Http\AbstractPaginator class. You can pass implementations of the new StopRule interface, or custom closures, to that method. Then, every time the paginator receives a loaded response to process, those stop rules are called with the response, and if any stop rule's condition is met, the paginator stops paginating. This release also adds a few stop rules to use with the new method: IsEmptyInHtml, IsEmptyInJson, IsEmptyInXml and IsEmptyResponse, also available via static methods: PaginatorStopRules::isEmptyInHtml(), PaginatorStopRules::isEmptyInJson(), PaginatorStopRules::isEmptyInXml() and PaginatorStopRules::isEmptyResponse().
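A sketch of how stopWhen could be combined with the new paginator. The `Paginator::queryParams()` fluent chain and the selector argument to `isEmptyInJson()` are assumptions for illustration; the closure form simply receives the loaded response, as described above:

```php
<?php

use Crwlr\Crawler\Steps\Loading\Http;
use Crwlr\Crawler\Steps\Loading\Http\Paginator;
use Crwlr\Crawler\Steps\Loading\Http\Paginators\StopRules\PaginatorStopRules;

$step = Http::get()
    ->paginate(
        Paginator::queryParams()
            ->inUrl()
            ->increase('page')
            // Stop when the 'data' key in the JSON response is empty.
            // (The key argument is an assumption for this sketch.)
            ->stopWhen(PaginatorStopRules::isEmptyInJson('data'))
            // Or a custom closure: return true to stop paginating.
            ->stopWhen(function ($response) {
                return str_contains(
                    (string) $response->getBody(),
                    'No more results',
                );
            })
    );
```

Passing multiple stop rules means pagination halts as soon as any one of them matches.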

Deprecate the Crwlr\Crawler\Steps\Loading\Http\PaginatorInterface and the Crwlr\Crawler\Steps\Loading\Http\Paginators\AbstractPaginator. Instead, added a new version of the abstract class, Crwlr\Crawler\Steps\Loading\Http\AbstractPaginator, that can be used. The reason: we're adding multiple new class properties and methods that could collide with properties and methods in user implementations. So, to avoid a breaking change, we moved to a new abstract class, and dropped the interface, because adding methods to an interface always breaks backwards compatibility for implementors.
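For custom paginators, the migration is to extend the new abstract class instead of the deprecated one. A rough sketch; the overridable method shown (`getNextRequest()`) is an assumption about the new class's API for illustration only:

```php
<?php

use Crwlr\Crawler\Steps\Loading\Http\AbstractPaginator;
use Psr\Http\Message\RequestInterface;

// Before: extended Crwlr\Crawler\Steps\Loading\Http\Paginators\AbstractPaginator.
// After: extend the new abstract class, so future properties/methods added
// to the base won't constitute a breaking interface change.
class MyCursorPaginator extends AbstractPaginator
{
    // Override only what you need; the base class can provide defaults
    // for everything else. (Method name/signature is a hypothetical.)
    public function getNextRequest(): ?RequestInterface
    {
        // Build the next request from the latest response, or return
        // null when there are no more pages.
        return null;
    }
}
```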