crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development
https://www.crwlr.software/packages/crawler
MIT License
325 stars 11 forks

Limit Pagination Crawler #108

Closed flanderboy closed 1 year ago

flanderboy commented 1 year ago

Hello, and thanks for the great library.

My question is: is it possible to limit the number of pages fetched with the pagination method?

For example, if an article listing has 1000 pages, I only need the first 100 pages.

Do you have any example code?

otsch commented 1 year ago

Hi @flanderboy and thank you! 😊 Yes, I guess I could have documented it a little better. I'll probably try to improve the docs later. Here it says:

As the second argument, the paginate() method takes the maximum number of pages it will load. The default value, if you don't provide one yourself, is 1000.

So, you can customize the default limit of 1000 pages (the default exists just to be sure you don't run into an infinite loop, but I guess it's good to know) like this:

$crawler
    ->input('https://www.example.com/some/listing')
    ->addStep(
        // Follow 'a.next-page-link' links, loading at most 100 pages.
        Http::get()->paginate('a.next-page-link', 100)
    );
flanderboy commented 1 year ago

Thanks :-D