Symfony bundle for Roach PHP.
Roach is a complete web scraping toolkit for PHP. It is heavily inspired by the popular Scrapy package for Python.
The Symfony bundle mostly provides the necessary container bindings for the various services Roach uses, as well as making certain configuration options available via a config file. To learn about how to actually start using Roach itself, check out the rest of the documentation.
Add nelexa/roach-php-bundle to your composer.json file:
composer require nelexa/roach-php-bundle
Bundle version | roach-php/core version | Symfony version | PHP version(s) |
---|---|---|---|
0.3.0 | 0.3.0 | ^5.3 or ^6.0 | >= 8.0 |
1.0.0 | ^1.0.0 | ^6.0 | >= 8.0 |
1.1.0 | 1.1.* | ^6.0 | >= 8.0 |
Register the bundle in config/bundles.php (Symfony Flex does this automatically):
return [
//...
\Nelexa\RoachPhpBundle\RoachPhpBundle::class => ['all' => true],
];
The Symfony bundle of Roach registers a few console commands to make our development experience as pleasant as possible.
php bin/console roach:run
Run without arguments, the command lists all available spiders:
Choose a spider class:
[0] App\Spider\GoogleSpider
[1] App\Spider\FacebookSpider
[2] App\Spider\TwitterSpider
Simply select the desired spider (▼ or ▲) or enter its number and press Enter.
You can pass the spider's class name or one of its aliases as the first argument.
For example, if you have a class App\Spider\GoogleSpider, you can pass any of the following aliases: GoogleSpider, google_spider or google.
php bin/console roach:run google
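The alias forms in the example above can be illustrated with a small sketch. The helper `spiderAliases()` is hypothetical and only reproduces the naming pattern shown for GoogleSpider; it is not the bundle's actual resolution code, and it handles acronyms in class names naively.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: derive candidate aliases from a spider class name.
 * It mirrors the example in the text, not the bundle's real implementation.
 *
 * @return list<string>
 */
function spiderAliases(string $fqcn): array
{
    // Short class name: App\Spider\GoogleSpider -> GoogleSpider
    $pos = strrpos($fqcn, '\\');
    $short = $pos === false ? $fqcn : substr($fqcn, $pos + 1);

    // snake_case form: GoogleSpider -> google_spider
    $snake = strtolower((string) preg_replace('/(?<!^)[A-Z]/', '_$0', $short));

    // Drop a trailing "_spider" suffix: google_spider -> google
    $bare = (string) preg_replace('/_spider$/', '', $snake);

    // Remove duplicates while keeping the order and a 0-based index
    return array_values(array_unique([$short, $snake, $bare]));
}

print_r(spiderAliases('App\\Spider\\GoogleSpider'));
// ['GoogleSpider', 'google_spider', 'google']
```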
Sometimes it is useful to override the number of concurrent requests and the delay between requests. To do this, pass the --concurrency and --delay options.
php bin/console roach:run google --concurrency 8 --delay 2
These options override the $concurrency and $requestDelay public properties of your spider.
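For reference, here is a minimal sketch of a spider that declares those two properties. It assumes the `BasicSpider` base class and `Response` object from roach-php/core; the start URL, the property values, and the extracted field are placeholders chosen for illustration.

```php
<?php

declare(strict_types=1);

namespace App\Spider;

use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

class GoogleSpider extends BasicSpider
{
    /** Placeholder URL, replace with the site you want to scrape. */
    public array $startUrls = ['https://example.com'];

    /** Default number of concurrent requests; --concurrency overrides it. */
    public int $concurrency = 2;

    /** Default delay in seconds between requests; --delay overrides it. */
    public int $requestDelay = 1;

    public function parse(Response $response): \Generator
    {
        // Fall back to an empty string when the page has no <title> element
        $title = $response->filter('title')->text('');

        yield $this->item(['title' => $title]);
    }
}
```

With this class in place, running the command with `--concurrency 8 --delay 2` would use 8 and 2 instead of the values declared on the class.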
Add the --output (-o) option to save the collected data to a JSON file.
php bin/console roach:run google --output 'path/to/data.json'
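Once the file exists, it can be read back with plain PHP. The sketch below assumes the file contains a JSON array with one entry per scraped item; the exact structure is not documented here, so inspect a real output file before relying on specific keys. The function name `loadScrapedItems()` is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Load scraped items from a JSON file written via --output.
 * Assumption: the top level of the file is a JSON array of items.
 *
 * @return array<int|string, mixed>
 */
function loadScrapedItems(string $path): array
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Cannot read {$path}");
    }

    // JSON_THROW_ON_ERROR raises a JsonException on malformed input
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON array or object at the top level');
    }

    return $data;
}

// Usage: only runs when the output file from the command above is present
if (is_file('path/to/data.json')) {
    foreach (loadScrapedItems('path/to/data.json') as $item) {
        var_dump($item);
    }
}
```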
Roach ships with an interactive shell (often called a Read-Evaluate-Print-Loop, or REPL for short) which makes prototyping our spiders a breeze. We can use the provided roach:shell command to launch a new REPL session.
php bin/console roach:shell "https://roach-php.dev/docs/introduction"
First, install Symfony MakerBundle:
composer require --dev symfony/maker-bundle
After that, the following generator commands are available:
php bin/console make:roach:spider
php bin/console make:roach:extension
php bin/console make:roach:item:processor
php bin/console make:roach:middleware:downloader:request
php bin/console make:roach:middleware:downloader:response
php bin/console make:roach:middleware:spider:item
php bin/console make:roach:middleware:spider:request
php bin/console make:roach:middleware:spider:response
Changes are documented on the releases page.
The MIT License (MIT). Please see LICENSE for more information.