binux / pyspider

A Powerful Spider(Web Crawler) System in Python.
http://docs.pyspider.org/
Apache License 2.0
16.49k stars 3.69k forks

What is a spider system? #32

Closed remram44 closed 9 years ago

remram44 commented 9 years ago

Since this is the only description from GitHub and the README... What is a spider system? Google doesn't help...

Should be clearer in the README.

binux commented 9 years ago

It may be Chinglish... Maybe you can help me to correct it.

First of all, pyspider is a search engine spider, commonly known as a web crawler: an automated software agent that gathers pages from the World Wide Web (http://en.wikipedia.org/wiki/Web_crawler).

pyspider consists of a group of programs that work together to crawl and process web data; that's what "system" means (maybe I misused the word).
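
To make the "group of programs" idea concrete, here is a toy, single-process sketch. The role names (scheduler, fetcher, processor) follow the architecture described in the pyspider docs, but the code is not pyspider's implementation: `FAKE_WEB`, `fetch`, `process`, and `run` are hypothetical names invented for illustration, and in the real system these roles run as separate components connected by message queues.

```python
from collections import deque

# Stand-in for the network: URL -> (page text, outgoing links). Invented data.
FAKE_WEB = {
    "http://example.com/": ("index", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/"]),
    "http://example.com/b": ("page b", []),
}


def fetch(url):
    """Fetcher role: download a page (here, look it up in FAKE_WEB)."""
    return FAKE_WEB[url]


def process(url, page):
    """Processor role: extract a result and the links to follow next."""
    text, links = page
    return {"url": url, "text": text}, links


def run(start_url):
    """Scheduler role: decide what to crawl next and skip URLs already seen."""
    pending = deque([start_url])
    seen = {start_url}
    results = []
    while pending:
        url = pending.popleft()
        result, links = process(url, fetch(url))
        results.append(result)
        for link in links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return results


results = run("http://example.com/")
```

Each function could be replaced by a separate program as long as the hand-off between them (URLs in, pages and results out) stays the same, which is roughly what "system" is getting at.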

fagnercarvalho commented 9 years ago

This means that with this I can create my own web crawler?

binux commented 9 years ago

Yes. But pyspider is mainly designed for vertical search engines, which differs from a general web crawler that follows every link. That means you need to write a Python script that tells pyspider what to crawl and what to extract.
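
As a rough illustration of "what to crawl, what to extract", here is a minimal sketch using only the Python standard library. It does not use pyspider's API; `PageParser`, `should_follow`, `process_page`, the `example.com` URLs, and the sample HTML are all made up for the example. The point is only that the script, not the crawler, decides which links are worth following and which fields to keep.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects the <title> text and every href found on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def should_follow(url):
    # Hypothetical rule: stay on one site and only visit article pages.
    parts = urlparse(url)
    return parts.netloc == "example.com" and parts.path.startswith("/articles/")


def process_page(base_url, html):
    """Return (extracted record, links worth crawling next)."""
    parser = PageParser()
    parser.feed(html)
    absolute = [urljoin(base_url, href) for href in parser.links]
    to_crawl = [url for url in absolute if should_follow(url)]
    record = {"url": base_url, "title": parser.title.strip()}
    return record, to_crawl


SAMPLE_HTML = """
<html><head><title> Example index </title></head>
<body>
  <a href="/articles/1">First article</a>
  <a href="/about">About</a>
  <a href="http://other.example.org/articles/2">Elsewhere</a>
</body></html>
"""

record, to_crawl = process_page("http://example.com/", SAMPLE_HTML)
```

A general crawler would queue all three links; here only the on-site article link passes `should_follow`, and only the page title is kept.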

fagnercarvalho commented 9 years ago

Oh, I see. Very nice! I will try it on a personal project where, until now, I didn't quite know what to use to solve my problem.

Thanks @binux!