Crawling Domain Subdirectory

buckyroberts / Spider

Python website crawler.

https://thenewboston.com/

969 stars 666 forks

Open Michael-Getzinger opened 7 years ago

Michael-Getzinger commented 7 years ago

I need to crawl just a subdirectory of a website.

In "main.py" I changed "HOMEPAGE" to the following:

 HOMEPAGE = 'http://www.example.com/subdirectory/'

After I ran "main.py" I noticed there is a link in "queue.txt" that goes to "www.example.com" (not the subdirectory).

How could I alter the code to only crawl the subdirectory?

Thanks!

ghost commented 7 years ago

The project has been moved to https://github.com/AbdulSheikh/Spider. It is no longer supported in this GitHub account.
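
On the original question: one possible approach is to filter each discovered link against the full HOMEPAGE prefix (host plus path) before it is written to "queue.txt", rather than against the domain name alone. The thread does not show where the crawler filters links, so the following is only a minimal, standalone sketch. The helper name is_within_subdirectory and the sample links are hypothetical and not part of the Spider project; it would need to be called wherever the crawler decides whether to queue a URL.

```python
from urllib.parse import urljoin, urlparse


def is_within_subdirectory(url, base_url):
    """Hypothetical helper: True if url is on base_url's host and under its path."""
    base = urlparse(base_url)
    # Resolve relative links against the base before comparing.
    target = urlparse(urljoin(base_url, url))
    if target.scheme not in ("http", "https"):
        return False
    if target.netloc.lower() != base.netloc.lower():
        return False
    # Compare paths with a trailing slash so that "/subdirectory/" does not
    # accidentally match a sibling such as "/subdirectory-other/".
    base_path = base.path if base.path.endswith("/") else base.path + "/"
    target_path = target.path if target.path.endswith("/") else target.path + "/"
    return target_path.startswith(base_path)


HOMEPAGE = 'http://www.example.com/subdirectory/'

# Sample links (made up for illustration).
links = [
    'http://www.example.com/',
    'http://www.example.com/subdirectory/page.html',
    '/subdirectory/deeper/index.html',
    'http://www.example.com/subdirectory-other/',
    'http://other.example.org/subdirectory/',
]

# Keep only the links that fall under the HOMEPAGE subdirectory.
allowed = [link for link in links if is_within_subdirectory(link, HOMEPAGE)]
print(allowed)
```

With a check like this in place, a link to "www.example.com" outside the subdirectory would be skipped instead of queued. Matching is done on the URL path only, so query strings and fragments are ignored.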