Open Michael-Getzinger opened 7 years ago
I need to crawl just a subdirectory of a website.
In "main.py" I changed "HOMEPAGE" to the following:
HOMEPAGE = 'http://www.example.com/subdirectory/'
After I ran "main.py", I noticed a link in "queue.txt" that points to "www.example.com" (not the subdirectory).
How could I alter the code to only crawl the subdirectory?
Thanks!
The project has been moved to https://github.com/AbdulSheikh/Spider . It is no longer supported in this GitHub account.
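On the original question of restricting the crawl to a subdirectory: one possible approach is to filter every discovered link against the HOMEPAGE path before it is written to "queue.txt". The crawler's own functions are not shown in this thread, so the helper name `is_within_subdirectory` below is hypothetical and where to call it is an assumption. This is a minimal sketch using only the Python standard library, not the project's actual code.

```python
from urllib.parse import urlparse

HOMEPAGE = 'http://www.example.com/subdirectory/'


def is_within_subdirectory(url, homepage=HOMEPAGE):
    """Return True if url is on the same host as homepage and under its path.

    Hypothetical helper: not part of the project discussed in this thread.
    """
    target = urlparse(url)
    base = urlparse(homepage)

    # Links on a different host are never inside the subdirectory.
    if target.netloc.lower() != base.netloc.lower():
        return False

    # Add a trailing slash to both paths so that '/subdirectory'
    # does not accidentally match '/subdirectory-old'.
    base_path = base.path if base.path.endswith('/') else base.path + '/'
    target_path = target.path if target.path.endswith('/') else target.path + '/'
    return target_path.startswith(base_path)


# Example: keep only the links that stay inside the subdirectory.
found_links = [
    'http://www.example.com/',
    'http://www.example.com/subdirectory/page.html',
    'http://www.example.com/subdirectory-old/page.html',
]
allowed_links = [link for link in found_links if is_within_subdirectory(link)]
```

Assuming the crawler currently only compares domain names when deciding whether to queue a link, calling a check like this at that same point would skip links such as "www.example.com" that fall outside the subdirectory.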