A news crawler for BBC News, Reuters and New York Times.
pip install -r requirements.txt
python bbc_crawler.py settings/bbc.cfg
python reuters_crawler.py settings/reuters.cfg
python nytimes_crawler.py settings/nytimes.cfg
Modify reuters.cfg, nytimes.cfg and bbc.cfg in the settings folder. The main configuration items are start_date, end_date and path.
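The exact layout of the .cfg files is not documented here. As a minimal sketch, assuming they are INI-style files readable by Python's configparser, with a hypothetical section named "crawler" and dates written as YYYY-MM-DD, the three main settings could be read and checked like this:

```python
import configparser
from datetime import date

# Hypothetical contents of a file such as settings/bbc.cfg.
# ASSUMPTIONS: the section name "crawler" and the YYYY-MM-DD date format
# are illustrative only; check the real files in the settings folder.
SAMPLE_CFG = """
[crawler]
start_date = 2017-01-01
end_date = 2017-01-31
path = data/bbc
"""


def load_settings(text):
    """Parse start_date, end_date and path from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["crawler"]  # hypothetical section name
    start = date.fromisoformat(section["start_date"])
    end = date.fromisoformat(section["end_date"])
    # A crawl range that ends before it starts is almost certainly a typo.
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end, section["path"]


if __name__ == "__main__":
    start, end, path = load_settings(SAMPLE_CFG)
    print(start, end, path)
```

Here start_date and end_date bound the range of publication dates to crawl, and path is taken to be the output location; adjust the parsing if the real files use a different section name or date format.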
To add another news source, add files that follow the existing architecture and extend the base class in each folder. Some methods may need to be overridden.
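The base classes in this repository are not reproduced here, so the following is only a sketch of the extension pattern, with hypothetical names throughout (BaseCrawler, article_urls, parse_article, ExampleCrawler). The base class owns the shared loop over dates, a new source must implement the abstract method, and it may override default hooks where its pages differ:

```python
from abc import ABC, abstractmethod
from datetime import date, timedelta


class BaseCrawler(ABC):
    """Hypothetical base class: walks the date range day by day and
    delegates source-specific work to subclasses."""

    def __init__(self, start_date, end_date, path):
        self.start_date = start_date
        self.end_date = end_date
        self.path = path  # output location; unused in this sketch

    def dates(self):
        """Yield every day from start_date to end_date, inclusive."""
        day = self.start_date
        while day <= self.end_date:
            yield day
            day += timedelta(days=1)

    @abstractmethod
    def article_urls(self, day):
        """Return the article URLs published on the given day."""

    def parse_article(self, url):
        """Default parsing hook; override it when a source's pages differ."""
        return {"url": url}

    def crawl(self):
        """Shared driver: collect parsed articles for every day in range."""
        results = []
        for day in self.dates():
            for url in self.article_urls(day):
                results.append(self.parse_article(url))
        return results


class ExampleCrawler(BaseCrawler):
    """A new source implements the abstract method and overrides hooks."""

    def article_urls(self, day):
        # Placeholder URL scheme; a real crawler would fetch an archive page.
        return [f"https://example.com/{day.isoformat()}/story"]

    def parse_article(self, url):
        article = super().parse_article(url)
        article["source"] = "example"
        return article
```

Marking article_urls as abstract makes Python refuse to instantiate a subclass that forgets it, while parse_article keeps a usable default, which mirrors the idea that only some methods need to be rewritten for each new source.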