LuChang-CS / news-crawler

A news crawler for BBC News, Reuters and New York Times.

Reuters error #25

Closed moonSandra closed 1 year ago

moonSandra commented 1 year ago

Hi LuChang, I tried to run python reuters_crawler.py reuters.cfg but I got this error and I don't know how to fix it.

Traceback (most recent call last):
  File "/home/Downloads/news-crawler-master/nytimes_crawler.py", line 14, in <module>
    nytime_article_fetcher = NytimeArticleFetcher(config)
  File "/home/Downloads/news-crawler-master/article/nytimes_article.py", line 15, in __init__
    super(NytimeArticleFetcher, self).__init__(config)
  File "/home/Downloads/news-crawler-master/article/darticle.py", line 22, in __init__
    self._mkdir(self.path,
  File "/home/Downloads/news-crawler-master/article/darticle.py", line 36, in _mkdir
    os.makedirs(path)
  File "/home/anaconda3/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''
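(Editor's note: the traceback ends in `os.makedirs('')`, which always raises `FileNotFoundError` when the configured save path is empty. A minimal sketch of a defensive wrapper, assuming a hypothetical `safe_mkdir` helper rather than the project's actual `_mkdir` method, could fail with a clearer message:)

```python
import os

def safe_mkdir(path):
    """Create path if it is non-empty; raise a clearer error otherwise.

    os.makedirs('') raises FileNotFoundError, which obscures the real
    problem: the save directory was never set in the config file.
    """
    if not path:
        raise ValueError("save path is empty; check the path option in the .cfg file")
    # exist_ok=True avoids a second error if the directory already exists
    os.makedirs(path, exist_ok=True)
```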
moonSandra commented 1 year ago

Sorry, I just noticed #13. Thanks for your answer there.

jijivski commented 9 months ago

Hi, can you still get content from the three websites? They refuse my requests. Do you have any ideas for solving the 403 and 401 errors? Thanks.

LuChang-CS commented 9 months ago

Hi, these errors are likely caused by anti-crawler measures on these news websites. You can try setting a longer interval between two requests, but there is no guarantee it will work.
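(Editor's note: the suggestion above can be sketched as a small fetch helper. The function name `polite_get`, the delay bounds, and the User-Agent string are illustrative assumptions, not part of this repository; a randomized pause plus a browser-like User-Agent sometimes avoids 403/401 responses, but as noted, nothing is guaranteed against modern anti-crawler systems.)

```python
import random
import time
import urllib.request

# A browser-like User-Agent; some sites reject the default Python one outright.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}

def polite_get(url, min_delay=5.0, max_delay=15.0):
    """Fetch url after a randomized pause; return the response body as bytes.

    Randomizing the interval between requests makes the traffic look less
    like a fixed-rate crawler. Tune the bounds to the target site.
    """
    time.sleep(random.uniform(min_delay, max_delay))
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Also consider honoring the site's robots.txt and backing off further on repeated 403/401 responses.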