fhamborg / news-please

news-please - an integrated web crawler and information extractor for news that just works
Apache License 2.0

HTTPError: HTTP Version not supported #89

Closed: gursky1 closed this issue 5 years ago

gursky1 commented 5 years ago

I am trying to use the commoncrawl.py script to pull news articles from Common Crawl, but I keep receiving a very persistent HTTPError. I made sure I have aws-cli installed via pip (I am using Conda) and that it is the latest version (1.16, I believe, though I don't remember exactly). I am also using the latest version of news-please available on pip. This seems to have all the hallmarks of an issue with aws-cli.

I am using Windows with Anaconda and Python 3.6

The error log is attached.

Any help would be much appreciated. Errorlog.txt

fhamborg commented 5 years ago

It seems that awk is not correctly installed or set up; cf. this line from your error log:

'awk' is not recognized as an internal or external command
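
A quick way to confirm this before re-running the crawler is to check whether the external commands are visible on PATH. The following is only a minimal sketch, assuming that commoncrawl.py shells out to both `aws` and `awk`; `find_missing_tools` is a hypothetical helper written for this illustration and is not part of news-please.

```python
import shutil


def find_missing_tools(tools=("aws", "awk")):
    """Return the names in `tools` that cannot be found on PATH.

    The default tool list is an assumption about what the crawler needs.
    """
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All checked tools were found on PATH.")
```

If `awk` shows up as missing on Windows, it has to be installed separately and its folder added to PATH, since it does not ship with Windows or with aws-cli.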