dodiku / news_scraper

A web scraper that gets news articles about a list of companies from a list of websites

Can't run app #1

Open divingdog opened 4 years ago

divingdog commented 4 years ago

Dear Dror Ayalon!

First of all, thank you for the app!

I am trying to make it work, but without much success so far. After downloading the original files and following your usage instructions, I get this error:

  File "scraper.py", line 89
SyntaxError: Non-ASCII character '\xf0' in file scraper.py on line 89, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
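This SyntaxError is what Python 2 raises when a source file contains non-ASCII bytes and has no PEP 263 encoding declaration in its first two lines; the byte '\xf0' is typically the start of a 4-byte UTF-8 character such as an emoji. Adding `# -*- coding: utf-8 -*-` as the first line of scraper.py, or running the script with Python 3 (which reads source as UTF-8 by default), should avoid it. The sketch below shows one way to check a file for this condition; the function name `needs_coding_declaration` is hypothetical and not part of the project.

```python
# -*- coding: utf-8 -*-
import re

# Regular expression given in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def needs_coding_declaration(source_bytes):
    """Return True if the source contains non-ASCII bytes but has no
    PEP 263 declaration in its first two lines (hypothetical helper)."""
    first_two = source_bytes.splitlines()[:2]
    declared = any(CODING_RE.match(line) for line in first_two)
    has_non_ascii = any(b > 127 for b in bytearray(source_bytes))
    return has_non_ascii and not declared
```

To check the real file, it could be called as `needs_coding_declaration(open("scraper.py", "rb").read())`.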

With python3 it sometimes works, but after I add new domains to the list it crashes on some of them:

/Downloads/news_scraper-master$ python3 scraper.py
Traceback (most recent call last):
  File "scraper.py", line 42, in <module>
    web_content = requests.get(website)
  File "/home/trinidad/.local/lib/python3.6/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/home/trinidad/.local/lib/python3.6/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/trinidad/.local/lib/python3.6/site-packages/requests/sessions.py", line 474, in request
    prep = self.prepare_request(req)
  File "/home/trinidad/.local/lib/python3.6/site-packages/requests/sessions.py", line 407, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/trinidad/.local/lib/python3.6/site-packages/requests/models.py", line 302, in prepare
    self.prepare_url(url, params)
  File "/home/trinidad/.local/lib/python3.6/site-packages/requests/models.py", line 382, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?

If I don't specify python3:

python scraper.py
Traceback (most recent call last):
  File "scraper.py", line 45, in <module>
    web_content = requests.get(website)
  File "/home/trinidad/Downloads/news_scraper-master/env/lib/python2.7/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/home/trinidad/Downloads/news_scraper-master/env/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/trinidad/Downloads/news_scraper-master/env/lib/python2.7/site-packages/requests/sessions.py", line 474, in request
    prep = self.prepare_request(req)
  File "/home/trinidad/Downloads/news_scraper-master/env/lib/python2.7/site-packages/requests/sessions.py", line 407, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/trinidad/Downloads/news_scraper-master/env/lib/python2.7/site-packages/requests/models.py", line 302, in prepare
    self.prepare_url(url, params)
  File "/home/trinidad/Downloads/news_scraper-master/env/lib/python2.7/site-packages/requests/models.py", line 382, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
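In both tracebacks the URL in the message is the empty string (`Invalid URL ''`), which suggests that the website list contains a blank entry, for example an empty line or a trailing newline. requests also raises MissingSchema for entries such as `example.com` that lack `http://` or `https://`. A minimal sketch of cleaning the list before it reaches `requests.get` follows; how scraper.py actually loads its websites is not shown in this thread, so the function `normalize_urls` and its input format are assumptions.

```python
def normalize_urls(raw_entries, default_scheme="http://"):
    """Drop blank entries and prepend a scheme where one is missing.

    Hypothetical helper: raw_entries is assumed to be an iterable of
    strings, e.g. lines read from a file of websites.
    """
    cleaned = []
    for entry in raw_entries:
        url = entry.strip()
        if not url:
            # A blank entry is what produces "Invalid URL ''".
            continue
        if not url.lower().startswith(("http://", "https://")):
            url = default_scheme + url
        cleaned.append(url)
    return cleaned
```

Each URL returned by this helper could then be passed to `requests.get(website)` as in the traceback above.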

Another error message with python3:

Traceback (most recent call last):
  File "scraper.py", line 57, in <module>
    'url': link.get('href').split('?')[0],
AttributeError: 'NoneType' object has no attribute 'split'
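This AttributeError means `link.get('href')` returned None, which happens when a link element on the scraped page has no `href` attribute. One possible guard is to skip such links before splitting off the query string. The sketch below uses plain dicts as stand-ins for the parsed link objects, since the parsing code is not shown in this thread; `strip_query` is a hypothetical helper.

```python
def strip_query(href):
    """Return href without its query string, or None when href is missing."""
    if not href:
        return None
    return href.split('?')[0]


# Stand-ins for parsed <a> elements: .get('href') may return None.
links = [
    {'href': 'https://example.com/a?utm=1'},
    {},  # an anchor without an href attribute
    {'href': 'https://example.com/b'},
]

# Keep only the links that actually carry a URL.
urls = [u for u in (strip_query(link.get('href')) for link in links) if u is not None]
```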

Could you please let me know if I'm missing something?

Thank you! Greg

dodiku commented 4 years ago

This sounds like a Windows compatibility issue. This project was tested only on a Mac.