dgnsrekt / nitter_scraper

Scrape Twitter API without authentication using Nitter.
https://nitter-scraper.readthedocs.io/
MIT License

Docker installed, but "no module named docker" #13

Open ewebgh33 opened 9 months ago

ewebgh33 commented 9 months ago

Hi. I tried to set this up today on Win11. So far everything seems to have installed. I already have Docker Desktop.

However, I'm hitting a wall trying to run a simple scrape (the example from the main page on GitHub). Running the .py, I get:

Traceback (most recent call last):
  File "c:\Scraping\nitter_scraper\test.py", line 3, in <module>
    from nitter_scraper import NitterScraper
  File "c:\Scraping\nitter_scraper\nitter_scraper\__init__.py", line 1, in <module>
    from nitter_scraper.nitter import NitterScraper
  File "c:\Scraping\nitter_scraper\nitter_scraper\nitter.py", line 8, in <module>
    import docker
ModuleNotFoundError: No module named 'docker'

Which is weird, because I've literally just run pip install docker, and I already have Docker Desktop.

Any tips here appreciated. Version mismatch? I ran into an earlier issue where I couldn't pip install nitter_scraper because it didn't like python=3.11. Am now running 3.8. But perhaps it also wants an older docker?
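A common cause of this symptom, though not confirmed in this thread, is that `pip` and the `python` that runs the script belong to different environments. The following is a minimal sketch for checking that, using only the standard library. The helper name `describe_module` is made up for illustration and is not part of nitter_scraper.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter_path, module_location) for a top-level module name.

    module_location is None when the running interpreter cannot find the module.
    """
    spec = importlib.util.find_spec(name)
    location = None
    if spec is not None:
        # Namespace packages have no origin, so fall back to their search paths.
        location = spec.origin or str(list(spec.submodule_search_locations or []))
    return sys.executable, location


if __name__ == "__main__":
    interpreter, location = describe_module("docker")
    print("interpreter:", interpreter)
    print("docker found at:", location)
```

Running this with the same command used for test.py, and comparing the interpreter path with the location reported by `pip --version`, shows whether the two point at the same environment.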

pip list shows:

appdirs             1.4.4
beautifulsoup4      4.12.3
bs4                 0.0.2
certifi             2023.11.17
charset-normalizer  3.3.2
colorama            0.4.6
cssselect           1.2.0
docker              4.4.4
fake-useragent      1.4.0
idna                3.6
importlib-metadata  7.0.1
importlib-resources 6.1.1
Jinja2              2.11.3
loguru              0.5.3
lxml                5.1.0
MarkupSafe          2.1.3
nitter-scraper      0.5.0
parse               1.20.0
pendulum            2.1.2
pip                 23.3.1
pydantic            1.10.13
pyee                8.2.2
pyppeteer           1.0.2
pyquery             2.0.0
python-dateutil     2.8.2
pytzdata            2020.1
pywin32             227
requests            2.31.0
requests-html       0.10.0
setuptools          68.2.2
six                 1.16.0
soupsieve           2.5
tqdm                4.66.1
typing_extensions   4.9.0
urllib3             1.26.18
w3lib               2.1.2
websocket-client    1.7.0
websockets          10.4
wheel               0.41.2
win32-setctime      1.1.0
zipp                3.17.0

ewebgh33 commented 9 months ago

OK, I can see now that it's a different version from the one listed in pyproject.toml.

Then again, the instructions on the main page of this repo didn't say anything about needing a poetry install. So:

pip install poetry
poetry install

A couple of errors here:

Cannot install pytest-sugar.
  • Installing pytest-watch (4.2.0): Failed
  FileNotFoundError
  [Errno 2] No such file or directory: 'C:\\Users\\HESPEROS\\miniconda3\\envs\\nitterscraper\\lib\\site-packages\\virtualenv\\activation\\nushell\\activate.nu'
  at <frozen importlib._bootstrap_external>:1048 in open_resource
Cannot install pytest-watch.

Still can't find docker. So maybe I need to start over and set up the env with 3.7, not 3.8? And then pip install docker, pip install poetry, poetry install...?

I don't know what I am missing here.

ewebgh33 commented 9 months ago

Progress. I deleted the env and re-created it with python=3.7. I had to manually run pip install markupsafe==2.0.1, because that was the next error I came up against.

Now a new error. It looked like it started to spin up Docker:
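The post does not say which error prompted the MarkupSafe pin. A likely candidate is the well-known `cannot import name 'soft_unicode' from 'markupsafe'` failure: Jinja2 releases before 3.0 import `soft_unicode`, which MarkupSafe removed in 2.1.0. The sketch below encodes that rule as a small check. Both function names are hypothetical, and the version parser only handles plain numeric versions.

```python
def parse_version(text):
    """Turn '2.11.3' into (2, 11, 3); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def needs_markupsafe_pin(jinja2_version, markupsafe_version):
    """True when Jinja2 is older than 3.0 and MarkupSafe is 2.1.0 or newer."""
    old_jinja2 = parse_version(jinja2_version) < (3, 0)
    new_markupsafe = parse_version(markupsafe_version) >= (2, 1)
    return old_jinja2 and new_markupsafe


# Versions taken from the earlier pip list: Jinja2 2.11.3 with MarkupSafe 2.1.3.
print(needs_markupsafe_pin("2.11.3", "2.1.3"))  # True
# The same Jinja2 with the pinned MarkupSafe 2.0.1.
print(needs_markupsafe_pin("2.11.3", "2.0.1"))  # False
```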

2024-01-19 19:27:11.878 | INFO     | nitter_scraper.nitter:_get_client:31 - Docker connection successful.
2024-01-19 19:27:23.199 | INFO     | nitter_scraper.nitter:start:155 - Running container inspiring_moore c5edb4e9e2.
2024-01-19 19:27:23.203 | INFO     | nitter_scraper.nitter:stop:159 - Stopping container inspiring_moore c5edb4e9e2.
2024-01-19 19:27:28.597 | INFO     | nitter_scraper.nitter:stop:162 - Container inspiring_moore c5edb4e9e2 Destroyed.

Now I get these errors:

Traceback (most recent call last):
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
OSError: [WinError 10049] The requested address is not valid in its context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connectionpool.py", line 722, in urlopen
    chunked=chunked,
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\http\client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\http\client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\http\client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\http\client.py", line 1036, in _send_output
    self.send(msg)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\http\client.py", line 976, in send
    self.connect()
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connection.py", line 205, in connect
    conn = self._new_conn()
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connection.py", line 187, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x00000201BC7A7788>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\requests\adapters.py", line 497, in send
    chunked=chunked,
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\connectionpool.py", line 800, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=8008): Max retries exceeded with url: /dgnsrekt (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000201BC7A7788>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    profile = nitter.get_profile("dgnsrekt")
  File "C:\Scraping\nitter_scraper\nitter_scraper\nitter.py", line 114, in get_profile
    return get_profile(username=username, not_found_ok=not_found_ok, address=self.address)
  File "C:\Scraping\nitter_scraper\nitter_scraper\profile.py", line 202, in get_profile
    response = session.get(url)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\requests\sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\COMPUTERFACE\miniconda3\envs\nitterscraper\lib\site-packages\requests\adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8008): Max retries exceeded with url: /dgnsrekt (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000201BC7A7788>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context'))

So my conclusion is this thing is just old and doesn't work any more. Is this true? Does anyone have this running, like today?
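The `host='0.0.0.0'` in the error is worth a look. `0.0.0.0` is a bind address, and Windows rejects it as a connection target with WinError 10049, while Linux typically treats it as the local host. One possible workaround, sketched under the assumption that the container's port is published on the local machine, is to rewrite the address to the loopback interface before making requests. The helper name `to_loopback` is hypothetical and is not part of nitter_scraper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_loopback(url):
    """Replace a 0.0.0.0 host in a URL with 127.0.0.1, keeping port, path and query.

    Any user:password part of the URL is not preserved in this sketch.
    """
    parts = urlsplit(url)
    if parts.hostname != "0.0.0.0":
        return url
    netloc = "127.0.0.1"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_loopback("http://0.0.0.0:8008/dgnsrekt"))  # http://127.0.0.1:8008/dgnsrekt
```

Whether nitter_scraper lets the address be overridden this way depends on its API, which this thread does not show.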

estebanlm commented 9 months ago

I got to the same place as you, with a slightly different error:

2024-01-21 15:11:01.821 | INFO     | nitter_scraper.nitter:_get_client:31 - Docker connection successful.
2024-01-21 15:11:03.146 | INFO     | nitter_scraper.nitter:start:155 - Running container inspiring_shirley 4b5453b0c4.
2024-01-21 15:11:03.377 | INFO     | nitter_scraper.nitter:stop:159 - Stopping container inspiring_shirley 4b5453b0c4.
2024-01-21 15:11:08.661 | INFO     | nitter_scraper.nitter:stop:162 - Container inspiring_shirley 4b5453b0c4 Destroyed.
Traceback (most recent call last):
  File "/home/esteban/dev/python/nitter/test.py", line 6, in <module>
    profile = nitter.get_profile("dgnsrekt")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/esteban/dev/python/nitter/venv/lib/python3.11/site-packages/nitter_scraper/nitter.py", line 114, in get_profile
    return get_profile(username=username, not_found_ok=not_found_ok, address=self.address)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/esteban/dev/python/nitter/venv/lib/python3.11/site-packages/nitter_scraper/profile.py", line 207, in get_profile
    raise ValueError(f'Oops! Either "{username}" does not exist or is private.')
ValueError: Oops! Either "dgnsrekt" does not exist or is private.

Did you make any progress?

ewebgh33 commented 9 months ago

@estebanlm I gave up on Nitter, as I found a few places around the web that offer a free monthly allowance of API calls. Search Google for API sites and check a) which ones offer a free account (not all do) and b) which services they connect to. Some cover Twitter and give a few hundred API calls a month. This is more than enough for my purposes, and I don't need to muck around with a Nitter instance or Docker any more. I just call the API with Python and Requests.

estebanlm commented 9 months ago

:( Sadly, I cannot do that. I will continue digging and see if I can make a PR (highly unlikely, I am very new to Python ;)