bocchilorenzo / ntscraper

Scrape from Twitter using Nitter instances
MIT License
168 stars · 29 forks

Fetching failure again? #19

Open muenze8 opened 11 months ago

muenze8 commented 11 months ago

Hello,

It seems the fetching issue is back again for some reason. Neither recent tweets nor old ones are being retrieved, although they are retrieved fine on the Nitter pages themselves.

[screenshot]
bocchilorenzo commented 11 months ago

I just tried but on my end it works (see pic). Are you on the latest version of the library? Many instances error out, but for example https://nitter.privacydev.net and https://n.populas.no work for now.

[screenshot]

muenze8 commented 11 months ago

Yeah I updated the library just before posting

bocchilorenzo commented 11 months ago

I've added an instance check when launching the scraper. It now takes a couple of seconds longer to start, but it checks which instances are working in order to prevent many of those errors. Let me know if it fixes the issue.
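Conceptually, such a startup check only needs to probe each known instance and keep the ones that respond. A minimal sketch of that idea, not ntscraper's actual internals (the names `is_alive` and `filter_instances` are illustrative):

```python
from urllib.request import Request, urlopen
from urllib.error import URLError

def is_alive(url, timeout=5):
    """Return True if the instance answers with HTTP 200 within the timeout."""
    try:
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError, ValueError):
        return False

def filter_instances(candidates):
    """Keep only the instances that respond, so later requests rarely error out."""
    return [url for url in candidates if is_alive(url)]
```

Doing this once at launch trades a few seconds of startup time for far fewer failed requests during scraping.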

muenze8 commented 11 months ago

> I've added an instance check when launching the scraper. Now it takes a couple of seconds longer to start but it checks the instances that work in order to prevent many errors from happening. Let me know if it fixes the issue.

There is a new error now on both Mac and Windows, as in below.

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    ^CFatal Python error: init_import_site: Failed to import the site module
    Python runtime state: initialized

    (The output interleaves tracebacks from several spawned worker processes; the readable main-process traceback after pressing Ctrl-C is:)

    Traceback (most recent call last):
      File "/Users/user/My Folder/Twitter Archive/NitterScrapper2.py", line 38, in <module>
        nitter = Nitter(log_level=None)
      File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/ntscraper/nitter.py", line 37, in __init__
        self._test_all_instances("/x", no_print=True)
      File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/ntscraper/nitter.py", line 155, in _test_all_instances
        p.map(self._test_instance, [(instance, endpoint) for instance in self.instances])
      File "/Users/user/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 364, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/Users/user/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 765, in get
        self.wait(timeout)
      File "/Users/user/opt/anaconda3/lib/python3.9/multiprocessing/pool.py", line 762, in wait
        self._event.wait(timeout)
      File "/Users/user/opt/anaconda3/lib/python3.9/threading.py", line 581, in wait
        signaled = self._cond.wait(timeout)
      File "/Users/user/opt/anaconda3/lib/python3.9/threading.py", line 312, in wait
        waiter.acquire()
    KeyboardInterrupt

    (base) user@users-MacBook-Pro Twitter Archive %
    /Users/user/opt/anaconda3/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 9 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '

bocchilorenzo commented 11 months ago

That error is caused by the multiprocessing used during the instance check, which requires the scraper to be run inside an `if __name__ == '__main__':` block. I've removed multiprocessing; initialization is a bit slower, but it works correctly without that requirement.
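For context, on macOS and Windows the default `spawn` start method re-imports the main module in every child process, which is exactly what the "bootstrapping phase" RuntimeError above complains about. A minimal illustration of the required guard (not ntscraper's actual code; `check` is a stand-in for the per-instance test):

```python
from multiprocessing import Pool

def check(instance):
    """Stand-in for a per-instance test; real code would make an HTTP request."""
    return instance

if __name__ == "__main__":
    # With 'spawn', each child re-imports this module, so the Pool must be
    # created under this guard; otherwise every child re-runs the top level
    # and Python raises the bootstrapping RuntimeError shown above.
    instances = ["https://nitter.privacydev.net", "https://n.populas.no"]
    with Pool(2) as pool:
        print(pool.map(check, instances))
```

Dropping multiprocessing entirely, as done here, removes the guard requirement at the cost of a sequential (slower) instance check.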

muenze8 commented 11 months ago

> That error is because of the multiprocessing used during the instance check, which requires the scraper to be run in an `if __name__ == '__main__':` block. I've removed multiprocessing, it's a bit slower to initialize but it works correctly without that code requirement.

That fixed the error, thanks!

But it is weird that still nothing gets fetched. I am not sure what exactly is wrong, because when I use the exact same search term on the instances' web pages, tweets do get fetched.

    24-Oct-23 15:01:22 - Empty profile on https://nitter.woodland.cafe. Trying https://nitter.mint.lgbt
    24-Oct-23 15:01:25 - Empty profile on https://nitter.mint.lgbt. Trying https://nitter.catsarch.com
    24-Oct-23 15:01:28 - Empty profile on https://nitter.catsarch.com. Trying https://nitter.dafriser.be
    24-Oct-23 15:01:30 - Empty profile on https://nitter.dafriser.be. Trying https://nitter.uni-sonia.com
    24-Oct-23 15:01:34 - Empty profile on https://nitter.uni-sonia.com. Trying https://n.populas.no
    24-Oct-23 15:01:37 - Empty profile on https://n.populas.no. Trying https://nitter.woodland.cafe
    24-Oct-23 15:01:39 - Empty profile on https://nitter.woodland.cafe. Trying https://nitter.tinfoil-hat.net
    24-Oct-23 15:01:41 - Empty profile on https://nitter.tinfoil-hat.net. Trying https://nitter.privacydev.net
    24-Oct-23 15:01:43 - Empty profile on https://nitter.privacydev.net. Trying https://nitter.ktachibana.party
    24-Oct-23 15:01:47 - Empty profile on https://nitter.ktachibana.party. Trying https://nitter.dafriser.be
    24-Oct-23 15:01:49 - Empty profile on https://nitter.dafriser.be. Trying https://nitter.perennialte.ch
    24-Oct-23 15:01:52 - Empty profile on https://nitter.perennialte.ch. Trying https://nitter.woodland.cafe
    24-Oct-23 15:01:54 - Empty profile on https://nitter.woodland.cafe. Trying https://nitter.privacydev.net
    24-Oct-23 15:01:56 - Empty profile on https://nitter.privacydev.net. Trying https://nitter.perennialte.ch
    24-Oct-23 15:01:59 - Empty profile on https://nitter.perennialte.ch. Trying https://nitter.catsarch.com
    24-Oct-23 15:02:02 - Empty profile on https://nitter.catsarch.com. Trying https://nitter.mint.lgbt
    24-Oct-23 15:02:05 - Empty profile on https://nitter.mint.lgbt. Trying https://nitter.dafriser.be
    24-Oct-23 15:02:07 - Empty profile on https://nitter.dafriser.be. Trying https://nitter.d420.de
    24-Oct-23 15:02:09 - Empty profile on https://nitter.d420.de. Trying https://nitter.dafriser.be
    24-Oct-23 15:02:12 - Empty profile on https://nitter.dafriser.be. Trying https://nitter.mint.lgbt
    24-Oct-23 15:02:14 - Max retries reached. Check your request and try again.
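The behavior in that log (try an instance, move to the next when the profile comes back empty, give up after a cap) can be sketched generically as follows; `fetch_with_rotation` and its arguments are hypothetical helpers for illustration, not part of ntscraper's API:

```python
from itertools import cycle

def fetch_with_rotation(instances, fetch, max_retries=5):
    """Try instances round-robin until one returns data or retries run out."""
    rotation = cycle(instances)
    for _ in range(max_retries):
        instance = next(rotation)
        result = fetch(instance)
        if result:  # got a non-empty profile / tweet list
            return result
        print(f"Empty profile on {instance}. Trying next instance")
    raise RuntimeError("Max retries reached. Check your request and try again.")
```

Note that rotation like this only helps when at least one instance actually returns data; if every instance serves empty results (as in the log above), it exhausts the retry budget no matter the order.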

bocchilorenzo commented 11 months ago

That's strange. I'll investigate a bit and see if I can find a fix.

muenze8 commented 10 months ago

> That's strange, I'll try to investigate a bit and see if I can find some fix

any luck?

bocchilorenzo commented 10 months ago

I've tested with and without VPN to make sure it wasn't a network issue but was not able to replicate it. I'll keep it open for now.