bocchilorenzo / ntscraper

Scrape from Twitter using Nitter instances
MIT License
168 stars, 29 forks

instance='http://localhost' not working #39

Closed. NetworkWebMasters closed this issue 9 months ago.

NetworkWebMasters commented 9 months ago

I installed Nitter on Ubuntu and http://localhost works perfectly: I can access and read Twitter profiles and tweets from localhost.

But passing instance='http://localhost' like this does not work:

    scraper = Nitter(log_level=1, skip_instance_check=True)
    bezos_tweets = scraper.get_tweets("JeffBezos", mode='user', instance='http://localhost')

Can you help me? How can I scrape tweets from http://localhost?

bocchilorenzo commented 9 months ago

Hello, have you tried also specifying the port on your localhost? Like so: http://localhost:8080

NetworkWebMasters commented 9 months ago

I tried http://localhost:8080/ with "port = 8080" in nitter.conf.

I also tried http://localhost/ with "port = 80" in nitter.conf.

Neither configuration works.

NetworkWebMasters commented 9 months ago

Please check this error:

Traceback (most recent call last):
  File "/home/ubuntu/Downloads/final.py", line 14, in <module>
    exportData = scraper.get_tweets(username, mode='user', since=sinceDate, until=untilDate, instance='http://localhost')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.11/site-packages/ntscraper/nitter.py", line 853, in get_tweets
    return self._search(
           ^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.11/site-packages/ntscraper/nitter.py", line 733, in _search
    soup = self._get_page(endpoint, max_retries, no_empty_retries)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.11/site-packages/ntscraper/nitter.py", line 278, in _get_page
    sleep(2)

04-Dec-23 15:10:16 - Max retries reached. Check your request and try again.

NetworkWebMasters commented 9 months ago

The latest Nitter works perfectly in the browser at http://localhost,

but when I run any basic ntscraper Python code, the Nitter console on localhost shows these error messages:

ubuntu@ubuntu:/opt/nitter$ sudo ./nitter
Starting Nitter at http://localhost
Connected to Redis at localhost:6379
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
[accounts] Rate limited, retrying search request...
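
One way to tell whether the problem sits in the Nitter instance or in ntscraper is to request a page from the instance directly in Python, outside the scraper. The following is only a minimal sketch using the standard library; `fetch_status` is a hypothetical helper name, and the port and profile path in the usage note are assumptions based on this thread, not part of ntscraper.

```python
import urllib.error
import urllib.request


def fetch_status(base_url, path="/", timeout=5.0):
    """Return the HTTP status code for base_url + path, or None if unreachable.

    Hypothetical helper for illustration; it is not part of ntscraper or Nitter.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    # Some servers reject requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 429.
        return error.code
    except (urllib.error.URLError, OSError):
        # Nothing is listening on that address, or the connection failed.
        return None
```

For example, `fetch_status("http://localhost:8080", "/JeffBezos")` returning 200 would suggest the instance serves profile pages to non-browser clients, while `None` would point at the address or port rather than at the scraper.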

NetworkWebMasters commented 9 months ago

I followed this Nitter installation guide: https://github.com/zedeus/nitter/wiki/Guest-Account-Branch-Deployment

bocchilorenzo commented 9 months ago

That's weird, because I don't have any special code for localhost instances. The fact that your Nitter installation also reports being rate limited makes me think there is a problem with its configuration. Try checking the comments at https://github.com/zedeus/nitter/issues/983 to see if they help. Also, when you got the max retries error you were passing the instance incorrectly. I'm not sure if you have tried this already, but try launching the scraper like this:

    exportData = scraper.get_tweets(username, mode='user', since=sinceDate, until=untilDate, instance='http://localhost:8080')

When passing the instance, omit the trailing "/", and if you run it locally, also specify the port in the URL.
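
The advice about the trailing slash and the port can be applied mechanically before the URL is handed to the scraper. This is a rough sketch only: `normalize_instance` and its `default_port` parameter are hypothetical names introduced here, not part of ntscraper, and the port 8080 is simply the value discussed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_instance(url, default_port=None):
    """Strip the trailing slash and optionally add a port to an instance URL.

    Hypothetical helper for illustration; it is not part of ntscraper.
    """
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    # Keep an explicit port if the URL has one, else fall back to default_port.
    port = parts.port or default_port
    netloc = f"{parts.hostname}:{port}" if port else parts.hostname
    # Drop any trailing "/" from the path, plus query string and fragment.
    return urlunsplit((parts.scheme, netloc, parts.path.rstrip("/"), "", ""))


print(normalize_instance("http://localhost:8080/"))                # http://localhost:8080
print(normalize_instance("http://localhost/", default_port=8080))  # http://localhost:8080
```

The result can then be passed as the `instance` argument of `scraper.get_tweets(...)`, as in the call shown in this thread.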

NetworkWebMasters commented 9 months ago

I tried the suggestions in https://github.com/zedeus/nitter/issues/983 and created new guest account tokens multiple times.

The localhost instance itself works perfectly.

The only issue is that ntscraper won't detect it.

I also tried this:

    exportData = scraper.get_tweets(username, mode='user', since=sinceDate, until=untilDate, instance='http://localhost:8080')

NetworkWebMasters commented 9 months ago

Can you check this error?

Traceback (most recent call last):
  File "/home/ubuntu/Downloads/final.py", line 14, in <module>
    exportData = scraper.get_tweets(username, mode='user', since=sinceDate, until=untilDate, instance='http://localhost/')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.11/site-packages/ntscraper/nitter.py", line 853, in get_tweets
    return self._search(
           ^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.11/site-packages/ntscraper/nitter.py", line 733, in _search
    soup = self._get_page(endpoint, max_retries, no_empty_retries)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.11/site-packages/ntscraper/nitter.py", line 278, in _get_page
    sleep(2)

bocchilorenzo commented 9 months ago

Regarding the latest error, there was an issue with the endpoint creation that should now be fixed. Please try again with the latest version.

NetworkWebMasters commented 9 months ago

Thank you so much, instance='http://localhost/' works perfectly now.

But there is one more issue.

When I scrape tweets from 2023-12-10 to 2023-11-01:

1) ntscraper only collects the retweet, quote, and like counts for the current month, December.

2) It doesn't collect the retweet, quote, and like counts for November; it only collects the tweet counts for November.

3) It also misses some dates in November.

4) It also doesn't scrape all of today's data (2023-12-10); it shows data from 2023-12-09 to 2023-11-01.

bocchilorenzo commented 9 months ago

The value in the "since" parameter should be the first (earliest) date, while the one in the "until" parameter should be the last date. Try swapping them to see if it gets the tweets correctly.
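
To rule out an accidentally reversed range, the two dates can be sorted before they are passed on. This is a small sketch under the assumption that both dates are ISO strings (YYYY-MM-DD), as written in this thread; `ordered_date_range` is a hypothetical helper, not an ntscraper function.

```python
from datetime import date


def ordered_date_range(first, second):
    """Return (since, until) as ISO strings with the earlier date first.

    Hypothetical helper for illustration; it is not part of ntscraper.
    """
    a = date.fromisoformat(first)
    b = date.fromisoformat(second)
    since, until = (a, b) if a <= b else (b, a)
    return since.isoformat(), until.isoformat()


sinceDate, untilDate = ordered_date_range("2023-12-10", "2023-11-01")
print(sinceDate, untilDate)  # 2023-11-01 2023-12-10
```

The two values can then be used as `since=sinceDate, until=untilDate` in the `get_tweets` call shown earlier in the thread.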

NetworkWebMasters commented 9 months ago

I tried swapping them, like this:

    sinceDate = 2023-11-01
    untilDate = 2023-12-10

but I found the same issue.