redfalcoon closed this issue 1 year ago.
I am facing the same error. From what I can see, Twitter broke several operators in its search.
Hello
I am having the same issue. The Nitter UI returns results for the queries I am passing in the code, but ntscraper keeps showing "error fetching from instances"... and yes, passing the same search query into Twitter doesn't return anything either!
Twitter's OR and AND operators are broken, but I can't identify how this broke the scraper.
I found the error; a pull request is incoming. @redfalcoon @muenze8
Patricio,
Using your patch, `user_information = scraper.get_profile_info(user)` works fine, but `user_tweets = scraper.get_tweets(user, mode='user', number=posts)` still generates the same error.
Thanks
@redfalcoon, paste your code so I can try to replicate the problem in my environment.
Patricio
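This is the source that I use:
```python
import sys
import json

from ntscraper import Nitter

scraper = Nitter(log_level=None)

server = sys.argv[1]  # instance URL passed on the command line
user = 'elonmusk'
posts = 100

# Fetch and print the profile information
user_information = scraper.get_profile_info(user)
print(json.dumps(user_information, indent=4))

# Fetch and print up to `posts` tweets from the user's timeline via the given instance
user_tweets = scraper.get_tweets(user, mode='user', number=posts, instance=server)
print(json.dumps(user_tweets, indent=4))
```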
To execute the code I use:
```
python3 CrawlerNitter.py https://nitter.net
Traceback (most recent call last):
  File "CrawlerNitter.py", line 9, in
```
I got the same result with user='elonmusk'; then I tried user='jack' and it succeeded.
```
{
    "image": "https://pbs.twimg.com/profile_images/1661201415899951105/azNjKOSH_400x400.jpg",
    "name": "jack",
    "username": "@jack",
    "id": "12",
    "bio": "#bitcoin and chill.\n\n#nostr: npub1sg6plzptd64u62a878hep2kev88swjh3tw00gjsfl8f237lmu63q0uf63m",
    "location": "",
    "website": "https://primal.net/jack",
    "joined": "8:50 PM - 21 Mar 2006",
    "stats": {
        "tweets": 29315,
        "following": 4658,
        "followers": 6523202,
        "likes": 35955,
        "media": 2917
    }
}
27-Jul-23 10:49:22 - Empty profile on http://localhost, trying another random instance
27-Jul-23 10:49:23 - Error fetching https://nitter.onthescent.xyz/, trying another random instance
27-Jul-23 10:49:25 - Error fetching https://twitter.owacon.moe, trying another random instance
{
    "tweets": [
        {
            "link": "https://twitter.com/sza/status/1684461977647845377#m",
            "text": "DND: a lifestyle",
            "user": {
                "name": "SZA",
                "username": "@sza",
                "avatar": "https://pbs.twimg.com/profile_images/1600953582605328384/t5skIcVh_bigger.jpg"
            },
            "date": "Jul 27, 2023 \u00b7 7:13 AM UTC",
            "is-retweet": true,
            "external-link": "",
            "quoted-post": {},
            "stats": {
                "comments": 0,
                "retweets": 20415,
                "quotes": 0,
                "likes": 42522
            },
            "pictures": [],
            "videos": [],
            "gifs": []
        },
        {
            "link": "https://twitter.com/lilyallen/status/1684338954550616069#m",
            "text": "The world really did a number on Sinead O\u2019Oconnor.
RIP Angel.",
            "user": {
                "name": "Lily Allen",
                "username": "@lilyallen",
                "avatar": "https://pbs.twimg.com/profile_images/1562055589584404483/C5v4nJvC_bigger.jpg"
            },
            "date": "Jul 26, 2023 \u00b7 11:04 PM UTC",
            "is-retweet": true,
            "external-link": "",
            "quoted-post": {},
            "stats": {
                "comments": 158,
                "retweets": 623,
                "quotes": 0,
                "likes": 17847
            },
            "pictures": [],
            "videos": [],
            "gifs": []
        },
        {
            "link": "https://twitter.com/jack/status/1684455493333557251#m",
            "text": "https://piped.video/watch?v=DOiDUbaBL9E",
            "user": {
                "name": "jack",
                "username": "@jack",
                "avatar": "https://pbs.twimg.com/profile_images/1661201415899951105/azNjKOSH_bigger.jpg"
            },
            "date": "Jul 27, 2023 \u00b7 6:47 AM UTC",
            "is-retweet": false,
            "external-link": "",
            "quoted-post": {},
            "stats": {
                "comments": 90,
                "retweets": 179,
                "quotes": 0,
                "likes": 880
            },
            "pictures": [],
            "videos": [],
            "gifs": []
        },
etc....
```
Patricio,
I've changed the user to jack and the result changed a little; this is the output:
```
python3 CrawlerNitter.py https://nitter.net
27-Jul-23 16:59:05 - Error fetching https://nitter.unixfox.eu, trying another random instance
27-Jul-23 16:59:07 - Error fetching https://nitter.tokhmi.xyz, trying another random instance
{
    "image": "https://pbs.twimg.com/profile_images/1661201415899951105/azNjKOSH_400x400.jpg",
    "name": "jack",
    "username": "@jack",
    "id": "12",
    "bio": "#bitcoin and chill.\n\n#nostr: npub1sg6plzptd64u62a878hep2kev88swjh3tw00gjsfl8f237lmu63q0uf63m",
    "location": "",
    "website": "https://primal.net/jack",
    "joined": "8:50 PM - 21 Mar 2006",
    "stats": {
        "tweets": 29315,
        "following": 4658,
        "followers": 6523202,
        "likes": 35955,
        "media": 2917
    }
}
27-Jul-23 16:59:15 - Error fetching https://nitter.net, trying another random instance
Traceback (most recent call last):
  File "CrawlerNitter.py", line 11, in
```
I was able to read the profile information but not the posts. I've changed nitter.py to print the instance used, and I've verified that not all of the randomly selected instances produce correct results, as you can see in the previous log. Using this patch (which only prints the instance used), I got the following result:
```
python3 CrawlerNitter.py https://nitter.net
27-Jul-23 17:06:02 - https://ntr.frail.duckdns.org unreachable, trying another random instance
27-Jul-23 17:06:11 - https://nitter.inpt.fr unreachable, trying another random instance
https://nitter.riverside.rocks
https://nitter.riverside.rocks
{
    "image": "https://pbs.twimg.com/profile_images/1661201415899951105/azNjKOSH_400x400.jpg",
    "name": "jack",
    "username": "@jack",
    "id": "12",
    "bio": "#bitcoin and chill.\n\n#nostr: npub1sg6plzptd64u62a878hep2kev88swjh3tw00gjsfl8f237lmu63q0uf63m",
    "location": "",
    "website": "https://primal.net/jack",
    "joined": "8:50 PM - 21 Mar 2006",
    "stats": {
        "tweets": 29315,
        "following": 4658,
        "followers": 6523212,
        "likes": 35955,
        "media": 2917
    }
}
https://nitter.net
27-Jul-23 17:06:17 - Error fetching https://nitter.net, trying another random instance
https://nitter.freedit.eu
https://nitter.freedit.eu
https://nitter.freedit.eu
https://nitter.freedit.eu
https://nitter.freedit.eu
https://nitter.freedit.eu
27-Jul-23 17:06:34 - Empty profile on https://nitter.freedit.eu, trying another random instance
https://nitter.tokhmi.xyz
27-Jul-23 17:06:36 - Error fetching https://nitter.tokhmi.xyz, trying another random instance
https://nitter.weiler.rocks
{
    "tweets": [
        {
            "link": "https://twitter.com/sza/status/1684461977647845377#m",
            "text": "DND: a lifestyle",
            "user": {
                "name": "SZA",
                "username": "@sza",
                "avatar": "https://pbs.twimg.com/profile_images/1600953582605328384/t5skIcVh_bigger.jpg"
            },
            "date": "Jul 27, 2023 \u00b7 7:13 AM UTC",
            "is-retweet": true,
            "external-link": "",
            "quoted-post": {},
            "stats": {
                "comments": 0,
                "retweets": 21292,
                "quotes": 0,
                "likes": 44352
            },
            "pictures": [],
            "videos": [],
            "gifs": []
```
It seems that the Nitter instances respond in different ways, which leads to misinterpretations. Both instances that behaved correctly, including https://nitter.riverside.rocks, are updated to version [2023.07.24-20b5cce] (nitter.net has the same version).
It could be something related to the instances and not to the scraper.
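Printing the instance used, as done above, shows that some randomly chosen instances keep failing. The same information can be pulled out of the warnings alone by tallying them per instance. A minimal sketch, assuming the warning formats seen in the logs in this thread (`Error fetching <url>, ...`, `Empty profile on <url>, ...`, `<url> unreachable, ...`) stay as they are; `tally_failures` is a hypothetical helper, not part of ntscraper:

```python
import re
from collections import Counter

# Warning shapes copied from the ntscraper logs in this thread (assumed stable).
# A trailing "/" on the URL is dropped so "https://x/" and "https://x" count together.
PATTERNS = [
    ("error", re.compile(r" - Error fetching (\S+?)/?, trying another")),
    ("empty", re.compile(r" - Empty profile on (\S+?)/?, trying another")),
    ("unreachable", re.compile(r" - (\S+?)/? unreachable, trying another")),
]


def tally_failures(log_text):
    """Count (instance, failure kind) pairs found in ntscraper warning lines."""
    counts = Counter()
    for line in log_text.splitlines():
        for kind, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[(match.group(1), kind)] += 1
                break  # at most one warning per line
    return counts


if __name__ == "__main__":
    sample = """\
27-Jul-23 16:59:07 - Error fetching https://nitter.tokhmi.xyz, trying another random instance
27-Jul-23 17:06:02 - https://ntr.frail.duckdns.org unreachable, trying another random instance
27-Jul-23 17:06:34 - Empty profile on https://nitter.freedit.eu, trying another random instance
27-Jul-23 17:06:36 - Error fetching https://nitter.tokhmi.xyz, trying another random instance"""
    for (instance, kind), n in tally_failures(sample).most_common():
        print(f"{n} x {kind}: {instance}")
```

Lines that are not warnings (the printed instance URLs, the JSON output) simply match no pattern and are skipped.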
Thanks @psegovias, I just pulled in your request. I also think there are some issues with some instances. For example, the nitter.tokhmi.xyz instance is having issues right now, as seen here:
Closing the issue for now.
Hi Lorenzo and Patricio, I've made a small patch to the code and now the process ends without errors. In the code, I've changed the following block (lines 113-121):
```python
if soup.find_all("div", class_="show-more")[-1].find("a").text == "Load newest":
    keep_trying = False
    soup = None
else:
    logging.warning(
        f"Empty profile on {instance}, trying another random instance"
    )
    instance = self.get_random_instance()
    count += 1
```
with:
```python
try:
    if soup.find_all("div", class_="show-more")[-1].find("a").text == "Load newest":
        keep_trying = False
        soup = None
    else:
        logging.warning(
            f"Empty profile on {instance}, trying another random instance"
        )
        instance = self.get_random_instance()
        count += 1
except (IndexError, AttributeError):
    # No show-more div (IndexError) or no link inside it (AttributeError):
    # retry without rotating the instance or incrementing count.
    continue
```
and now the scraper no longer generates errors. It's a workaround and not really optimized; I believe the random selection of the instance could be handled in the except section, but as it stands it seems to work well and solves my issue. Thank you for your great work.
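The suggestion of handling the random instance selection in the except path can be sketched as a bounded loop. A minimal sketch, assuming two hypothetical callables that are not ntscraper API: `get_show_more_texts(instance)` returns the link texts of the page's `show-more` divs (an empty list when there are none), and `pick_instance()` returns another random instance. An empty list is treated like the "Empty profile" branch, so every failed attempt rotates the instance and counts toward `max_retries`, instead of retrying the same instance without limit:

```python
import logging
from typing import Callable, List, Tuple


def retry_until_load_newest(
    get_show_more_texts: Callable[[str], List[str]],  # hypothetical helper
    pick_instance: Callable[[], str],                 # hypothetical helper
    instance: str,
    max_retries: int = 5,
) -> Tuple[str, bool]:
    """Model of the patched block as a bounded retry loop.

    Returns (instance, True) once the last show-more link reads
    "Load newest" (where the original block sets keep_trying = False),
    or (instance, False) after max_retries attempts.
    """
    for _ in range(max_retries):
        texts = get_show_more_texts(instance)
        # Check for an empty list before indexing [-1], so no IndexError.
        if texts and texts[-1] == "Load newest":
            return instance, True
        logging.warning(
            f"Empty profile on {instance}, trying another random instance"
        )
        instance = pick_instance()  # rotate on every failed attempt
    return instance, False
```

The design choice is simply to merge the "no show-more block" case into the existing "Empty profile" branch, so `count` (here the loop counter) always advances.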
Hi!
Starting from today, scraping doesn't work correctly and returns these error messages:
```
Traceback (most recent call last):
  File "CrawlerNitter.py", line 7, in
    user_information = scraper.get_profile_info(user)
  File "/usr/local/lib/python3.6/site-packages/ntscraper/nitter.py", line 587, in get_profile_info
    is_encrypted = self.is_instance_encrypted(instance)
  File "/usr/local/lib/python3.6/site-packages/ntscraper/nitter.py", line 43, in is_instance_encrypted
    instance_new, soup = self.__get_page("/Twitter", instance)
  File "/usr/local/lib/python3.6/site-packages/ntscraper/nitter.py", line 106, in __get_page
    if soup.find_all("div", class_="show-more")[-1].find("a").text == "Load newest":
IndexError: list index out of range
```
This happens with `user_information = scraper.get_profile_info(user)` but also with `user_tweets = scraper.get_tweets(user, mode='user', number=posts)`.
Until yesterday the crawling process worked fine. The instance used for the test was https://nitter.net (the most recently updated one), running version [2023.07.24-20b5cce].
Any suggestions?
Thanks
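The `IndexError: list index out of range` in the traceback above comes from indexing `[-1]` on the list returned by `soup.find_all("div", class_="show-more")`, which is empty when the page served by the instance contains no `show-more` div. A minimal stdlib-only sketch of the same check with an explicit guard, using `html.parser` instead of BeautifulSoup so it runs without dependencies; `ShowMoreParser`, `last_show_more_text`, and the sample markup in the usage are illustrative assumptions, not ntscraper code or Nitter's exact HTML:

```python
from html.parser import HTMLParser
from typing import List, Optional


class ShowMoreParser(HTMLParser):
    """Record the first <a> text inside each <div class="show-more">.

    Stdlib stand-in for soup.find_all("div", class_="show-more"),
    used only to illustrate the empty-list guard.
    """

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Optional[str]] = []  # one entry per show-more div
        self._depth = 0        # > 0 while inside a show-more div
        self._in_link = False  # True while reading that div's first <a>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._depth:
                self._depth += 1  # nested div inside a show-more div
            elif "show-more" in (dict(attrs).get("class") or "").split():
                self._depth = 1
                self.links.append(None)  # no <a> seen yet in this div
        elif tag == "a" and self._depth and self.links[-1] is None:
            self._in_link = True
            self.links[-1] = ""

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.links[-1] += data


def last_show_more_text(html: str) -> Optional[str]:
    """Text of the last show-more link, or None when the page has none."""
    parser = ShowMoreParser()
    parser.feed(html)
    parser.close()
    # Checking emptiness first avoids "IndexError: list index out of range".
    if not parser.links or parser.links[-1] is None:
        return None
    return parser.links[-1].strip()
```

With this guard, a page without the block yields `None`, which a caller could treat like the "Empty profile" case (rotate the instance) instead of crashing; for example, `last_show_more_text('<div class="timeline"></div>')` returns `None`.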