bocchilorenzo / ntscraper

Scrape from Twitter using Nitter instances
MIT License

TypeError in _get_tweet_link Method When Accessing Non-Existent Elements #42

Closed: Jorgarias closed 11 months ago

Jorgarias commented 11 months ago

Hello,

I am encountering a recurring issue with the ntscraper library, specifically in the _get_tweet_link method. It raises TypeError: 'NoneType' object is not subscriptable when the library subscripts ["href"] on an element lookup that found nothing (i.e., returned None).

Error Description: The error occurs in the line return "https://twitter.com" + tweet.find("a")["href"]. This line of code assumes that the find("a") method will always return an element, but in some cases, it returns None, leading to the TypeError when the code attempts to subscript this None value.

During the scraping process, the error occurs intermittently, particularly when a tweet does not contain the expected anchor (<a>) element.
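A minimal sketch of a None-safe variant of the failing lookup, for illustration only. The name safe_tweet_link is hypothetical and is not part of ntscraper. It assumes only that the tweet object exposes find() and that the anchor exposes get(), as BeautifulSoup-style elements do. So that the example runs with the standard library alone, the demo uses xml.etree.ElementTree elements as stand-ins, since they offer the same two methods.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def safe_tweet_link(tweet) -> Optional[str]:
    """Return the tweet URL, or None when no usable <a href> is present.

    Hypothetical helper: `tweet` is any element-like object with a
    find() method whose results support get() for attributes.
    """
    anchor = tweet.find("a")  # None when the tweet has no <a> element
    if anchor is None:
        return None
    href = anchor.get("href")  # get() returns None instead of raising
    if not href:
        return None
    return "https://twitter.com" + href


# Stand-in elements for the demo (assumption: real input would come
# from the scraped Nitter page, not from ElementTree).
with_link = ET.fromstring('<div><a href="/user/status/1">link</a></div>')
without_link = ET.fromstring('<div><span>no anchor here</span></div>')

print(safe_tweet_link(with_link))
print(safe_tweet_link(without_link))
```

With a guard like this, a caller can skip or log tweets whose link is None instead of letting a single malformed tweet abort the whole scrape.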

Thank you for your time and effort in maintaining this library.

CODE

import pandas as pd
from ntscraper import Nitter

scraper = Nitter()

# get_tweets and get_all_tweets are not defined in this report;
# they are presumably the reporter's own wrappers around the scraper.
def get_tweets_safe(name, modes, start_date, end_date):
    try:
        return get_tweets(name, modes, start_date, end_date)
    except TypeError:
        # Fall back to an empty frame when the scraper raises the TypeError
        print(f"Error processing tweets for: {name}")
        return pd.DataFrame(columns=['link', 'text', 'date', 'No_of_Likes', 'No_of_tweets'])

# Using the function with multiple terms
start_date = '2023-06-01'
end_date = '2023-07-23'
terms = ["perro sanxe", "perro sanchez", "perro sanche"]
data = get_all_tweets(terms, 'term', start_date, end_date)