dgnsrekt / nitter_scraper

Scrape Twitter API without authentication using Nitter.
https://nitter-scraper.readthedocs.io/
MIT License
61 stars 13 forks

Allow downloading tweets by hashtag or cashtag #3

Open flxai opened 3 years ago

flxai commented 3 years ago

This branch adds the ability to download tweets not only for a profile, but also for hashtags or cashtags.

Changes were made to the functions get_tweets and pagination_parser in nitter_scraper/tweets.py, and to get_tweets in nitter_scraper/nitter.py. Please tell me if you're okay with the implementation or have suggestions for improvement.
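The routing idea behind those changes can be sketched roughly as follows. This is a hypothetical illustration only, assuming Nitter's /search endpoint for tag queries; the function name build_tweets_url and its parameters are not the PR's actual code:

```python
from urllib.parse import quote

def build_tweets_url(host: str, port: int, query: str) -> str:
    """Hypothetical sketch: pick a Nitter path based on the query kind.

    Profiles map to /<username>; hashtags and cashtags go through the
    search endpoint, where # and $ must be percent-encoded.
    """
    base = f"http://{host}:{port}"
    if query.startswith(("#", "$")):
        # quote() turns "#" into "%23" and "$" into "%24"
        return f"{base}/search?f=tweets&q={quote(query)}"
    return f"{base}/{query}"
```

The actual pagination_parser changes would then only need to preserve the chosen path across pages.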

Example usage for hashtags (leading #):

import nitter_scraper
from nitter_scraper import NitterScraper
from pprint import pprint

hashtags = ["ToTheMoon"]

print("Scraping with local nitter docker instance.")

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    for hashtag in hashtags:
        tweets = nitter.get_tweets(hashtag, query_type='hashtag', pages=2)
        for tweet in tweets:
            print()
            pprint(tweet.dict())
            print(tweet.json(indent=4))

Example for cashtags (leading $):

import nitter_scraper
from nitter_scraper import NitterScraper
from pprint import pprint

cashtags = ["USDT"]

print("Scraping with local nitter docker instance.")

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    for cashtag in cashtags:
        tweets = nitter.get_tweets(cashtag, query_type='cashtag', pages=2)
        for tweet in tweets:
            print()
            pprint(tweet.dict())
            print(tweet.json(indent=4))
flxai commented 3 years ago

Do you think it might be better to drop the query_type parameter and make it implicit? Query strings that start with # would implicitly be hashtags, those that start with $ would be cashtags, and everything else would be treated as a user account.

flxai commented 3 years ago

Made it implicit now. I think this makes for a more intuitive user experience. It works as before, but additionally allows #hashtag or $cashtag queries like so:

import nitter_scraper
from nitter_scraper import NitterScraper
from pprint import pprint

queries = ["dgnsrekt", "#ToTheMoon", "$USDT"]

print("Scraping with local nitter docker instance.")

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    for query in queries:
        print('=' * 80, '\n', query, '\n', '=' * 80)
        tweets = nitter.get_tweets(query, pages=1)
        for tweet in tweets:
            print('-' * 80)
            pprint(tweet.dict())
            print(tweet.json(indent=4))
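The implicit dispatch described above can be sketched as a small helper. This is a hypothetical illustration; the name infer_query_type and the returned labels are assumptions, not the PR's actual code:

```python
def infer_query_type(query: str) -> str:
    """Hypothetical sketch: infer the query kind from its leading character.

    "#..." is a hashtag, "$..." is a cashtag, anything else is taken
    to be a user's account name.
    """
    if query.startswith("#"):
        return "hashtag"
    if query.startswith("$"):
        return "cashtag"
    return "profile"
```

With this in place, callers pass plain strings such as "dgnsrekt", "#ToTheMoon", or "$USDT" and no longer need an explicit query_type argument.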

Or, with arguably a bit more readability, borrowing colored output:

import nitter_scraper
from nitter_scraper import NitterScraper
from pprint import pformat
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import TerminalFormatter

def pprint_color(obj):
    print(highlight(pformat(obj), PythonLexer(), TerminalFormatter()))

queries = ["dgnsrekt", "#ToTheMoon", "$USDT"]

print("Scraping with local nitter docker instance.")

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    for query in queries:
        print('=' * 80, '\n', query, '\n', '=' * 80)
        tweets = nitter.get_tweets(query, pages=1)
        for tweet in tweets:
            print('-' * 80)
            pprint_color(tweet.dict())