JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.43k stars, 706 forks

Twitter: retweetedTweet and quotedTweet #988

Closed: jingwenshi-dev closed this issue 1 year ago

jingwenshi-dev commented 1 year ago

Describe the bug

While testing the library on my own Twitter account, I found that all retweets are categorized as quotedTweet (i.e. retweetedTweet is always empty).

How to reproduce

Just scrape a small account

Expected behaviour

Retweets should appear under the retweetedTweet column or attribute.

Screenshots and recordings

No response

Operating system

Windows 10

Python version: output of python3 --version

3.11

snscrape version: output of snscrape --version

0.7.0.20230622

Scraper

TwitterUserScraper

How are you using snscrape?

Module (import snscrape.modules.something in Python code)

Backtrace

No response

Log output

No response

Dump of locals

No response

Additional context

No response

JustAnotherArchivist commented 1 year ago

I'm not seeing this; retweets appear correctly under retweetedTweet here. Please provide a complete reproducible example.

jingwenshi-dev commented 1 year ago

import pandas as pd
from snscrape.modules.twitter import TwitterUserScraper

# Collect every item the user scraper yields for the UofT account
scraper = TwitterUserScraper('UofT')
result = list(scraper.get_items())

# Write one row per scraped item to a CSV file
pd.DataFrame(result).to_csv('UofT.csv', index=False)

Almost all of the shared tweets on UofT's Twitter account are plain retweets, not quotes. But if you use this code to scrape the tweets, the retweetedTweet column in the CSV file is empty and the retweets appear in the quotedTweet column instead.

JustAnotherArchivist commented 1 year ago

The TwitterUserScraper uses the search and never returns retweets, only original and quote tweets. There used to be a filter to enable seeing retweets from the past 7 days, but this is no longer available since the API switch: #887

If you use the TwitterProfileScraper, you get retweets, correctly populated.
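A minimal sketch of that switch, assuming the same snscrape 0.7.x module layout as the snippet earlier in this thread. `classify` and `collect_profile` are hypothetical helper names, not part of snscrape; the only library names relied on are `TwitterProfileScraper`, `get_items()`, and the `retweetedTweet` and `quotedTweet` attributes discussed here.

```python
from itertools import islice


def classify(item):
    """Label a scraped item as 'retweet', 'quote', or 'original'.

    getattr with a default is used because not every yielded item is
    assumed to carry these attributes (a defensive assumption).
    """
    if getattr(item, 'retweetedTweet', None) is not None:
        return 'retweet'
    if getattr(item, 'quotedTweet', None) is not None:
        return 'quote'
    return 'original'


def collect_profile(username, limit=50):
    """Scrape up to `limit` items from a profile and pair each with its label."""
    # Imported lazily so classify() stays usable without snscrape installed.
    from snscrape.modules.twitter import TwitterProfileScraper

    scraper = TwitterProfileScraper(username)
    return [(classify(item), item) for item in islice(scraper.get_items(), limit)]
```

Calling `collect_profile('UofT')` would then be expected to yield some items labelled 'retweet', whereas the same loop over `TwitterUserScraper` should only produce 'original' and 'quote' labels, per the explanation above.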

jingwenshi-dev commented 1 year ago

As you said, "it never returns retweets". But on my end it still returns retweets and puts them in the quotedTweet column.

JustAnotherArchivist commented 1 year ago

Any examples of tweets returned like that? The first result with a non-empty retweetedTweet or quotedTweet I get is https://twitter.com/UofT/status/1648688150867243009, which is indeed a quote tweet.
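For anyone wanting to repeat that check, here is a rough sketch of finding the first result that references another tweet. `first_with_reference` is a hypothetical helper, not an snscrape function; it only assumes that items expose `retweetedTweet` and `quotedTweet` as described in this thread.

```python
def first_with_reference(items):
    """Return the first item whose retweetedTweet or quotedTweet is set, else None."""
    for item in items:
        has_retweet = getattr(item, 'retweetedTweet', None) is not None
        has_quote = getattr(item, 'quotedTweet', None) is not None
        if has_retweet or has_quote:
            return item
    return None
```

Passing it `TwitterUserScraper('UofT').get_items()` and inspecting the returned item's `retweetedTweet` and `quotedTweet` fields should show which of the two is actually populated.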

jingwenshi-dev commented 1 year ago

Oh nvm, I was looking at the wrong dataset, sorry about that.