DocNow / twarc-csv

A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.
MIT License

Missing retweet (quote) text #10

Closed: timothyjgraham closed this 3 years ago

timothyjgraham commented 3 years ago

Hi there,

Is there an easy way to get the quoted tweet text field in the CSV output? The original tweet text is available but not the quoted text, which is very often where the hashtags of interest (used in the original search query) are located.

Thanks!

igorbrigadir commented 3 years ago

This should already be in there - quoted tweets are included as rows just before the quote tweet.

They will be missing if the original data didn't include them, for example if it's a quote of a quote. This depends on your data. Do you have an example of a missing one?
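To illustrate the row ordering described above, here is a minimal sketch. It assumes a single Twitter API v2 response page, shaped as a dict with a `data` list of tweets and an `includes.tweets` list of referenced tweets, where each tweet may carry `referenced_tweets` entries of the form `{"type": ..., "id": ...}`. The helper name `rows_with_referenced_first` is hypothetical, and this is not twarc-csv's actual implementation.

```python
def rows_with_referenced_first(page):
    """Hypothetical helper: flatten one API v2 response page into a list of
    tweet dicts, placing each referenced tweet found in includes.tweets
    immediately before the tweet that references it."""
    # Index the expanded (included) tweets by ID for quick lookup.
    included = {t["id"]: t for t in page.get("includes", {}).get("tweets", [])}
    rows = []
    for tweet in page.get("data", []):
        for ref in tweet.get("referenced_tweets", []):
            ref_tweet = included.get(ref["id"])
            if ref_tweet is not None:
                rows.append(ref_tweet)
            # If the referenced tweet is absent from includes (for example
            # deleted, protected, or more than one level deep), no row is
            # emitted for it, so its text never reaches the output.
        rows.append(tweet)
    return rows
```

Note that this sketch does not deduplicate: if two tweets quote the same tweet, the quoted tweet appears twice.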

edsu commented 3 years ago

I don't know if this is relevant, but I've definitely noticed error messages when quoted tweets can't be expanded because they were deleted or protected.

https://gist.github.com/edsu/b0efc3b4b4281aee794a4a6869065584

@timothyjgraham, an example would be great if you can isolate one.

igorbrigadir commented 3 years ago

I think I got to the bottom of it:

If it's a Retweet of a Quote tweet, like 1388227620681175041 for example, the original quoted tweet (1388202672961036293) will not show up in includes. Only the retweeted tweet (1334683357444251649) will, with its referenced_tweets pointing to the original quoted tweet.

referenced_tweets only goes down to a "depth" of 1. So to get the original quoted tweet from a retweet of a quote tweet, you would have to make another request, process that, and then make yet another request for any further chains like this. That could mean 100s of extra tweets, since quote tweet chains can fan out.
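The repeated-request approach described above can be sketched as follows, assuming the same API v2 page shape (`data`, `includes.tweets`, `referenced_tweets`). The `lookup` argument is a hypothetical callable standing in for whatever tweet lookup you use: it takes a list of IDs and returns the tweet dicts it could fetch. The helper names and the `max_rounds` cap are also assumptions for illustration.

```python
def index_page(page):
    """Map tweet ID -> tweet dict for both data and includes.tweets."""
    tweets = {t["id"]: t for t in page.get("data", [])}
    for t in page.get("includes", {}).get("tweets", []):
        tweets.setdefault(t["id"], t)
    return tweets


def missing_ids(tweets):
    """IDs referenced by any known tweet but not present in `tweets`."""
    return {
        ref["id"]
        for t in tweets.values()
        for ref in t.get("referenced_tweets", [])
        if ref["id"] not in tweets
    }


def expand_references(page, lookup, max_rounds=5):
    """Follow referenced_tweets chains one level per round.

    `lookup` is hypothetical: list of IDs -> list of tweet dicts.
    Stops when nothing is missing, when a round fetches nothing
    (e.g. deleted or protected tweets), or after `max_rounds`.
    """
    tweets = index_page(page)
    for _ in range(max_rounds):
        missing = missing_ids(tweets)
        if not missing:
            break
        fetched = lookup(sorted(missing))
        if not fetched:
            break
        for t in fetched:
            tweets[t["id"]] = t
    return tweets
```

Each round costs one more lookup, which is why long quote chains get expensive. The `max_rounds` cap bounds that cost at the price of leaving very deep chains unresolved.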

Since I changed a bunch of stuff, I highly recommend updating to the latest version and trying the conversion again:

pip install --upgrade twarc twarc-csv

igorbrigadir commented 3 years ago

Please reopen if it's still an issue with the new v0.2.0 release of twarc-csv.

timothyjgraham commented 3 years ago

@igorbrigadir Many thanks for looking into this and apologies for the late reply. I will test it and let you know! Thanks again!

igorbrigadir commented 3 years ago

Thanks! There was another update recently, so definitely run pip install --upgrade twarc twarc-csv to update, then extract the CSV again with twarc2 csv ..., since there were changes to how things are handled there too.