Fixes #1, but this will also alter hashtag counts, sometimes significantly, so existing results may not reproduce.
I'm wondering if the retweet merging code,
    # Process Retweets:
    if "referenced_tweets" in tweet:
        rts = [t for t in tweet["referenced_tweets"] if t["type"] == "retweeted"]
        retweeted_tweet = rts[-1] if rts else None
        # If it's a native retweet, replace the "RT @user Text" with the original text, metrics, and entities, but keep the Author.
        if retweeted_tweet:
            # A retweet inherits everything from retweeted tweet.
            tweet["text"] = retweeted_tweet.pop("text", tweet.pop("text", None))
            tweet["entities"] = retweeted_tweet.pop("entities", tweet.pop("entities", None))
            tweet["attachments"] = retweeted_tweet.pop("attachments", tweet.pop("attachments", None))
            tweet["context_annotations"] = retweeted_tweet.pop(
                "context_annotations", tweet.pop("context_annotations", None)
            )
            tweet["public_metrics"] = retweeted_tweet.pop("public_metrics", tweet.pop("public_metrics", None))
The way it works with `pop` is: if `entities` exists in `retweeted_tweet`, `tweet["entities"]` is replaced. If it doesn't exist in `retweeted_tweet`, `tweet.pop("entities", None)` will set `tweet["entities"]` to whatever `tweet["entities"]` was before, or, if it didn't exist, set it to `None`. I think this should cover any situation, whether the elements are there or not.
should be in `expansions.py` in twarc proper, since it will most likely be shared by other things too? We can refactor this later though.
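
To make the `pop` fallback described above easier to check, here is a minimal sketch that runs the same nested-`pop` pattern over the three cases (field in the retweeted tweet, field only in the retweet, field in neither). `inherit_from_retweet` and `INHERITED_FIELDS` are hypothetical names for illustration, not existing twarc functions, and the tweet dicts are made-up stand-ins for real API responses.

```python
# Hypothetical helper (not part of twarc): the same nested-pop pattern as the
# snippet in this comment, applied over a list of fields.
INHERITED_FIELDS = (
    "text",
    "entities",
    "attachments",
    "context_annotations",
    "public_metrics",
)


def inherit_from_retweet(tweet, retweeted_tweet):
    """Copy each inherited field from retweeted_tweet onto tweet, in place."""
    for field in INHERITED_FIELDS:
        # Python evaluates arguments eagerly, so the retweet's own value is
        # always popped first and only used as the fallback.
        fallback = tweet.pop(field, None)
        tweet[field] = retweeted_tweet.pop(field, fallback)
    return tweet


# Case 1: the field exists in the retweeted tweet, so it replaces the retweet's value.
rt = {"id": "2", "entities": {"hashtags": [{"tag": "trunc"}]}}
original = {"id": "1", "entities": {"hashtags": [{"tag": "full"}, {"tag": "tags"}]}}
inherit_from_retweet(rt, original)
case_replaced = rt["entities"]

# Case 2: the field is missing from the retweeted tweet, so the retweet keeps its own value.
rt2 = {"id": "2", "entities": {"hashtags": [{"tag": "own"}]}}
inherit_from_retweet(rt2, {"id": "1"})
case_kept = rt2["entities"]

# Case 3: the field is missing from both, so the key is created with value None.
rt3 = {"id": "2"}
inherit_from_retweet(rt3, {"id": "1"})
case_none = rt3["entities"]
```

Two side effects are worth keeping in mind if this moves into shared code: `pop` removes the fields from `retweeted_tweet`, so the referenced tweet is mutated, and a field missing on both sides ends up as an explicit `None` key, so downstream code probably wants something like `tweet.get("entities") or {}` rather than an `"entities" in tweet` check.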