DocNow / twarc-hashtags

Report on hashtags in tweet data.
MIT License

Change counting hashtags in retweets #2

Open igorbrigadir opened 2 years ago

igorbrigadir commented 2 years ago

Fixes #1, but this will also change hashtag counts, sometimes significantly, so existing results may not be reproducible.
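As a rough illustration of why counts can shift, here is a minimal sketch that counts hashtags from v2-style `entities["hashtags"]` lists (each entry has a `tag` field). The `count_hashtags` helper and the sample tweets are hypothetical. The sketch assumes that a retweet's own `entities` only cover its (possibly truncated) `RT @user: ...` text, while the retweeted tweet carries the full set.

```python
from collections import Counter


def count_hashtags(tweets):
    """Hypothetical helper: count hashtags across v2-style tweet dicts."""
    counts = Counter()
    for tweet in tweets:
        # Tolerate tweets with no "entities" key or an explicit None.
        entities = tweet.get("entities") or {}
        for hashtag in entities.get("hashtags", []):
            counts[hashtag["tag"]] += 1
    return counts


# Hypothetical sample data: the retweet's own text was cut off before #b.
original = {"id": "1", "text": "A long tweet ... #a #b",
            "entities": {"hashtags": [{"tag": "a"}, {"tag": "b"}]}}
retweet = {"id": "2", "text": "RT @user: A long tweet ... #a",
           "entities": {"hashtags": [{"tag": "a"}]}}
# After merging, the retweet carries the retweeted tweet's entities.
merged_retweet = dict(retweet, entities=original["entities"])

before = count_hashtags([original, retweet])
after = count_hashtags([original, merged_retweet])
print(before["b"], after["b"])  # 1 2
```

Under these assumptions, hashtags that only appear past the truncation point start being counted for every retweet, which is why earlier totals may not reproduce.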

I'm wondering whether the retweet-merging code,

    # Process Retweets:
    if "referenced_tweets" in tweet:
        rts = [t for t in tweet["referenced_tweets"] if t["type"] == "retweeted"]
        retweeted_tweet = rts[-1] if rts else None
        # If it's a native retweet, replace the "RT @user Text" with the original text, metrics, and entities, but keep the author.
        if retweeted_tweet:
            # The retweet inherits text, entities, attachments, context annotations and metrics from the retweeted tweet.
            tweet["text"] = retweeted_tweet.pop("text", tweet.pop("text", None))
            tweet["entities"] = retweeted_tweet.pop("entities", tweet.pop("entities", None))
            tweet["attachments"] = retweeted_tweet.pop("attachments", tweet.pop("attachments", None))
            tweet["context_annotations"] = retweeted_tweet.pop(
                "context_annotations", tweet.pop("context_annotations", None)
            )
            tweet["public_metrics"] = retweeted_tweet.pop("public_metrics", tweet.pop("public_metrics", None))

Here is how the chained `pop` works:

    tweet["entities"] = retweeted_tweet.pop("entities", tweet.pop("entities", None))

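If `entities` exists in `retweeted_tweet`, `tweet["entities"]` is replaced with it. If it doesn't, the fallback `tweet.pop("entities", None)` sets `tweet["entities"]` back to whatever it was before, or to `None` if it didn't exist. (Python evaluates the fallback argument before the outer `pop` runs, so the retweet's own value is always removed first; the assignment then stores either the retweeted value or that original value, so nothing is lost unintentionally.) I think this covers every situation, whether or not the elements are present.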

should be in `expansions.py` in twarc proper, since it will most likely be shared by other tools too? We can refactor this later, though.
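If this does move into twarc, the merge could be wrapped in a small function along these lines. This is a minimal sketch: `merge_retweet` is a hypothetical name, not an existing twarc API, and it assumes twarc's flattened v2 format, where each `referenced_tweets` entry is a full tweet dict with a `type` key. It also exercises the `pop` fallback behaviour described above.

```python
def merge_retweet(tweet):
    """Hypothetical helper: copy selected fields from the retweeted tweet.

    Assumes twarc's flattened v2 format. The retweet keeps its own author;
    only the fields listed in `inherited` are taken from the retweeted tweet.
    """
    inherited = ["text", "entities", "attachments", "context_annotations", "public_metrics"]
    rts = [t for t in tweet.get("referenced_tweets", []) if t["type"] == "retweeted"]
    retweeted_tweet = rts[-1] if rts else None
    if retweeted_tweet:
        for key in inherited:
            # The fallback tweet.pop(key, None) is evaluated first, so the
            # retweet's own value is removed, then replaced by the retweeted
            # tweet's value if present, or written back unchanged otherwise.
            tweet[key] = retweeted_tweet.pop(key, tweet.pop(key, None))
    return tweet


# Hypothetical sample: metrics exist only on the retweet itself.
retweet = {
    "text": "RT @user: truncated ...",
    "author_id": "42",
    "entities": {"hashtags": [{"tag": "a"}]},
    "public_metrics": {"retweet_count": 3},
    "referenced_tweets": [
        {"type": "retweeted", "text": "full text #a #b",
         "entities": {"hashtags": [{"tag": "a"}, {"tag": "b"}]}},
    ],
}
merged = merge_retweet(retweet)
print(merged["text"])            # full text #a #b
print(merged["public_metrics"])  # {'retweet_count': 3}
print(merged["attachments"])     # None
```

Two edge cases are worth keeping in mind. When neither dict has a key, the result is an explicit `None` rather than a missing key, so a check like `"attachments" in tweet` becomes `True`. Because the merge uses `pop`, the inherited fields are also stripped from the embedded retweeted tweet.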