carpentries-incubator / twitter-with-twarc

Introduction to Harvesting Twitter Data with Twarc
https://carpentries-incubator.github.io/twitter-with-twarc/

ep 4 solutions to challenges #24

Closed jonjab closed 2 years ago

jonjab commented 2 years ago

Insert the answers and details on how to get the answers for the 2 challenges in episode 4:

https://ucsbcarpentry.github.io/twitter-with-twarc/04-twitter-api/index.html

Today's Python notebook version number is 6.

Jinxiang2000 commented 2 years ago

Challenge 1: Try counting other things on Twitter

!twarc2 counts --granularity "day" --text "(Poker OR poker OR #Poker OR #poker)" 
!twarc2 counts --granularity "day" --text "(Golf OR golf OR #Golf OR #golf)" 
!twarc2 counts --granularity "day" --text "(Basketball OR basketball OR #Basketball OR #basketball)" 
!twarc2 counts --granularity "day" --text "(Baseball OR baseball OR #Baseball OR #baseball)" 
!twarc2 counts --granularity "day" --text "(Football OR football OR #Football OR #football)" 
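The five commands above differ only in the search term, so the query strings can be generated rather than typed by hand. This is a minimal sketch: the helper name `build_count_query` is hypothetical, and it only builds and prints the command text, it does not call the Twitter API.

```python
def build_count_query(term):
    """Return the OR query used above for a single search term."""
    variants = [
        term.capitalize(),
        term.lower(),
        "#" + term.capitalize(),
        "#" + term.lower(),
    ]
    return "(" + " OR ".join(variants) + ")"


terms = ["poker", "golf", "basketball", "baseball", "football"]

# One twarc2 counts command per term, matching the pattern used above.
commands = [
    f'twarc2 counts --granularity "day" --text "{build_count_query(term)}"'
    for term in terms
]

for command in commands:
    print(command)
```

In a notebook, each printed line can be pasted into a cell with a leading `!`.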


Jinxiang2000 commented 2 years ago

Challenge 2: Cats of Instagram

Question 1 Did you get exactly 5000?

!twarc2 search --limit 5000 "#catsofinstagram" raw-data/catofinstagram.jsonl
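To check whether the harvest returned exactly 5000 tweets, one option is to count them in the JSONL file. The sketch below assumes each line of the file is one API response page with its tweets in a `"data"` list. The helper name `count_tweets` is hypothetical, and the sample file is made up so the example runs without the real harvest.

```python
import json
import os
import tempfile


def count_tweets(jsonl_path):
    """Count tweets in a file where each line is one JSON response page."""
    total = 0
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            page = json.loads(line)
            # Assumption: the tweets on each page sit in a "data" list.
            total += len(page.get("data", []))
    return total


# Tiny made-up file standing in for raw-data/catofinstagram.jsonl.
sample_pages = [
    {"data": [{"id": "1"}, {"id": "2"}]},
    {"data": [{"id": "3"}]},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for page in sample_pages:
        tmp.write(json.dumps(page) + "\n")
    sample_path = tmp.name

sample_count = count_tweets(sample_path)
os.remove(sample_path)
print(sample_count)
```

With the real file, `count_tweets("raw-data/catofinstagram.jsonl")` would give the number to compare against 5000.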

Jinxiang2000 commented 2 years ago

Question 2 How far back in time did you get?

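import pandas

# Convert the harvested JSONL file to CSV (needs the twarc-csv plugin)
!twarc2 csv raw-data/catofinstagram.jsonl > output-data/catofinstagram.csv
cat_df = pandas.read_csv("output-data/catofinstagram.csv")
print(list(cat_df.columns))  # list the column names of cat_df
# Search results usually arrive newest first, so head() shows the most recent tweets
print(cat_df['created_at'].head())
# ... and tail() shows the oldest tweets, i.e. how far back the harvest reaches
print(cat_df['created_at'].tail())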
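Reading `head()` and `tail()` depends on the row order of the file. A more order-independent way to answer "how far back" is to parse the timestamps and take the minimum and maximum. This is a minimal sketch on made-up timestamps. With the real data, `sample_df` would be `cat_df`.

```python
import pandas as pd

# Made-up timestamps standing in for cat_df['created_at'].
sample_df = pd.DataFrame(
    {
        "created_at": [
            "2022-05-16T02:58:30.000Z",
            "2022-05-15T21:10:05.000Z",
            "2022-05-14T08:00:00.000Z",
        ]
    }
)

# Parse the ISO 8601 strings into timezone-aware timestamps.
created = pd.to_datetime(sample_df["created_at"], utc=True)

oldest = created.min()  # how far back the harvest reaches
newest = created.max()  # the most recent tweet
span = newest - oldest  # time covered by the harvest

print(oldest, newest, span)
```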
Jinxiang2000 commented 2 years ago

Question 3 What is the most re-tweeted recent tweet on #catsofinstagram?

cat_df['public_metrics.retweet_count'].max()  # highest retweet count: 29
# show the row(s) whose retweet count equals that maximum
cat_df[cat_df['public_metrics.retweet_count'] == cat_df['public_metrics.retweet_count'].max()].head()

The most retweeted recent tweet on #catsofinstagram was created at 2022-05-16T02:58:30; its conversation ID is 1526034327246475264.
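An alternative to filtering on the maximum is `idxmax()`, which returns the index label of the first row holding the largest value. The sketch below uses made-up rows, so the numbers are not from the real harvest. Note that `idxmax()` returns only the first row on a tie, whereas the boolean filter returns all tied rows.

```python
import pandas as pd

# Made-up rows standing in for cat_df.
sample_df = pd.DataFrame(
    {
        "id": [101, 102, 103],
        "created_at": [
            "2022-05-14T08:00:00.000Z",
            "2022-05-15T21:10:05.000Z",
            "2022-05-16T02:58:30.000Z",
        ],
        "public_metrics.retweet_count": [4, 12, 0],
    }
)

# Index label of the row with the highest retweet count.
top_index = sample_df["public_metrics.retweet_count"].idxmax()

# Pull that single row out as a Series.
top_tweet = sample_df.loc[top_index]

print(top_tweet[["id", "created_at", "public_metrics.retweet_count"]])
```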

Jinxiang2000 commented 2 years ago

Question 4 Which person has the most number of followers in your dataset?

cat_df['author.public_metrics.followers_count'].max()  # 14574 followers
# keep the row(s) belonging to the account with that follower count
most_follower = cat_df[cat_df['author.public_metrics.followers_count'] == cat_df['author.public_metrics.followers_count'].max()].head()

The user with author_id 248757990 has the most followers: 14574.
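Because one account can appear in many rows, it can help to keep one row per author and then rank by follower count. This is a rough sketch on made-up data. The `author.username` column is an assumption about the CSV layout, while `author_id` and `author.public_metrics.followers_count` are the names used above.

```python
import pandas as pd

# Made-up rows standing in for cat_df; author 11 tweets twice.
sample_df = pd.DataFrame(
    {
        "author_id": [11, 22, 11, 33],
        "author.username": ["cat_a", "cat_b", "cat_a", "cat_c"],  # assumed column
        "author.public_metrics.followers_count": [500, 1200, 500, 80],
    }
)

authors = (
    sample_df
    .drop_duplicates(subset="author_id")  # one row per account
    .sort_values("author.public_metrics.followers_count", ascending=False)
    .reset_index(drop=True)
)

# The first rows are the accounts with the most followers.
print(authors.head(3))
```

Looking at the top few accounts, rather than only the maximum, also gives some context for judging whether the top account is a person.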

Jinxiang2000 commented 2 years ago

Question 5 - Is it really a person?

!twarc2 user id 248757990

ameliameyer commented 2 years ago

I added the answers to the challenges to the episode.