Open shaanchandra opened 4 years ago
I found this script helpful:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Load the CoAID claim tweet-ID lists and label them
fake = pd.read_csv("https://raw.githubusercontent.com/cuilimeng/CoAID/master/05-01-2020/ClaimFakeCOVID-19_tweets.csv")
real = pd.read_csv("https://raw.githubusercontent.com/cuilimeng/CoAID/master/05-01-2020/ClaimRealCOVID-19_tweets.csv")
fake["label"] = "fake"
real["label"] = "real"
df = pd.concat([fake, real])
df["text"] = "None"

# Hydrate each tweet ID by scraping the legacy mobile Twitter page;
# the hardcoded username in the URL is ignored, only the status ID matters.
for i, row in df.iterrows():
    tweet_id = row.tweet_id
    url = "https://mobile.twitter.com/Richx183/status/" + str(tweet_id)
    body = requests.get(url)
    body = BeautifulSoup(body.content, "html.parser")
    for el in body.find_all("div", attrs={"data-id": str(tweet_id)}):
        text = ""
        for x in el.div.contents:
            x = str(x)
            if "class=" not in x:
                text += x
        df.at[i, "text"] = text.strip()

df = df.drop(df[df.text == "None"].index)  # drop unsuccessful queries
df.head()
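In case it is useful, the hydrated frame can then be written out so the scraping only has to run once (the file name below is just an example):

# Example output path; pick whatever name fits your setup.
df.to_csv("coaid_claim_tweets_with_text.csv", index=False)
print(df["label"].value_counts())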
Hi, thank you for this important and timely dataset.
I would like to request the scripts you used to download/crawl the relevant user information from Twitter (such as the tweets/retweets, follower/following lists, etc.).
I understand that you cannot distribute that data directly, but I imagine you could share the scripts so that we can crawl it ourselves.
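Concretely, something along these lines is what I have in mind. This is only a rough sketch assuming Tweepy with one's own API credentials, not your actual pipeline, and the placeholder keys and column names are my own:

import tweepy
import pandas as pd

# Hypothetical credentials; replace with your own Twitter developer keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Tweet IDs from one of the CoAID files (same URLs as in the script above).
ids = pd.read_csv("https://raw.githubusercontent.com/cuilimeng/CoAID/master/05-01-2020/ClaimFakeCOVID-19_tweets.csv")["tweet_id"].tolist()

records = []
for start in range(0, len(ids), 100):
    # statuses_lookup hydrates up to 100 tweet IDs per request
    for status in api.statuses_lookup(ids[start:start + 100], tweet_mode="extended"):
        records.append({
            "tweet_id": status.id,
            "text": status.full_text,
            "retweet_count": status.retweet_count,
            "favorite_count": status.favorite_count,
            "user_id": status.user.id,
            "followers": status.user.followers_count,
            "following": status.user.friends_count,
        })

user_info = pd.DataFrame(records)

Something like this would already cover the tweet text and basic user statistics; scripts for the full retweet/follower crawling you used for the paper would be even more helpful.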