Macemann / Georgetown-Capstone

Georgetown Data Analytics Capstone Project
MIT License

Assistance regarding dataset needed #1

Open wei-ann-Github opened 8 years ago

wei-ann-Github commented 8 years ago

Hi Macemann,

I am a Business Analytics Masters student from Singapore.

I have a social analytics group project, and my team is interested in using your dataset from the ISIS project. I would like to ask for your permission to use this dataset. I have read the report on this capstone project and realized that one dataset, the retweet collection, is missing from this repository. Without it, the social network and interest analysis will be incomplete.

I await your kind reply. Thank you :)

Sincerely, Wei Ann.

mtphilli commented 8 years ago

Hello Wei Ann, I think that dataset has been lost. However, it is derived from the others, and there are a couple of scripts in the BIN folder to recreate it. One reads the other collections, picks out tweets based on "RT", parses them, and writes the results to a new Mongo collection. The other outputs some stats on the retweets. I don't remember much more, but hopefully this helps. Good luck!
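
For readers trying to rebuild the retweet data, here is a minimal sketch of the kind of "RT"-based derivation described above. It is not the script from the BIN folder. The field names `_id`, `username`, and `text`, the regular expression, and the function name `extract_retweets` are all assumptions made for illustration, and the sketch works on plain dictionaries instead of a live Mongo connection.

```python
import re

# Assumed convention: a classic retweet starts with "RT @original_author:".
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*(.*)", re.DOTALL)


def extract_retweets(tweets):
    """Return one record per tweet whose text looks like a retweet.

    `tweets` is an iterable of dicts with hypothetical keys
    `_id`, `username`, and `text`.
    """
    retweets = []
    for tweet in tweets:
        match = RT_PATTERN.match(tweet.get("text", ""))
        if match:
            retweets.append({
                "tweet_id": tweet.get("_id"),
                "retweeter": tweet.get("username"),   # who posted the retweet
                "original_author": match.group(1),    # handle after "RT @"
                "original_text": match.group(2),
            })
    return retweets


# Made-up sample rows, only to show the shape of the input and output.
sample = [
    {"_id": 1, "username": "alice", "text": "RT @bob: hello world"},
    {"_id": 2, "username": "carol", "text": "just a normal tweet"},
]
retweets = extract_retweets(sample)
```

If the tweet records carry the posting user's name, the `retweeter` and `original_author` pair gives one edge of a retweet network; whether the real collections include such a field would need to be checked against the data.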

wei-ann-Github commented 8 years ago

mtphilli, thank you for your reply. I still have some difficulty, as I cannot tell which user in user_collection retweeted which tweets in tweets_collection. I also noticed a peculiarity in the tweets_collection CSV: each text occurs twice, yet every row has a unique _id. Do you know why there are two occurrences of each text?
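
One way to inspect the duplication is to group the rows by text and list the `_id` values that share each text. The sketch below assumes each row is a dict with `_id` and `text` keys, as the CSV columns mentioned above suggest; the function name `group_ids_by_text` and the sample rows are hypothetical. It only reveals which rows are paired, not why the pairs exist.

```python
from collections import defaultdict


def group_ids_by_text(rows):
    """Map each repeated text to the list of _id values that carry it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["text"]].append(row["_id"])
    # Keep only texts that occur more than once.
    return {text: ids for text, ids in groups.items() if len(ids) > 1}


# Made-up rows standing in for the tweets_collection CSV.
rows = [
    {"_id": "a1", "text": "first tweet"},
    {"_id": "a2", "text": "first tweet"},
    {"_id": "b1", "text": "second tweet"},
]
duplicates = group_ids_by_text(rows)
```

Comparing the other columns of two rows that share a text may show whether they are exact copies or differ in some field such as the posting user or timestamp.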