banjtheman / dc_ndoch_2021

Code for DC 2021 National Day of Civic Hacking

DC Fire EMS Twitter Stat Scraper #1

Open banjtheman opened 2 years ago

banjtheman commented 2 years ago

What is the Task

Create a scraper to collect data from DC Fire and EMS on the number of calls responded to each day.

Why do we want to do this

We want to create a dataset that highlights how active DC Fire and EMS is, so we can use it for analysis.

How can I get started?

There are many tools to choose from, so pick whichever works for you: https://developer.twitter.com/en/docs/twitter-api/tools-and-libraries

The premise is to find all tweets from dcfireems that are similar to this one: https://twitter.com/dcfireems/status/1435263503011663879

We will want to extract the following from each tweet:

  1. Date
  2. Total number of calls
  3. Critical calls
  4. Non-critical calls
  5. Fire calls
  6. Source (the tweet's URL)

Then create a CSV with the data, for example:

date,total calls,critical,non-critical,fire,source
9/6/2021,597,184,296,117,https://twitter.com/dcfireems/status/1435263503011663879
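As a minimal sketch, assuming the six values have already been pulled out of a tweet (parsing the tweet text itself is left open here, since its exact wording isn't reproduced in this issue), Python's standard `csv` module can produce the format above. `make_row` and `write_csv` are hypothetical helper names; the date style (no zero padding) and the URL pattern are copied from the example row.

```python
import csv
import io
from datetime import date

# Column order taken from the example header in this issue
FIELDNAMES = ["date", "total calls", "critical", "non-critical", "fire", "source"]


def make_row(day, total, critical, non_critical, fire, status_id):
    """Build one CSV row from values already extracted from a tweet.

    make_row and its parameter names are hypothetical, for illustration only.
    """
    return {
        # M/D/YYYY without zero padding, matching the sample row (9/6/2021)
        "date": f"{day.month}/{day.day}/{day.year}",
        "total calls": total,
        "critical": critical,
        "non-critical": non_critical,
        "fire": fire,
        # The source column holds the tweet's own URL
        "source": f"https://twitter.com/dcfireems/status/{status_id}",
    }


def write_csv(rows, out):
    """Write the header row, then every data row, to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    rows = [make_row(date(2021, 9, 6), 597, 184, 296, 117, 1435263503011663879)]
    write_csv(rows, buf)
    print(buf.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""` (for example `open("calls.csv", "w", newline="")`, where the filename is just a placeholder) so the `csv` module controls the line endings.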

Definition of Done

A CSV is created from the dcfireems tweets.

leplerjacob commented 2 years ago

Okay... After some troubleshooting of my environment, I managed to get data from the dcfireems Twitter account in a Jupyter notebook.

banjtheman commented 2 years ago

Nice! Go ahead and push what you've got to a branch when you're ready.

leplerjacob commented 2 years ago

Will do!

leplerjacob commented 2 years ago

Should I push just the .ipynb file? I am importing an open-source tool called Twint. I also have the CSV file it produced, but I'll push that later since I still need to populate it with only the data we need.

leplerjacob commented 2 years ago

I may need help continuing this one. If anyone has time this week, I can go over the issues I'm having and maybe we can resolve them together. Also, I am not too familiar with pandas, which I think would be incredibly useful in this case.

banjtheman commented 2 years ago

Sure, you can post the issues in the comments and I can take a look.

leplerjacob commented 2 years ago

I got the data from the tweet. The only thing left is getting pandas to insert a new row for each set of values. Right now it only inserts the last dataset; basically, it overwrites the previous rows.

banjtheman commented 2 years ago

You'll want to use the concat method. Example:

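import pandas as pd

df1 = pd.read_csv("old_data.csv")  # rows saved on a previous run
df2 = twitter_just_scrapped()  # placeholder: returns the newly scraped rows as a DataFrame

all_dfs = [df1, df2]
new_df = pd.concat(all_dfs, ignore_index=True)  # stack the rows and renumber the index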

More reading: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
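A common cause of the overwriting described above is building a fresh DataFrame and assigning it to the same variable inside the per-tweet loop. A minimal sketch of the fix, assuming each tweet has already been turned into a dict keyed by the CSV columns: collect all rows in a list, build the DataFrame once, then use `pd.concat` to stack it onto previously saved data. `collect_rows`, `append_new_rows`, and the `parse_tweet` callback are hypothetical names, not part of Twint or pandas.

```python
import pandas as pd

# Column order from the CSV header agreed on in this issue
COLUMNS = ["date", "total calls", "critical", "non-critical", "fire", "source"]


def collect_rows(tweets, parse_tweet):
    """Parse every tweet into a dict first, then build the DataFrame once.

    parse_tweet is a hypothetical callback that turns one tweet into a
    dict keyed by COLUMNS.
    """
    records = []
    for tweet in tweets:
        # Appending to a list keeps every row; building a fresh DataFrame
        # and assigning it to the same variable here would keep only the last
        records.append(parse_tweet(tweet))
    return pd.DataFrame(records, columns=COLUMNS)


def append_new_rows(old_df, new_df):
    """Stack previously saved rows on top of newly scraped rows."""
    # ignore_index=True renumbers the result 0..n-1 instead of repeating
    # each input frame's own index labels
    return pd.concat([old_df, new_df], ignore_index=True)
```

Concatenating a list of frames in one call, rather than growing a DataFrame row by row, is also the approach the pandas merging guide linked above describes.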

Also, make sure the saved CSV has the header row.
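One way to keep a single header row while adding data over several runs, sketched under the assumption that all results go into one CSV file: pandas' `DataFrame.to_csv` accepts `mode="a"` to append and a boolean `header`, so the header can be written only when the file doesn't exist yet. `append_to_csv` is a hypothetical helper name.

```python
import os

import pandas as pd


def append_to_csv(df, path):
    """Append df's rows to the CSV at `path`, writing the header only once."""
    # Write the header only when the file doesn't exist yet, so repeated
    # runs don't insert extra header rows in the middle of the data
    write_header = not os.path.exists(path)
    # mode="a" appends instead of overwriting; index=False leaves pandas'
    # row numbers out so the columns match the agreed header
    df.to_csv(path, mode="a", header=write_header, index=False)
```

On the first run the file is created with the header; later runs only add rows, and `pd.read_csv` picks the header up again when the file is read back.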

leplerjacob commented 2 years ago

I am going to try to run this on my end with your recent PR, @jkwening. I may have some questions if I get stuck.