banjtheman / dc_ndoch_2021

Code for DC 2021 National Day of Civic Hacking

DC Fire EMS Twitter Stat Scraper #1

Open banjtheman opened 2 years ago

banjtheman commented 2 years ago

What is the Task

Create a scraper to collect data from DC Fire and EMS on the number of calls responded to each day.

Why do we want to do this

We want to create a dataset that shows how active DC Fire and EMS is, so we can use it for analysis.

How can I get started?

There are many tools to choose from; pick what works for you.

The premise will be to find all tweets from dcfireems that are similar to this

We will want to extract the following from each tweet:

  1. Date
  2. Total number of calls
  3. Critical calls
  4. Non-critical calls
  5. Fire calls
  6. Source

Then create a CSV with the data, using this header:

date,total calls,critical,non-critical,fire,source
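As a rough sketch of the extraction step, the fields above could be pulled out with regular expressions and written under that header. The tweet wording, the numbers in it, the `parse_tweet` and `rows_to_csv` names, and the `example-source` value are all made up for illustration; the real dcfireems tweets may be phrased differently, so the patterns would need adjusting.

```python
import csv
import io
import re

FIELDS = ["date", "total calls", "critical", "non-critical", "fire", "source"]

# Made-up tweet text for illustration only; the real wording may differ.
SAMPLE_TWEET = (
    "On 9/17/2021 DC Fire and EMS responded to 550 calls: "
    "120 critical, 400 non-critical, 30 fire."
)

DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")
# One pattern per numeric column, keyed by the CSV column name.
COUNT_RES = {
    "total calls": re.compile(r"(\d+)\s+calls", re.IGNORECASE),
    "critical": re.compile(r"(\d+)\s+critical", re.IGNORECASE),
    "non-critical": re.compile(r"(\d+)\s+non-critical", re.IGNORECASE),
    "fire": re.compile(r"(\d+)\s+fire", re.IGNORECASE),
}


def parse_tweet(text, source):
    """Return one CSV row as a dict, or None if any field is missing."""
    date_match = DATE_RE.search(text)
    if date_match is None:
        return None
    row = {"date": date_match.group(1), "source": source}
    for column, pattern in COUNT_RES.items():
        match = pattern.search(text)
        if match is None:
            return None
        row[column] = int(match.group(1))
    return row


def rows_to_csv(rows):
    """Serialize rows to CSV text with the header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = parse_tweet(SAMPLE_TWEET, "example-source")
csv_text = rows_to_csv([row])
```

Returning `None` for tweets that do not match makes it easy to skip unrelated tweets instead of writing partial rows.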

Definition of Done

A csv is created from the dcfireems tweets

leplerjacob commented 2 years ago

Okay... After some troubleshooting of my environment, I managed to get data from the dcfireems Twitter account in a Jupyter notebook.

banjtheman commented 2 years ago

Nice, go ahead and push what ya got to a branch when ready.

leplerjacob commented 2 years ago

Will do!

leplerjacob commented 2 years ago

Should I push just the ipynb file? I am importing an open-source tool called Twint. I also have the CSV file it produced, but I'll push that later since I still need to trim it down to only the data we need.

leplerjacob commented 2 years ago

I may need help continuing this one. If anyone has time this week, I can go over the issues I'm having and maybe we can resolve them together. Also, I am not too familiar with Pandas, which I think would be incredibly useful in this case.

banjtheman commented 2 years ago

Sure, you can post issues in the comments, and I can take a look.

leplerjacob commented 2 years ago

I got the data from the tweet. The only thing left is getting pandas to add a new row for each tweet. Right now it only keeps the last set of values; basically, it overwrites the previous rows.
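One common way to avoid this kind of overwrite is to collect every parsed row in a plain list and build the DataFrame once at the end, rather than assigning into the same row on each pass. The sketch below assumes each tweet has already been turned into a dict; `build_frame` and the example values are hypothetical and not taken from the notebook.

```python
import pandas as pd

COLUMNS = ["date", "total calls", "critical", "non-critical", "fire", "source"]


def build_frame(parsed_rows):
    """Collect every row first, then build the DataFrame in one step."""
    records = []
    for row in parsed_rows:
        # append grows the list, so earlier rows are never replaced
        records.append(row)
    return pd.DataFrame(records, columns=COLUMNS)


# Placeholder rows standing in for parsed tweets (values are made up).
example_rows = [
    {"date": "9/17/2021", "total calls": 550, "critical": 120,
     "non-critical": 400, "fire": 30, "source": "example-source"},
    {"date": "9/18/2021", "total calls": 600, "critical": 150,
     "non-critical": 410, "fire": 40, "source": "example-source"},
]

df = build_frame(example_rows)
```

Building the frame once is also usually faster than growing a DataFrame row by row inside a loop.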

banjtheman commented 2 years ago

You will want to use the concat method. Example:

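import pandas as pd

# Rows saved by earlier runs
df1 = pd.read_csv("old_data.csv")
# Placeholder: whatever function returns the newly scraped rows as a DataFrame
df2 = twitter_just_scrapped()

# Stack the old and new rows, renumbering the index so it stays unique
all_dfs = [df1, df2]
new_df = pd.concat(all_dfs, ignore_index=True)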

More reading:

Also make sure the saved CSV has the header columns
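Putting the two points together, a minimal sketch of "concat with the old file, then save with a header" might look like the following. The `append_and_save` name, the file name, and the sample values are assumptions for illustration. `DataFrame.to_csv` writes the header row by default, and `index=False` keeps the pandas row index out of the file.

```python
import os
import tempfile

import pandas as pd


def append_and_save(new_df, csv_path):
    """Merge new rows with any existing CSV, then rewrite it with a header."""
    if os.path.exists(csv_path):
        old_df = pd.read_csv(csv_path)
        combined = pd.concat([old_df, new_df], ignore_index=True)
    else:
        combined = new_df
    # header=True is the default; index=False avoids an extra unnamed column
    combined.to_csv(csv_path, index=False)
    return combined


# Demo in a temporary directory with made-up values.
first = pd.DataFrame([{"date": "9/17/2021", "total calls": 550, "critical": 120,
                       "non-critical": 400, "fire": 30, "source": "example-source"}])
second = pd.DataFrame([{"date": "9/18/2021", "total calls": 600, "critical": 150,
                        "non-critical": 410, "fire": 40, "source": "example-source"}])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dcfireems.csv")
    append_and_save(first, path)
    combined = append_and_save(second, path)
    with open(path) as handle:
        header_line = handle.readline().strip()
    reloaded = pd.read_csv(path)
```

Rewriting the whole file each time keeps the header on the first line only, which is simpler than appending and tracking whether a header was already written.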

leplerjacob commented 2 years ago

I am going to try running this on my end with your recent PR, @jkwening. I may have some questions if I get stuck.