banjtheman / dc_ndoch_2021

Code for DC 2021 National Day of Civic Hacking

dcfireems data modeling #5

Open banjtheman opened 2 years ago

banjtheman commented 2 years ago

What is the Task

Create a model to predict the behavior of dcfireems

Why do we want to do this

We want to be able to forecast how many incidents dcfireems will respond to.

How can I get started?

We will need data from #1, but we can begin setting up the framework to create a model on dummy data. I would search for ways to do time series forecasting; here's a blog post to get started: https://towardsdatascience.com/time-series-modeling-using-scikit-pandas-and-numpy-682e3b8db8d1
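
As a starting point for the framework, here is a minimal sketch of a seasonal-naive baseline run on dummy data. It is not the approach from the linked blog post; it only shows the shape of a train/forecast/evaluate loop that a real model could later be dropped into. The function names, the weekly pattern, and the count levels are all made up for illustration.

```python
import random


def make_dummy_daily_counts(n_days, seed=0):
    """Generate made-up daily incident counts with a weekly pattern."""
    rng = random.Random(seed)
    # Hypothetical bump per position in the week; not based on real data.
    weekly_bump = [0, 0, 0, 0, 10, 25, 15]
    return [400 + weekly_bump[d % 7] + rng.randint(-20, 20) for d in range(n_days)]


def seasonal_naive_forecast(history, horizon, season_length=7):
    """Forecast each future day as the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hold out the last two weeks of dummy data to score the baseline.
counts = make_dummy_daily_counts(70)
train, test = counts[:-14], counts[-14:]
forecast = seasonal_naive_forecast(train, horizon=len(test))
mae = mean_absolute_error(test, forecast)
print(f"Seasonal-naive MAE on dummy data: {mae:.1f}")
```

Any later model (for example, one built with scikit-learn) would need to beat this baseline's error on the same hold-out window to be worth keeping.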

Definition of Done

Data model created for dcfireems

jkwening commented 2 years ago

After doing some additional digging, I think I have a plan to set up the pipeline for model building, using the information I found for YTD August 2015 as reasonable dummy/sample data.

It provides data similar to what we expect to get from Twitter scraping, but at the monthly aggregate level. I'm going to use it to simulate daily (or weekly, since Jacob noted that is the frequency he's seeing so far in parsing the date for #1) data and then feed that into a model-building pipeline.
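
One way the monthly-to-daily simulation could look is sketched below. It assigns each call in a monthly total to a random day of that month, so the simulated daily counts always add back up to the monthly figure. The function name, the optional weekday weights, and the monthly totals are placeholders for illustration; they are not the actual 2015 numbers.

```python
import calendar
import random
from collections import Counter
from datetime import date


def simulate_daily_counts(year, month, monthly_total, weekday_weights=None, seed=None):
    """Spread a monthly total across the days of that month.

    Each call is assigned to one day at random (optionally weighted by
    weekday, Monday=0), so the daily counts sum to monthly_total.
    """
    rng = random.Random(seed)
    n_days = calendar.monthrange(year, month)[1]
    days = [date(year, month, d) for d in range(1, n_days + 1)]
    if weekday_weights is None:
        weekday_weights = [1.0] * 7  # assumption: every weekday equally likely
    weights = [weekday_weights[d.weekday()] for d in days]
    assigned = Counter(rng.choices(days, weights=weights, k=monthly_total))
    return {d: assigned.get(d, 0) for d in days}


# Placeholder monthly totals, NOT the real YTD August 2015 figures.
monthly_totals = {1: 15000, 2: 14000}

daily = {}
for month, total in monthly_totals.items():
    daily.update(simulate_daily_counts(2015, month, total, seed=month))

# Roll the simulated days up to ISO (year, week) buckets for a weekly series.
weekly = Counter()
for day, n in daily.items():
    weekly[tuple(day.isocalendar())[:2]] += n
```

Because the split is random, the simulated series carries no real day-to-day signal; it is only useful for exercising the pipeline until the data from #1 arrives.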

The initial pipeline framework and model building will need fine-tuning once we receive real data, but this allows us to push ahead with something more realistic than purely random, undefined dummy data. @banjtheman - thoughts? Also, how do you want us to push code to this repo?

banjtheman commented 2 years ago

Yeah, I like the plan. It gives us something to work with, and we have the 2015 data to use as a baseline. You have collab access now, so you should be able to push.

jkwening commented 2 years ago

I've pushed what I have so far into the 5_dcfireems branch, especially since the main branch folder structure is different. I'll merge main into my local branch at some point and remap to the folder structure.

More importantly, I started doing some analysis of the 2015 data and realized I'm not entirely sure what we're trying to predict. I'm not sure we have sufficient data to predict anything meaningful... The data gives us dispatched calls by type and nothing else, so I think we're going to need to curate some additional features to make this predictive model useful.

I'm going to add a season feature and will see if I can get temperature data to go along with this. This would help if we wanted to predict dispatched fire calls. Do we have access to total fire/EMS calls? If not, we may need population to infer this indirectly, since the more folks there are, the more emergency services would be needed.
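
A rough sketch of what those extra features might look like per day is below. The season boundaries assume Northern Hemisphere meteorological seasons, and the `daily_high_f` and `population` fields are hypothetical slots for temperature and population data that would still have to be sourced.

```python
from datetime import date


def season_of(d):
    """Map a date to a meteorological season (assumption: Northern Hemisphere)."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "fall"


def build_features(d, daily_high_f=None, population=None):
    """Build one feature row for a given day.

    daily_high_f and population are hypothetical inputs; they stay None
    until a real temperature or population source is joined in.
    """
    return {
        "month": d.month,
        "day_of_week": d.weekday(),  # Monday=0 ... Sunday=6
        "season": season_of(d),
        "daily_high_f": daily_high_f,
        "population": population,
    }


row = build_features(date(2015, 8, 15))
```

Calendar features like these can be derived from the date alone, so they work with the simulated data now and with the scraped data later.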

leplerjacob commented 2 years ago

Btw @jkwening, I thought at the time that dcfireems only posted weekly updates, but it looks to be daily. [image]

leplerjacob commented 2 years ago

I didn't realize they had so many calls a day!