Open banjtheman opened 2 years ago
After doing some additional digging, I think I have a plan to set up the pipeline for model building, using the information I found for YTD August 2015 as reasonable dummy/sample data.
It provides data similar to what we expect to get from Twitter scraping, but at the monthly aggregate level. I'm going to use it to simulate daily (or weekly, since Jacob noted that is the frequency he's seeing so far in parsing the dates for #1) data and then feed that into a model building pipeline.
The initial pipeline framework and model building will need fine-tuning once we receive real data, but this allows us to push ahead with something more realistic than purely random, undefined dummy data. @banjtheman - thoughts? Also, how do you want us to push code to this repo?
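A rough sketch of how a monthly aggregate could be spread into simulated daily counts. The function name, the uniform-over-days assumption, and the monthly total of 1500 are all placeholders for illustration, not figures or code from the 2015 data or this repo.

```python
import calendar
import random
from datetime import date


def simulate_daily_counts(year, month, monthly_total, seed=0):
    """Split a monthly total into per-day counts that sum back to the total.

    Assumption: every day of the month is equally likely to receive a call.
    Real data will likely show weekday and seasonal effects instead.
    """
    rng = random.Random(seed)
    n_days = calendar.monthrange(year, month)[1]
    counts = [0] * n_days
    # Assign each call to a uniformly random day (a multinomial draw).
    for _ in range(monthly_total):
        counts[rng.randrange(n_days)] += 1
    return {date(year, month, day + 1): c for day, c in enumerate(counts)}


# Hypothetical monthly total, used only to exercise the function.
daily = simulate_daily_counts(2015, 8, 1500)
```

Because each simulated call is assigned to exactly one day, the daily counts always add back up to the monthly figure, so the simulated series stays consistent with the aggregate it came from.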
Yeah, I like the plan. It gives us something to work with, and we have the 2015 data to use as a baseline. You have collab access now, so you should be able to push.
I've pushed what I have so far into the 5_dcfireems branch, especially since the main branch folder structure is different. I'll merge main into my local branch at some point and remap to the folder structure.
More importantly, I started doing some analysis of the 2015 data and realized I'm not entirely sure what we're trying to predict. I'm not sure we have sufficient data to predict anything meaningful... The data gives us dispatched calls by type and nothing else, so I think we're going to need to curate some additional features to make this predictive model useful.
I'm going to add a season feature and will see if I can get temperature data to go along with it. This would help if we wanted to predict dispatched fire calls. Do we have access to total fire/EMS calls? If not, we may need population data to infer this indirectly, since the more people there are, the more emergency services would be needed.
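For the season feature, something along these lines might be enough as a starting point. It assumes meteorological seasons for the Northern Hemisphere (whole calendar months); `season_of` is a made-up helper name, not something already in the branch.

```python
from datetime import date


def season_of(d):
    """Map a date to a meteorological season (Northern Hemisphere assumed)."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "fall"


# Example: tag a few dates with their season.
dates = [date(2015, 1, 15), date(2015, 4, 1), date(2015, 8, 31), date(2015, 10, 5)]
seasons = [season_of(d) for d in dates]
```

If temperature data turns up later, it could sit alongside this as a second column keyed on the same dates.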
Btw @jkwening, I thought at the time dcfireems only posted weekly updates, but it looks to be daily.
I didn't realize they had so many calls a day!
What is the Task
Create a model to predict the behavior of dcfireems
Why do we want to do this
We want to be able to forecast how many incidents dcfireems will respond to.
How can I get started?
We will need data from #1, but we can begin to set up the framework to create a model on dummy data. I would search for ways to do time series forecasting; here's a blog to get started - https://towardsdatascience.com/time-series-modeling-using-scikit-pandas-and-numpy-682e3b8db8d1
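As one possible shape for that framework, here is a minimal baseline sketch: a moving-average forecast scored with a walk-forward evaluation on dummy data. The function names, the 7-day window, and the synthetic weekly-pattern series are assumptions for illustration only; a real model would replace the forecast function once data from #1 is available.

```python
def moving_average_forecast(history, window=7):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def walk_forward_mae(series, window=7, n_test=14):
    """Mean absolute error of one-step-ahead forecasts over the last n_test points.

    Each forecast only sees observations that come before the point it predicts.
    """
    errors = []
    for i in range(len(series) - n_test, len(series)):
        prediction = moving_average_forecast(series[:i], window)
        errors.append(abs(series[i] - prediction))
    return sum(errors) / len(errors)


# Dummy daily incident counts with a made-up weekly pattern.
dummy_series = [100 + (i % 7) * 5 for i in range(60)]
baseline_mae = walk_forward_mae(dummy_series)
```

Any later model can be scored with the same `walk_forward_mae` loop, which gives a simple number to beat.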
Definition of Done
Data model created for dcfireems