CUSP2017 / citibike-publicspace

Data analysis quantifying the value of the built environment to Citibike bike stop usage.
MIT License

Ridership Counts #19

Closed kristikorsberg closed 7 years ago

kristikorsberg commented 7 years ago

Hi - I started the process of quantifying ridership today.

Process:
- Grouped each month's data by 'start station id'
- Counted 'trip duration' as the metric for # of rides at each station
- Merged 12 datasets
- Averaged the 'trip duration' count
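A minimal sketch of the process above, assuming pandas and two tiny made-up months instead of the twelve real files. The column names follow the spelling used in this thread and may differ from the headers in the actual trip files. The month labels and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly trip files (one row per trip).
monthly_trips = {
    "month_1": pd.DataFrame({
        "start station id": [72, 72, 79, 82],
        "trip duration": [300, 450, 600, 200],
    }),
    "month_2": pd.DataFrame({
        "start station id": [72, 79, 79],
        "trip duration": [500, 350, 700],
    }),
}

# Reduce each month to one count per start station inside the loop,
# so only the small per-station series is kept in memory.
monthly_counts = {}
for month, trips in monthly_trips.items():
    monthly_counts[month] = (
        trips.groupby("start station id")["trip duration"].count()
    )

# Merge: one column of ride counts per month, aligned on station id.
counts = pd.concat(monthly_counts, axis=1)

# Average across months. mean() skips missing values, so a station that
# never appears as a start station in a month is averaged only over the
# months in which it does appear.
avg_rides = counts.mean(axis=1)
print(avg_rides)
```

In this toy data, station 82 has no starts in the second month, so its average is taken over one month rather than two.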

This method leaves a lot of room for improvement, but I thought it was a good place to start. Things to discuss at our next meeting:

1) This method assumes that every station available in a given month was used as a 'start station' at least once that month, which may not be the case. Has anyone found a historical record of stations available over time? We could check the 'monthly operating reports', which give station counts for each month: https://www.citibikenyc.com/system-data/operating-reports
2) I used a for loop (for shame) to iterate through downloading the different datasets. It's a little slow, but because I reduce the dataframes inside the loop, it's not bad.
3) I know we've gone over this before, but how should I be saving the data? Is it going into the 'src' folder? In my code, I just created a new folder (outside of our project, so as not to bog it down with many CSVs) to store them all, but that can/should be changed.
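To make point 1 concrete, here is a rough sketch of how a per-month list of in-service stations could be used if one turns up. The `available` mapping, the station ids, and the counts are all hypothetical. The operating reports are described above as giving station counts, so a per-station list like this is an assumption, not something known to exist.

```python
import pandas as pd

# Hypothetical ride counts per start station; NaN means the station
# never appeared as a start station that month.
counts = pd.DataFrame(
    {"month_1": [2, 1, 1, None], "month_2": [1, 2, None, 4]},
    index=pd.Index([72, 79, 82, 3002], name="start station id"),
)

# Hypothetical record of which stations were in service each month.
available = {
    "month_1": [72, 79, 82],
    "month_2": [72, 79, 82, 3002],
}

def fill_available(counts, available):
    """Set missing counts to 0 only where the station was in service."""
    filled = counts.copy()
    for month, stations in available.items():
        in_service = filled.index.isin(stations)
        filled.loc[in_service, month] = filled.loc[in_service, month].fillna(0)
    return filled

filled = fill_available(counts, available)

# Stations in service with no starts now count as 0 rides; stations not
# yet installed stay missing and are skipped by mean().
avg_rides = filled.mean(axis=1)
print(avg_rides)
```

With this adjustment, station 82 (in service but unused in the second month) averages over both months, while station 3002 (not yet available in the first month) averages over one.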

@pichot @kayzhou22 @dfay88 @aaron-15

pichot commented 7 years ago

Re Question 3: I stored any data I needed for my processing (like a converted shapefile) in data/interim. At the end of my notebook I saved the final csv to data/processed.
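As a sketch of that layout, assuming pandas and the data/interim and data/processed folders named above: the function name, the file names, and the `project_root` argument are hypothetical.

```python
from pathlib import Path

import pandas as pd

def save_outputs(project_root, interim, processed):
    """Write intermediate and final tables under the project's data folder."""
    root = Path(project_root)
    interim_dir = root / "data" / "interim"
    processed_dir = root / "data" / "processed"

    # Create the folders if they do not exist yet.
    interim_dir.mkdir(parents=True, exist_ok=True)
    processed_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical file names; use whatever the project agrees on.
    interim_path = interim_dir / "monthly_counts.csv"
    processed_path = processed_dir / "avg_rides_by_station.csv"

    interim.to_csv(interim_path)
    processed.to_csv(processed_path)
    return interim_path, processed_path
```

Calling `save_outputs(".", counts_df, avg_df)` from the repository root would place both files inside the project's data folder.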

Do you have code you could push up to a branch so I can see the loop?

Also, you're just using the trip duration column to make a count, right? You're not actually using the duration of the trip?

kristikorsberg commented 7 years ago

@pichot Thanks for the response on 3. I pushed my code to GitHub this afternoon, but I think I pushed it to master. Sorry about that. Will use my branch moving forward. And yes, it uses trip duration as a row count. It's not a sum of the trip durations.
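A small illustration of the count-versus-sum distinction, assuming pandas and made-up rows. One caveat is that `count()` on a column skips missing values, so it matches the number of trips only if 'trip duration' is never blank. `size()` counts rows regardless.

```python
import pandas as pd

# Hypothetical trips; the second row has a missing duration on purpose.
trips = pd.DataFrame({
    "start station id": [72, 72, 79],
    "trip duration": [300, None, 600],
})

grouped = trips.groupby("start station id")

rides_by_count = grouped["trip duration"].count()  # non-null durations only
rides_by_size = grouped.size()                     # every row, nulls included
total_seconds = grouped["trip duration"].sum()     # the sum, which is NOT used here

print(rides_by_count, rides_by_size, total_seconds, sep="\n\n")
```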

pichot commented 7 years ago

Cool. 👍