chihacknight / chn-ghost-buses

"Ghost buses" analysis project through Chi Hack Night
https://github.com/chihacknight/breakout-groups/issues/217
MIT License

Prepare for automated JSON updates #76

Closed: haileyplusplus closed this 5 months ago

haileyplusplus commented 5 months ago

Description

This refactors some functions into classes and replaces a dict with instances of a new ScheduleFeedInfo class.
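As a rough illustration of what replacing a dict with a `ScheduleFeedInfo` class can look like, here is a minimal sketch; the field names (`schedule_version`, `feed_start_date`, `feed_end_date`) and the `from_dict` helper are assumptions for illustration, not the PR's actual attributes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleFeedInfo:
    """Typed replacement for an ad-hoc dict describing one schedule feed.

    Field names here are illustrative; the real class in the PR may differ.
    """
    schedule_version: str
    feed_start_date: str
    feed_end_date: str

    @classmethod
    def from_dict(cls, d: dict) -> "ScheduleFeedInfo":
        # Accept the legacy dict shape during the transition.
        return cls(
            schedule_version=d["schedule_version"],
            feed_start_date=d["feed_start_date"],
            feed_end_date=d["feed_end_date"],
        )


# A dict of the hypothetical legacy shape, converted to the typed class:
legacy = {
    "schedule_version": "20230105",
    "feed_start_date": "2023-01-06",
    "feed_end_date": "2023-02-01",
}
info = ScheduleFeedInfo.from_dict(legacy)
```

A frozen dataclass like this also gives equality and hashability for free, which is handy when the instances are used as cache keys.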

Also updates some requirements files to get things working on Apple silicon and update a vulnerable version of requests.

How has this been tested?

Manually

haileyplusplus commented 5 months ago

> Thanks a lot for the PR! The memoization will save a lot of work, and the refactors are nice.
>
> If compare_schedule_and_rt.py will eventually be run using GitHub Actions, would the pickle files have to be saved to S3 or some other place, and the script would look for them there?

For the optimization to work, yes, the pickle files will need to be saved in an accessible location. If the script always runs in the same virtual container with semi-persistent storage that would be good enough, but I'm not super familiar with GitHub Actions.

They also don't have to be pickle files; I think pretty much everything is a dataframe, which looks like it has its own JSON serialization/deserialization, so I can update the code to just do that.
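For reference, the dataframe JSON round trip mentioned here is a one-liner in each direction with pandas. This is a generic sketch with made-up columns, not code from the repo:

```python
import io

import pandas as pd

# Any intermediate DataFrame can be cached as JSON instead of pickle.
df = pd.DataFrame({"route_id": ["X9", "J14"], "trip_count": [120, 95]})

json_str = df.to_json(orient="records")                    # serialize
restored = pd.read_json(io.StringIO(json_str), orient="records")  # deserialize
```

One caveat worth knowing: `read_json` infers dtypes, so purely numeric-looking string columns (e.g. a route ID like "22") can come back as integers unless a `dtype` is passed explicitly.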

haileyplusplus commented 5 months ago

Thanks for this context on the intended workflows. If we're re-running the history in workflow 2 for some reason then yeah, we probably want to redo all of the computations so the memoization here is not going to help for that.

For the purposes of this PR I can split out the memoization and keep the refactoring. One of my main motivations for the memoization in the first place was to make local development easier, since with the current state of update_data.py I believe you have to recalculate everything from all time on every run. It does look like progress is being saved in data_output/scratch, just not loaded at present, so I could also look at using that, if present, as an intermediate step.

I should be able to make it for at least some of tonight so we can discuss more on Zoom and/or Slack.

dcjohnson24 commented 5 months ago

> Thanks for this context on the intended workflows. If we're re-running the history in workflow 2 for some reason then yeah, we probably want to redo all of the computations so the memoization here is not going to help for that.
>
> For purposes of this PR I can split out the memoization and keep the refactoring. One of my main motivations for the memoization in the first place was to make local development easier since with the current state of update_data.py I believe you have to recalculate everything from all time on every run. It does look like progress is being saved in data_output/scratch, just not loaded at present. So I could also look at using that if present as an intermediate step.
>
> I should be able to make it for at least some of tonight so we can discuss more on Zoom and/or Slack.

I would agree with keeping the memoization for local development because memory has been a problem at times when manually updating the json files for the frontend.

haileyplusplus commented 5 months ago

Updated PR based on discussion at tonight's meeting.

haileyplusplus commented 5 months ago

The requirements changes are in PR #77. More complete refactoring is in https://github.com/haileyplusplus/chn-ghost-buses/tree/refactor2, so the rest of the changes here are no longer needed.