Open haileyplusplus opened 5 months ago
Some documentation polish is still needed.
Thank you so much for taking this on, Hailey! Excited to talk about it in more detail tomorrow -- I skimmed today but definitely think an overview would be helpful since there's so much here.
Just wanted to respond to one note in the description for context:
In addition, this demonstrates but does not fix an existing bug in the processing logic that causes some days on schedule version boundaries to be dropped from output. See the new utils/show_missing_days.py.
This was actually intended behavior because we decided we weren't sure which schedule should be used on boundary dates. We can just pick one if we want, but this was intentional.
Thank you so much for taking this on, Hailey! Excited to talk about it in more detail tomorrow -- I skimmed today but definitely think an overview would be helpful since there's so much here.
Just wanted to respond to one note in the description for context:
In addition, this demonstrates but does not fix an existing bug in the processing logic that causes some days on schedule version boundaries to be dropped from output. See the new utils/show_missing_days.py.
This was actually intended behavior because we decided we weren't sure which schedule should be used on boundary dates. We can just pick one if we want, but this was intentional.
Thanks, I hadn't realized that. Good to know.
Reviewer TODO to myself: check how the date boundary looks with the change in logic
The download cache part of this is now in PR #80
Thank you so much for taking this on, Hailey! Excited to talk about it in more detail tomorrow -- I skimmed today but definitely think an overview would be helpful since there's so much here. Just wanted to respond to one note in the description for context:
In addition, this demonstrates but does not fix an existing bug in the processing logic that causes some days on schedule version boundaries to be dropped from output. See the new utils/show_missing_days.py.
This was actually intended behavior because we decided we weren't sure which schedule should be used on boundary dates. We can just pick one if we want, but this was intentional.
Thanks, I hadn't realized that. Good to know.
I decided to keep the existing boundary behavior here. I updated the PR so that the new code that pulls scraped schedules from S3 behaves consistently.
Ready for review now! Comments welcome.
This is now basically a superset of PR #80, so if desired you could just review this.
For easier reviewing, try
git diff origin/main --color-moved-ws=allow-indentation-change --color-moved=blocks
Thank you again for tackling! A few questions.... Was mostly testing by trying to run things locally. Is there an order that I need to follow?
Specifically, when I try to run
update_data
, I get the following error:...
Cannot tell if I was supposed to run something to initialize whatever is causing the
KeyError
?
update_data without arguments is broken for me too. I will work on fixing it. In the meantime, you can test the incremental workflow by pointing to the existing frontend status file with something like this:
python3 -m update_data --update ../ghost-buses-frontend/src/Routes/schedule_vs_realtime_all_day_types_routes.json
Thank you again for tackling! A few questions.... Was mostly testing by trying to run things locally. Is there an order that I need to follow? Specifically, when I try to run
update_data
, I get the following error:...
Cannot tell if I was supposed to run something to initialize whatever is causing the
KeyError
?update_data without arguments is broken for me too. I will work on fixing it. In the meantime, you can test the incremental workflow by pointing to the existing frontend status file with something like this:
python3 -m update_data --update ../ghost-buses-frontend/src/Routes/schedule_vs_realtime_all_day_types_routes.json
It was a simple fix for the non-arguments version, which I just committed. It should work now.
Ok, I've updated the readme and addressed open comments. I think this is ready for another round of testing and viewing. Please let me know what you think about deferring the transitfeeds changes.
Description
Add the ability to incrementally update data, refactoring the processing workflow to make this feasible.
This refactors the existing logic into multiple new classes and splits realtime and schedule processing as much as possible. Included are a few other features:
Resolves # [issue]
Type of change
How has this been tested?
Manually