Closed tiffanychu90 closed 2 years ago
@hunterowens: CLOSING THIS!
@ian-r-rose: Only now as I'm going through to clean up my issues am I noticing you've been keeping up with this! Getting my feet wet with dask, and haven't yet fully explored...!
It's been fun to watch, nice work!
After receiving a research request, use this template to plan and track your work. Be sure to also add the appropriate project-level label to this issue (e.g. gtfs-rt, DLA).
Epic Information - Daskify HQTA
Summary
dask and dask_geopandas to shorten the run time from several hours to...something way less.
rt_delay work, so move over the utilities functions related there into shared_utils to be rt_utils, etc.
dask computations to the cloud, once it runs locally successfully, to make full use of the partitioning.
dask. Some RT stuff is done once for all (cutting road network into 1 km segments), but other stuff is done daily. How much of the daily stuff would actually be the spatial processing, and how much should remain in SQL? Revisit this discussion when we've rewritten some of the spatial processing of vehicle positions using dask and have a better understanding of RT table schemas.
Research required:
dask, dask_geopandas
dask_geopandas, and use geopandas if we really can't
Notes, misc:
Reviewers [Stakeholders]
Issues
dask_geopandas.sjoin first to narrow down rows to do dask_geopandas.clip. new step to cut down computing time.
hqta_points, hqta_areas geoparquets
gtfs_utils. accommodate the fact that different operators might be "valid" each month, and enforce the conditions in which all the merges should be successful (so later steps don't break down)
shape_id by route_length, figure out the symmetric_difference and pick longest shape_id in each direction, combine into 1 row, and then cut that one line into segments.
hqta_details is missing for hqta_type==hq_corridor_bus and rerun D2 to create hqta_areas for July and Aug. July file gdb sent for open data portal is first time stepping through entire workflow post-dask-rewrite.
route_id to both hqta_points and hqta_areas
README to step through workflow
Deliverables
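For the Issues item about picking the longest shape_id in each direction and then cutting that line into segments, here is a rough sketch of the idea in plain Python. It is not the code from this repo: the function names, the dict layout (shape_id, direction_id, coords) and the sample coordinates are all made up for illustration, coordinates are assumed to be in a projected CRS measured in meters, and the symmetric_difference and "combine into 1 row" steps are left out. The real workflow would presumably do this on geometries with geopandas / dask_geopandas rather than coordinate lists.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def line_length(coords: List[Point]) -> float:
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def longest_shape_per_direction(shapes: List[dict]) -> Dict[int, dict]:
    """Keep the longest shape for each direction_id.

    Each shape is a hypothetical dict with keys
    shape_id, direction_id and coords.
    """
    best: Dict[int, dict] = {}
    for shape in shapes:
        direction = shape["direction_id"]
        current = best.get(direction)
        if current is None or (
            line_length(shape["coords"]) > line_length(current["coords"])
        ):
            best[direction] = shape
    return best


def cut_line(coords: List[Point], segment_length: float) -> List[List[Point]]:
    """Cut a polyline into consecutive pieces of at most segment_length."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")

    segments: List[List[Point]] = []
    current: List[Point] = [coords[0]]
    remaining = segment_length  # distance left before the next cut

    for end in coords[1:]:
        start = current[-1]
        edge = math.dist(start, end)
        # Place cuts along this edge for as long as it is long enough.
        while edge >= remaining:
            t = remaining / edge
            cut = (
                start[0] + t * (end[0] - start[0]),
                start[1] + t * (end[1] - start[1]),
            )
            current.append(cut)
            segments.append(current)
            current = [cut]
            start = cut
            edge -= remaining
            remaining = segment_length
        # Whatever is left of the edge goes into the open segment.
        if edge > 0:
            current.append(end)
            remaining -= edge

    if len(current) > 1:
        segments.append(current)
    return segments


# Made-up sample data: two shapes in direction 0, one in direction 1.
shapes = [
    {"shape_id": "a", "direction_id": 0, "coords": [(0.0, 0.0), (2500.0, 0.0)]},
    {"shape_id": "b", "direction_id": 0, "coords": [(0.0, 0.0), (1800.0, 0.0)]},
    {"shape_id": "c", "direction_id": 1, "coords": [(2500.0, 0.0), (0.0, 0.0)]},
]

longest = longest_shape_per_direction(shapes)
segments_dir0 = cut_line(longest[0]["coords"], segment_length=1000.0)
```

The last piece of a line is usually shorter than segment_length, so anything downstream that assumes equal-length segments would need to handle that leftover.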