USEPA / StreamCat

Landscape summaries of natural and anthropogenic landscape features for ~2.65 million streams, and their associated catchments, within the conterminous USA
https://www.epa.gov/national-aquatic-resource-surveys/streamcat-dataset

Speedup Accumulation process #137

Open TravisH18 opened 1 week ago

TravisH18 commented 1 week ago

A few parts of the Accumulation loop are bottlenecks at the moment.

  1. make_all_cat_comids takes ~15 minutes.
  2. The zone-processing for loop in the MakeVectors function takes ~80 minutes.
  3. The Bastards function itself takes ~2-3 minutes.

Generic parallelism, switching to libraries like pyogrio, or vectorizing with numpy should give us large speedups without having to alter the code too much.
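As a rough illustration of the "generic parallelism" idea for the per-zone loop, here is a minimal sketch using only the standard library. It assumes each zone can be processed independently; `process_zone`, `run_zones`, and the zone labels are hypothetical placeholders, not functions or values taken from the StreamCat code.

```python
from concurrent.futures import (
    ProcessPoolExecutor,
    ThreadPoolExecutor,
    as_completed,
)


def process_zone(zone):
    """Hypothetical stand-in for the work done for one zone inside the loop."""
    # Placeholder computation; the real work would build the vectors for a zone.
    return zone, sum(ord(ch) for ch in zone)


def run_zones(zones, worker=process_zone, max_workers=None,
              executor_cls=ProcessPoolExecutor):
    """Run `worker` once per zone in parallel and collect the results by zone.

    Processes suit CPU-bound work; pass ThreadPoolExecutor for I/O-bound work.
    """
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, zone) for zone in zones]
        # Results arrive in completion order, so key them by zone.
        for future in as_completed(futures):
            zone, value = future.result()
            results[zone] = value
    return results
```

A call such as `run_zones(["01", "02", "10U"], max_workers=4)` would replace the serial loop. With the process-based executor the worker has to be a picklable module-level function, and on platforms that spawn worker processes the call belongs under an `if __name__ == "__main__":` guard.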

TravisH18 commented 1 day ago

Created a Speedup branch and pushed changes to the makeVector process, which was the main slowdown in the accumulation process. Could continue working on the children / bastard functions, as well as adding numba to the swapper function, since it uses pure numpy operations.
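For reference, this is roughly what adding numba to a pure-numpy helper could look like. It is only a sketch: `lookup_positions` is a hypothetical index-lookup function written for illustration, not the actual swapper implementation, and it assumes integer ID arrays in which every queried ID is present. The fallback decorator just keeps the example runnable when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption): run as plain numpy when numba is unavailable.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@njit
def lookup_positions(ids, queries):
    """Return, for each value in `queries`, its position in `ids`.

    Assumes every query value occurs in `ids`.
    """
    order = np.argsort(ids)                      # indices that sort `ids`
    pos = np.searchsorted(ids[order], queries)   # position in the sorted view
    return order[pos]                            # map back to original indices


ids = np.array([30, 10, 20], dtype=np.int64)
queries = np.array([20, 30, 10, 20], dtype=np.int64)
positions = lookup_positions(ids, queries)
```

The first call pays the JIT compilation cost, so any gain would only show up when the function is called many times or on large arrays. It would be worth timing before and after.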