NOAA-OWP / t-route

Tree based hydrologic and hydraulic routing

More time-efficient order in timeslice file creation #823

Closed. JurgenZach-NOAA closed this 1 month ago

JurgenZach-NOAA commented 3 months ago

Changed the order of timeslice file import and DataFrame creation. Previously, a pandas DataFrame was created for each timeslice file, and the DataFrames were then concatenated into one combined DataFrame for 15-minute resampling. Now only numpy arrays are read from each file, and these are combined into a single large DataFrame.
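To illustrate the reordering described above, here is a minimal sketch, not the actual t-route code. The `timeslices` list, the station IDs, the flow values, and both function names are hypothetical stand-ins for data read from timeslice files; the 15-minute resampling step is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-ins for timeslice files:
# (timestamp, station IDs, discharge values). Not real USGS data.
timeslices = [
    ("2021-08-23 13:00", np.array(["A", "B"]), np.array([1.0, 2.0])),
    ("2021-08-23 13:15", np.array(["A", "B", "C"]), np.array([1.5, 2.5, 3.0])),
]

def combine_per_file_frames(slices):
    """Earlier pattern: one DataFrame per timeslice, then a column-wise concat."""
    frames = [
        pd.DataFrame({pd.Timestamp(t): q}, index=ids)
        for t, ids, q in slices
    ]
    return pd.concat(frames, axis=1)

def combine_arrays_first(slices):
    """Revised pattern: gather plain numpy arrays, then build one DataFrame."""
    ids = np.concatenate([s[1] for s in slices])
    flows = np.concatenate([s[2] for s in slices])
    # Repeat each file's timestamp once per station reported in that file.
    stamps = pd.to_datetime([s[0] for s in slices]).values
    times = np.repeat(stamps, [len(s[1]) for s in slices])
    long_df = pd.DataFrame({"station": ids, "time": times, "flow": flows})
    # Reshape from long form to stations (rows) by timestamps (columns).
    return long_df.pivot(index="station", columns="time", values="flow")

old = combine_per_file_frames(timeslices)
new = combine_arrays_first(timeslices)
```

Both functions yield the same station-by-time table for this toy input. The second builds only one DataFrame, which is the kind of saving the description refers to; the actual speedup depends on the number and size of files.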

Testing

  1. Run any example that reads USGS timeslice files.
  2. To benchmark the run time, the pyinstrument profiler is suggested; it creates a report with 1 millisecond time resolution:

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

# [CODE YOU WANT TO BENCHMARK]

profiler.stop()
profiler.print()

Screenshots

For the Lower Colorado example [test_AnA.yaml], the benchmark for get_obs_from_timeslices shows a 2-fold speedup:

BEFORE:

image

AFTER:

image

AminTorabi-NOAA commented 3 months ago

I tested it on an example I had on vpu-17, and this line

timeslice_obs_df = pd.concat(dfList, axis = 1)

raises an error:

*** pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

This is probably because it tries to concatenate dataframes that have non-unique index values. Also, when I checked, there was a difference in length between dfList[0], dfList[1], and the others.
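The failure mode reported here can be reproduced in isolation. The sketch below uses two small hypothetical frames (`df_a`, `df_b`, with made-up station IDs and values), one of which has a repeated index label. Dropping duplicates before the concat is shown only as one possible workaround, under the assumption that keeping the first occurrence is acceptable; it is not a fix confirmed in this thread.

```python
import pandas as pd

# Hypothetical frames: df_a repeats station "A" in its index,
# and the two indexes differ, so concat has to reindex.
df_a = pd.DataFrame({"t0": [1.0, 1.1, 2.0]}, index=["A", "A", "B"])
df_b = pd.DataFrame({"t1": [3.0, 4.0, 5.0]}, index=["A", "B", "C"])

try:
    pd.concat([df_a, df_b], axis=1)
    raised = False
except (pd.errors.InvalidIndexError, ValueError):
    # Recent pandas raises InvalidIndexError here; ValueError is caught
    # as well in case an older release reports the problem differently.
    raised = True

# One possible workaround (an assumption): keep the first row per label.
df_a_unique = df_a[~df_a.index.duplicated(keep="first")]
combined = pd.concat([df_a_unique, df_b], axis=1)
```

Which duplicate to keep, or whether duplicates should be averaged instead, is a data-quality decision that depends on why the timeslice files contain repeated station entries.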

JurgenZach-NOAA commented 1 month ago

Archived. End of Project.