OpenWaterFoundation / cdss-app-snodas-tools

Colorado's Decision Support Systems (CDSS) Snow Data Assimilation System (SNODAS) Tools

Creating TS for both 'old' and 'new' CSV files is slow #29

Open · Nightsphere opened 3 years ago

Nightsphere commented 3 years ago

Reading in the SnowpackStatisticsByBasin/ CSV files for all 332 basins from the GCP VM, plus the locally created files for the same 332 basins, is very slow. I have tried a few different test setups to debug this.

From what I saw, memory never went above 2.75 GB. The command file is quite small at this point, so there's really not much going on.

smalers commented 3 years ago

The performance does not seem unreasonable. Although ideally runs should be as fast as possible, processing a lot of data can take time. 332 basins * 3 time series = 996 time series. 3 time series with 365.25 points/year over 17 years gives about 18,628 data points per basin and about 6,184,000 data points total; stored as 8-byte doubles, that is only about 49.5 MB just for the time series data. There is other memory being used, but the point is that memory should not be a problem.
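Spelled out, assuming 8 bytes per stored double value:

$$
\begin{aligned}
332 \times 3 &= 996\ \text{time series} \\
3 \times 365.25 \times 17 &\approx 18{,}628\ \text{values per basin} \\
332 \times 18{,}628 &\approx 6.18 \times 10^{6}\ \text{values} \\
6.18 \times 10^{6} \times 8\ \text{bytes} &\approx 49.5\ \text{MB}
\end{aligned}
$$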

The increase in run time is not linear, but it is not wildly exponential either. Maybe it is what it is.

There is potential that the VM, or the combination of Linux and Windows, is slow in other ways such as I/O. There may also be some unintended inefficiencies in the command file that can be identified with a review. The output may be getting buffered in some weird way, but usually the UI shows steady progress unless one command really is slow.

I suggest working out the details of the time series comparison using fewer stations (even one station) and then running the big comparison. I usually add commands to make it easy to switch between the short and long runs; see the sketch below.
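For example, a minimal sketch of that pattern using the standard TSTool SetProperty and If/EndIf commands (the property name RunMode and the If block names are made up for illustration):

```
# Control property: set to "short" while debugging, "full" for the real run.
SetProperty(PropertyName="RunMode",PropertyType=String,PropertyValue="short")
If(Name="IfShortRun",Condition="${RunMode} == short")
# ... read and compare time series for a single basin ...
EndIf(Name="IfShortRun")
If(Name="IfFullRun",Condition="${RunMode} == full")
# ... read and compare all 332 basins ...
EndIf(Name="IfFullRun")
```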

smalers commented 3 years ago

Also, the ProfileCommands command (under the Running and Properties menu) will track command performance, but maybe I need to add a general performance check.