OpenWaterFoundation / cdss-app-snodas-tools

Colorado's Decision Support Systems (CDSS) Snow Data Assimilation System (SNODAS) Tools

Creating TS for both 'old' and 'new' CSV files is slow #29

Open · Nightsphere opened 3 years ago

Nightsphere commented 3 years ago

Reading in the SnowpackStatisticsByBasin/ CSV files for all 332 basins from the GCP VM, plus the locally created files for the same 332 basins, is very slow. I have tried a few different test setups to debug this.

From what I saw, memory never went above 2.75 GB. The command file is quite small at this point, so there's really not much going on.

smalers commented 3 years ago

The performance does not seem unreasonable. Although ideally runs should be as fast as possible, processing a lot of data can take time. 332 basins * 3 time series = 996 time series. 3 time series with 365.25 points/year over 17 years gives about 18,628 data points per basin and about 6,184,000 data points total; stored as 8-byte doubles, that is only about 49.5 MB just for the time series data. There is other memory being used, but the point is that memory should not be a problem.
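Spelled out, assuming 8 bytes per stored double value:

$$
\begin{aligned}
332 \times 3 &= 996\ \text{time series} \\
3 \times 365.25 \times 17 &\approx 18{,}628\ \text{values per basin} \\
332 \times 18{,}628 &\approx 6.18 \times 10^{6}\ \text{values} \\
6.18 \times 10^{6} \times 8\ \text{bytes} &\approx 49.5\ \text{MB}
\end{aligned}
$$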

The increase in run time is not linear, but it is not wildly exponential either. Maybe it is what it is.

There is potential that the VM, or the combination of Linux and Windows, is slow in other ways such as I/O. There may also be some unintended inefficiencies in the command file that can be identified with a review. The output may be getting buffered in some weird way, but usually the UI shows steady progress unless one command really is slow.

I suggest working out the details of the time series comparison using fewer stations (even one station) and then running the big comparison. I usually add commands to make it easy to switch between the short and long runs; see the sketch below.
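For example, a minimal sketch of that pattern using the standard TSTool SetProperty and If/EndIf commands (the property name RunMode and the If block names are made up for illustration):

```
# Control property: set to "short" while debugging, "full" for the real run.
SetProperty(PropertyName="RunMode",PropertyType=String,PropertyValue="short")
If(Name="IfShortRun",Condition="${RunMode} == short")
# ... read and compare time series for a single basin ...
EndIf(Name="IfShortRun")
If(Name="IfFullRun",Condition="${RunMode} == full")
# ... read and compare all 332 basins ...
EndIf(Name="IfFullRun")
```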

smalers commented 3 years ago

Also, the ProfileCommands command (under the Running and Properties menu) will track command performance, but maybe I need to add a general performance check.