dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Modify logic when reading point observations from NetCDF files to improve runtime performance on large blocksize systems (like glade) #181

Open dwfncar opened 12 years ago

dwfncar commented 12 years ago

John Henderson, a met_user, raised a performance issue when running MET on systems that use a large blocksize. The details can be found here:
   https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=57709

Basically, the exact same run takes much longer on systems that use large blocksizes, such as glade (and NASA Pleiades, which John is using). In John's example, what took 24 seconds on rambler took 14 minutes on glade.

This task is to modify the logic for point_stat (and ensemble_stat) in the process_obs_file() function to do a much smaller number of much larger reads from the NetCDF file. Currently, point_stat is doing 1 or more "gets" for each observation value it is processing. Instead, try doing 1 "get" for each variable, store the data in memory, and then do the processing on the copy in memory.
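To make the difference between the two access patterns concrete, here is a minimal sketch. It does not use the real NetCDF API or any MET code: `MockObsVar`, `sum_per_obs()` and `sum_bulk()` are hypothetical names, and the mock only counts how many "gets" each pattern issues against a variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a NetCDF variable (not MET or NetCDF code).
// It only counts how many read requests ("gets") are issued against it.
struct MockObsVar {
    std::vector<float> data;        // values as stored in the file
    mutable std::size_t n_gets = 0; // number of get() calls so far

    // Copy `count` values starting at `start` into `out`; one call is one get.
    void get(std::size_t start, std::size_t count, float* out) const {
        assert(start + count <= data.size());
        ++n_gets;
        for (std::size_t i = 0; i < count; ++i) out[i] = data[start + i];
    }
};

// Current pattern: one get per observation value.
double sum_per_obs(const MockObsVar& var) {
    double total = 0.0;
    for (std::size_t i = 0; i < var.data.size(); ++i) {
        float v = 0.0f;
        var.get(i, 1, &v);
        total += v;
    }
    return total;
}

// Proposed pattern: one get for the whole variable, then process the
// copy held in memory.
double sum_bulk(const MockObsVar& var) {
    std::vector<float> buf(var.data.size());
    if (!buf.empty()) var.get(0, buf.size(), buf.data());
    double total = 0.0;
    for (float v : buf) total += v;
    return total;
}
```

For a variable holding N observations, `sum_per_obs()` issues N gets while `sum_bulk()` issues one, and both return the same result. On a large blocksize filesystem each get may touch a whole block, so the number of gets is presumably what drives the runtime difference.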

Then run point_stat test cases locally and on glade and make sure the discrepancy in runtime performance improves.

Additional tasks may be derived from this one if other areas of MET that have this same issue are identified. For example, how are we reading gridded data from NetCDF files?

[MET-181] created by johnhg

dwfncar commented 12 years ago

Ever since we distributed the Ensemble-Stat tool, the interface to the point observations in MET has been in need of attention. Currently, there's a lot of code that's duplicated in Point-Stat and Ensemble-Stat to parse those point observation files. I'd like to replace that with library functionality to handle the I/O for NetCDF point observation files.

In the process of doing that, I'm thinking we could provide a configurable option for the user to specify the number of observations to be read. Setting the default value to 1 should (hopefully) maintain the speed of the current METv4.0 release. But having the option to increase that value on bluefire (and other large blocksize systems) would give us a knob to turn to improve runtime performance up there.
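A rough sketch of what such a knob could look like follows. Everything here is hypothetical: `CountingVar` is a mock that counts gets rather than a real NetCDF variable, and `obs_per_read` is just an illustrative parameter name, not an existing MET config option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mock of a NetCDF variable that counts read requests.
struct CountingVar {
    std::vector<float> data;   // values as stored in the file
    std::size_t n_gets = 0;    // number of get() calls so far

    // Copy `count` values starting at `start` into `out`; one call is one get.
    void get(std::size_t start, std::size_t count, float* out) {
        assert(start + count <= data.size());
        ++n_gets;
        std::copy(data.begin() + start, data.begin() + start + count, out);
    }
};

// Read the variable in chunks of `obs_per_read` observations (assumed,
// user-configurable). The default of 1 reproduces the one-get-per-observation
// behavior; larger values trade memory for fewer gets.
double sum_chunked(CountingVar& var, std::size_t obs_per_read = 1) {
    assert(obs_per_read > 0);
    std::vector<float> buf(obs_per_read);
    double total = 0.0;
    for (std::size_t start = 0; start < var.data.size(); start += obs_per_read) {
        // The last chunk may be shorter than obs_per_read.
        std::size_t count = std::min(obs_per_read, var.data.size() - start);
        var.get(start, count, buf.data());
        for (std::size_t i = 0; i < count; ++i) total += buf[i];
    }
    return total;
}
```

With 10 observations, `obs_per_read = 1` issues 10 gets, `obs_per_read = 4` issues 3, and any value of 10 or more issues a single get, so the result is unchanged and only the number of reads varies.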

This issue probably needs more research to get a better understanding of what else we could do to get Point-Stat to run faster up there. But the code is in need of a revision anyway, so we might as well take runtime on bluefire into account.

by johnhg