JustinSGray opened this issue 3 years ago
The memory usage might be related to total coloring. That many constraints might be using a lot of memory, and total coloring is still coded as a dense operation, I think. It's possible this is an OM memory-usage issue...
More information: the user was able to switch to a larger-memory machine and gave this data:
Ok, ran that same coloring case on my home computer, which has 48 GB of memory.
```
Full total jacobian was computed 3 times, taking 479.575982 seconds.
Total jacobian shape: (2267, 2117)
Jacobian shape: (2267, 2117) (29.37% nonzero)
FWD solves: 1209 REV solves: 0
Total colors vs. total size: 1209 vs 2117 (42.9% improvement)
Sparsity computed using tolerance: 1e-25
Time to compute sparsity: 479.575982 sec.
Time to compute coloring: 559.167794 sec.
```
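The numbers in that log are self-consistent; a quick sanity check (shapes and counts copied from the output above) also shows why a single dense total jacobian is already nontrivial at this size:

```python
import numpy as np

# Reproduce the arithmetic behind the reported coloring stats
# (shape, solve count, and nonzero fraction taken from the log above).
shape = (2267, 2117)     # total jacobian: (rows, columns/design vars)
fwd_solves = 1209        # number of colors needed in forward mode
nonzero_frac = 0.2937    # reported nonzero fraction

# Without coloring, forward mode needs one solve per column, so the
# improvement is measured against the column count.
improvement = (shape[1] - fwd_solves) / shape[1]
print(f"improvement: {improvement:.1%}")   # matches the reported 42.9%

# Rough storage cost of ONE dense float64 jacobian of this shape;
# the default sparsity computation holds several of these at once.
dense_bytes = shape[0] * shape[1] * 8
print(f"one dense jacobian: {dense_bytes / 1e6:.1f} MB")
```

So three full dense jacobians plus intermediate arrays can plausibly push a small VM hard even before the coloring itself runs.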
Memory peaked at 58.5% in top. Interestingly, it was running at 1200% CPU (parallel on 12 cores, I assume) and about 4% memory, then dropped to 100% CPU (single core) while memory gradually increased. Memory cycled between 24% and 58% three times (if I counted correctly) before settling in at 1200% CPU and 23% memory.
Note: the default num_full_jacs is 3, so this data indicates that computing the pseudo-inverse might be taking up a lot of memory. We might need to investigate moving to a column-based sparsity approach, similar to the one used for partials, to lower the memory cost.
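A column-based approach could look roughly like the following sketch. This is illustrative only, not OpenMDAO's actual implementation: `jac_column` is a hypothetical callback standing in for one forward seed solve, and the point is that only one dense column plus the index lists live in memory at a time, rather than several full dense jacobians.

```python
import numpy as np

def column_sparsity(jac_column, n_rows, n_cols, tol=1e-25):
    """Accumulate a COO-style sparsity pattern one column at a time.

    jac_column(j) returns dense column j of the jacobian
    (e.g. the result of a single forward seed solve).
    """
    rows, cols = [], []
    for j in range(n_cols):
        nz = np.nonzero(np.abs(jac_column(j)) > tol)[0]
        rows.extend(nz.tolist())
        cols.extend([j] * nz.size)
    return np.array(rows), np.array(cols)

# Toy usage: a tridiagonal 6x6 jacobian defined column by column.
A = np.triu(np.tril(np.random.rand(6, 6) + 0.1, 1), -1)
rows, cols = column_sparsity(lambda j: A[:, j], 6, 6)
print(rows.size, "nonzeros found")   # prints "16 nonzeros found"
```

The trade-off is more, smaller linear solves instead of a few dense jacobian accumulations, which exchanges some time for a much smaller peak footprint.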
Curious whether the user was on a version after OpenMDAO/OpenMDAO#2116. That commit significantly reduced the memory footprint during total coloring and improved performance.
Summary of Issue
User reported process termination due to high memory usage from a large, but not massive, model running on a machine with 8 GB of memory. One other user also had issues running with 8 GB of memory.
It may not be fixable, but worth looking into
Description
The user gave the following info (they could not share the full model):
Linux VM with 8 GB of RAM
That's a fairly large opt problem, but not absurdly so. It may be unique because it was combined with pretty large actual models (note the size of the input/output vectors).
We could probably simulate this by just duplicating some output state into an additional output array a bunch of times, then adding that to the timeseries.
It's worth doing some memory profiling on a simulated use case like this with the SLSQP, SNOPT, and IPOPT drivers. Maybe some obvious low-hanging fruit will show up. Maybe not... worth checking though.
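As a starting point for that profiling, here is a hedged sketch using Python's `tracemalloc` to compare peak memory when several dense total jacobians are held at once (as with the default `num_full_jacs = 3`) against aggregating a boolean pattern one jacobian at a time. The shapes mimic the reported (2267, 2117) jacobian; the two aggregation functions are stand-ins, not OpenMDAO code.

```python
import tracemalloc
import numpy as np

def peak_mb(fn):
    """Run fn and return peak traced allocation in MB."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6

shape, num_full_jacs = (2267, 2117), 3

def stacked():
    # All three dense jacobians are alive simultaneously
    # before any aggregation happens.
    jacs = [np.random.rand(*shape) for _ in range(num_full_jacs)]
    return np.abs(jacs[0])  # placeholder aggregation step

def streaming():
    # Fold each jacobian into a single boolean pattern as it is
    # computed, so only one dense jacobian is alive at a time.
    pattern = np.zeros(shape, dtype=bool)
    for _ in range(num_full_jacs):
        pattern |= np.abs(np.random.rand(*shape)) > 1e-25
    return pattern

print(f"stacked peak:   {peak_mb(stacked):6.1f} MB")
print(f"streaming peak: {peak_mb(streaming):6.1f} MB")
```

Swapping the stand-in functions for an actual simulated model run (under each of the three drivers) would give comparable peak numbers with minimal instrumentation.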