JustinSGray opened this issue 3 years ago
The memory usage might be related to total coloring. That many constraints might be using a lot of memory, and total coloring is still coded as a dense operation, I think. It's possible this is an OM memory-usage issue...
More information: the user was able to switch to a larger-memory machine and gave this data:
Ok, ran that same coloring case on my home computer, which has 48 GB of memory.
```
Full total jacobian was computed 3 times, taking 479.575982 seconds.
Total jacobian shape: (2267, 2117)
Jacobian shape: (2267, 2117) (29.37% nonzero)
FWD solves: 1209 REV solves: 0
Total colors vs. total size: 1209 vs 2117 (42.9% improvement)
Sparsity computed using tolerance: 1e-25
Time to compute sparsity: 479.575982 sec.
Time to compute coloring: 559.167794 sec.
```
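The numbers in that log are self-consistent; a quick sanity check (shapes and counts copied from the output above) also shows why a single dense total jacobian is already nontrivial at this size:

```python
import numpy as np

# Reproduce the arithmetic behind the reported coloring stats
# (shape, solve count, and nonzero fraction taken from the log above).
shape = (2267, 2117)     # total jacobian: (rows, columns/design vars)
fwd_solves = 1209        # number of colors needed in forward mode
nonzero_frac = 0.2937    # reported nonzero fraction

# Without coloring, forward mode needs one solve per column, so the
# improvement is measured against the column count.
improvement = (shape[1] - fwd_solves) / shape[1]
print(f"improvement: {improvement:.1%}")   # matches the reported 42.9%

# Rough storage cost of ONE dense float64 jacobian of this shape;
# the default sparsity computation holds several of these at once.
dense_bytes = shape[0] * shape[1] * 8
print(f"one dense jacobian: {dense_bytes / 1e6:.1f} MB")
```

So three full dense jacobians plus intermediate arrays can plausibly push a small VM hard even before the coloring itself runs.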
Memory peaked at 58.5% in top. Interestingly, it was running at 1200% CPU (parallel on 12 cores, I assume) and about 4% memory, then dropped to 100% CPU (single core) while memory gradually increased. Memory cycled between 24% and 58% three times (if I counted correctly) before settling in at 1200% CPU and 23% memory.
Note: the default num_full_jacs is 3, so this data indicates that computing the pseudo-inverse might be taking up a lot of memory. We might need to investigate moving to a column-based sparsity approach, similar to the one used for partials, to lower the memory cost.
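A column-based approach could look roughly like the following sketch. This is illustrative only, not OpenMDAO's actual implementation: `jac_column` is a hypothetical callback standing in for one forward seed solve, and the point is that only one dense column plus the index lists live in memory at a time, rather than several full dense jacobians.

```python
import numpy as np

def column_sparsity(jac_column, n_rows, n_cols, tol=1e-25):
    """Accumulate a COO-style sparsity pattern one column at a time.

    jac_column(j) returns dense column j of the jacobian
    (e.g. the result of a single forward seed solve).
    """
    rows, cols = [], []
    for j in range(n_cols):
        nz = np.nonzero(np.abs(jac_column(j)) > tol)[0]
        rows.extend(nz.tolist())
        cols.extend([j] * nz.size)
    return np.array(rows), np.array(cols)

# Toy usage: a tridiagonal 6x6 jacobian defined column by column.
A = np.triu(np.tril(np.random.rand(6, 6) + 0.1, 1), -1)
rows, cols = column_sparsity(lambda j: A[:, j], 6, 6)
print(rows.size, "nonzeros found")   # prints "16 nonzeros found"
```

The trade-off is more, smaller linear solves instead of a few dense jacobian accumulations, which exchanges some time for a much smaller peak footprint.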
Curious whether the user was on a version after OpenMDAO/OpenMDAO#2116. That commit significantly reduced the memory footprint during total coloring and improved performance.
Summary of Issue
User reported process termination due to high memory usage from a large, but not massive, model running on a machine with 8 GB of memory. One other user also had issues running with 8 GB of memory.
It may not be fixable, but worth looking into
Description
The user gave the following info (they could not share the full model):
Linux VM with 8 GB of RAM
That's a fairly large opt problem, but not absurdly so. It may be unique because it was combined with pretty large actual models (note the size of the input/output vectors).
We could probably simulate this by just duplicating some output state into an additional output array a bunch of times, then adding that to the timeseries.
It's worth doing some memory profiling on a simulated use case like this with the SLSQP, SNOPT, and IPOPT drivers. Maybe some obvious low-hanging fruit will show up. Maybe not... worth checking though.
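As a starting point for that profiling, here is a hedged sketch using Python's `tracemalloc` to compare peak memory when several dense total jacobians are held at once (as with the default `num_full_jacs = 3`) against aggregating a boolean pattern one jacobian at a time. The shapes mimic the reported (2267, 2117) jacobian; the two aggregation functions are stand-ins, not OpenMDAO code.

```python
import tracemalloc
import numpy as np

def peak_mb(fn):
    """Run fn and return peak traced allocation in MB."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6

shape, num_full_jacs = (2267, 2117), 3

def stacked():
    # All three dense jacobians are alive simultaneously
    # before any aggregation happens.
    jacs = [np.random.rand(*shape) for _ in range(num_full_jacs)]
    return np.abs(jacs[0])  # placeholder aggregation step

def streaming():
    # Fold each jacobian into a single boolean pattern as it is
    # computed, so only one dense jacobian is alive at a time.
    pattern = np.zeros(shape, dtype=bool)
    for _ in range(num_full_jacs):
        pattern |= np.abs(np.random.rand(*shape)) > 1e-25
    return pattern

print(f"stacked peak:   {peak_mb(stacked):6.1f} MB")
print(f"streaming peak: {peak_mb(streaming):6.1f} MB")
```

Swapping the stand-in functions for an actual simulated model run (under each of the three drivers) would give comparable peak numbers with minimal instrumentation.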