OpenMDAO / dymos

Open Source Optimization of Dynamic Multidisciplinary Systems
Apache License 2.0

Reduce memory usage for timeseries jac computation #1001

Closed · johnjasa closed this 10 months ago

johnjasa commented 10 months ago

Summary

After a good amount of digging into cases that scaled poorly as the number of procs increased, I found a toarray() call that converted a sparse array to dense before converting it back to sparse. In @kanekosh's run script, which uses a relatively high num_segments and a memory-expensive parallel ODE, this caused a large increase in memory usage during setup().

This new implementation keeps the jac in sparse format throughout. I've changed it in the two places where the pattern occurred -- and maybe those files could be combined into one? Or, @robfalck, were they kept as two separate files on purpose?
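For illustration, here's a rough sketch of the pattern in question using scipy.sparse (the sizes and block-diagonal structure are made up for the example; the actual dymos jacs and call sites differ):

```python
import scipy.sparse as sp

# Illustrative timeseries jac: block diagonal and mostly zeros. The sizes
# here are invented; real cases scale with num_segments and the ODE size.
block = sp.random(200, 200, density=0.02, format='csr')
jac = sp.block_diag([block] * 30, format='coo')

# The problematic pattern: a dense round trip. toarray() allocates
# n**2 * 8 bytes for an (n x n) jac regardless of sparsity -- here a
# 6000 x 6000 array of doubles is ~288 MB, versus a few hundred KB
# for the ~24k stored nonzeros.
dense = jac.toarray()
jac_csr = sp.csr_matrix(dense)

# The fix: convert between sparse formats directly, with no dense
# intermediate, so memory stays proportional to the nonzero count.
jac_csr = jac.tocsr()
```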

Prior setup() mem usage: [image: without_fix]

Mem usage for setup() with the fix: [image: with_fix]

This PR does not address the potentially large memory usage in final_setup(), as that is a separate but related issue regarding how PETScVectors are created and used. We should discuss further whether we have action items there.

Related Issues

Backwards incompatibilities

None

New Dependencies

None

kanekosh commented 10 months ago

Thank you, @johnjasa! This fixes the issue I was facing.

Here is a summary of the memory usage for my Dymos+OAS case. Memory usage is shown as a percentage of the total memory on my machine (64 GB).

Before this fix (Dymos 1.9.0):

| n_procs | setup | run_model | compute_totals |
| ------: | ----: | --------: | -------------: |
| 1 | 12.1% | 5.5% | 10.6% |
| 2 | 20.6% | 5.2% | 11.0% |
| 4 | 37.6% | 5.6% | 12.0% |
| 8 | 72.0% | 7.2% | 13.6% |
| 16 | runs out of memory during setup | | |

And with this fix:

| n_procs | setup | run_model | compute_totals |
| ------: | ----: | --------: | -------------: |
| 1 | 3.8% | 4.4% | 10.6% |
| 2 | 4.2% | 5.2% | 11.0% |
| 4 | 4.8% | 5.6% | 12.0% |
| 8 | 6.3% | 7.2% | 13.6% |
| 16 | 9.6% | 11.2% | 16.0% |

As far as I can observe, final_setup() uses approximately the same amount of memory as run_model and is not a bottleneck. But I only monitored memory via top at a 0.1 s sampling interval, so I might have missed a short-lived "spike" during final_setup.
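In case it's useful for catching such spikes, here's a rough sketch (not something I actually ran; psutil and the helper name are assumptions for illustration) that samples the process's RSS on a background thread at a finer interval than top:

```python
import threading
import time

import psutil


def peak_rss_during(fn, *args, interval=0.01, **kwargs):
    """Run fn while a background thread samples this process's resident
    set size; return (fn's result, peak RSS in bytes). A 10 ms sampling
    interval makes short-lived spikes easier to catch than top's 0.1 s
    refresh."""
    proc = psutil.Process()
    peak = [proc.memory_info().rss]
    done = threading.Event()

    def sampler():
        while not done.is_set():
            peak[0] = max(peak[0], proc.memory_info().rss)
            time.sleep(interval)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    try:
        result = fn(*args, **kwargs)
    finally:
        done.set()
        t.join()
    return result, peak[0]


# Hypothetical usage on an already-built OpenMDAO Problem `prob`:
# _, peak = peak_rss_during(prob.final_setup)
# print(f"peak RSS during final_setup: {peak / 2**30:.2f} GiB")
```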

coveralls commented 10 months ago

Coverage Status

coverage: 92.55% (+0.02%) from 92.53% when pulling fba061dc70e6b9b6c00ec5fba5a143251cee2133 on johnjasa:jac_sparse_fix into 9e02030dc76183c97cfbf2bcc671e109f4c3d139 on OpenMDAO:master.