OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0

Large memory usage #573

Closed AndresSepulveda closed 2 years ago

AndresSepulveda commented 3 years ago

Hi,

I am running a pelagic egg case with 540,000 particles saved over 1322 steps (every 6h).

The simulation used 12 GB of RAM (out of 126G) and it seems it finished, generating a 106G file, as the last log message is

`19:48:23 INFO opendrift.models.basemodel: ========================
19:48:23 DEBUG opendrift.models.basemodel: Cleaning up
19:48:23 DEBUG opendrift.models.basemodel: Writing and closing output file: /data2/matlab/Trond/Output/Erizo_Ancud_drift_20000901_to_20010330.nc
19:48:42 INFO opendrift.export.io_netcdf: Wrote 22 steps to file /data2/matlab/Trond/Output/Erizo_Ancud_drift_20000901_to_20010330.nc
19:49:14 DEBUG opendrift.export.io_netcdf: Making netCDF file CDM compliant with fixed dimensions
20:43:20 DEBUG opendrift.export.io_netcdf: Importing from /data2/matlab/Trond/Output/Erizo_Ancud_drift_20000901_to_20010330.nc`

However, the problem is that the process is still active and is now using 126GB of RAM + 18GB of swap (!)

What is it doing?

knutfrode commented 3 years ago

Hi,

During the simulation, only a short history (up to 100 time steps by default) is kept in memory, and it is flushed to the output file as the run proceeds. This avoids memory problems during the simulation. After the simulation is finished, everything is read into memory again so that it can be analysed, plotted, etc., which in your case takes the full memory. I am, however, a little surprised that it did not crash with a memory error, but this is probably due to the available swap. The memory should be freed by deleting the simulation object, or simply by quitting Python.
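As a rough back-of-the-envelope check of the explanation above, the numbers reported in this thread can be turned into a memory estimate. This is only a sketch: it assumes "106G" means 106 * 10**9 bytes and that the in-memory size per value is about the same as the on-disk size (no compression), neither of which is confirmed in the thread.

```python
# Numbers reported in the thread.
n_elements = 540_000   # particles
n_steps = 1322         # saved time steps
file_bytes = 106e9     # "106G" output file, read as 106 * 10**9 bytes (assumption)

# Average bytes stored per particle per saved step, summed over all output variables.
bytes_per_element_step = file_bytes / (n_elements * n_steps)

# Assumption: memory per value is about the same as on disk (no compression).
buffer_steps = 100     # default history length mentioned above
buffer_bytes = bytes_per_element_step * n_elements * buffer_steps

print(f"{bytes_per_element_step:.0f} bytes per particle per step")
print(f"{buffer_bytes / 1e9:.1f} GB for a {buffer_steps}-step buffer")
print(f"{file_bytes / 1e9:.0f} GB for the full history")
```

Under these assumptions the 100-step buffer comes to roughly 8 GB, which is in the same range as the 12 GB seen during the run, while reloading the full history needs on the order of the whole 106 GB file, close to the 126 GB of RAM on the machine.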

Sometimes datasets are simply too large to be kept in memory for analysis/plotting. There is some new/basic functionality to deal with such datasets, based on Xarray/Dask. This is illustrated in this example: https://opendrift.github.io/gallery/example_huge_output.html

AndresSepulveda commented 3 years ago

Well. Seems adding

del o

after the o.run instruction does not release the memory.

I used the following lines from the example_huge_output example

from datetime import datetime, timedelta
import opendrift
o = opendrift.open_xarray('loco_ancud_20000101_to_20010330_sml.nc', analysis_file='simulation_density.nc')
o.animation(density=True, density_pixelsize_m=500, fast=False, show_elements=False, vmin=0, vmax=200)

but I get

22:04:50 INFO opendrift.models.basemodel: Calculating density array, this may take some time...
22:04:50 DEBUG opendrift.models.basemodel: Finding min and max of lon and lat...
Traceback (most recent call last):
  File "", line 1, in
  File "/home/matlab/opendrift/opendrift/models/basemodel.py", line 2904, in animation
    H, lon_array, lat_array = self.get_density_xarray(pixelsize_m=density_pixelsize_m,
  File "/home/matlab/opendrift/opendrift/models/basemodel.py", line 3710, in get_density_xarray
    h = histogram(self.ds.lon, self.ds.lat, bins=[lonbin, latbin],
  File "/home/matlab/miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/xarray.py", line 140, in histogram
    h_data = _histogram(
  File "/home/matlab/miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/core.py", line 266, in histogram
    bin_counts = _histogram_2d_vectorized(
  File "/home/matlab/miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/core.py", line 131, in _histogram_2d_vectorized
    bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
  File "/home/matlab/miniconda3/envs/opendrift/lib/python3.9/site-packages/xhistogram/duck_array_ops.py", line 24, in f
    return getattr(module, name)(*args, **kwargs)
  File "<__array_function__ internals>", line 5, in ravel_multi_index
  File "/home/matlab/miniconda3/envs/opendrift/lib/python3.9/site-packages/dask/array/core.py", line 1497, in __array_function__
    return da_func(*args, **kwargs)
  File "/home/matlab/miniconda3/envs/opendrift/lib/python3.9/site-packages/dask/array/routines.py", line 1432, in ravel_multi_index
    return multi_index.map_blocks(
AttributeError: 'list' object has no attribute 'map_blocks'
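Since the traceback passes through both xhistogram and dask, it can help to record which versions of those packages are installed in the active environment. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_versions` and the list of package names are illustrative choices, not part of OpenDrift.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Illustrative list; adjust to the packages you want to compare.
for name, ver in installed_versions(["dask", "xhistogram", "xarray", "numpy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

The printed versions can then be compared by hand with the ones listed in the project's environment file.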

gauteh commented 3 years ago

The latter error is caused by a mismatch between the dask and xhistogram versions; check the latest environment.yml. You might see better results with memory if you manually run the garbage collector after deleting all references to the object: import gc; gc.collect()
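To illustrate why `del o` alone may not be enough, here is a small sketch for CPython. The `Simulation` class is a hypothetical stand-in, not an OpenDrift class; it deliberately contains a reference cycle, which is one common reason an object survives after its last name is deleted. Whether an OpenDrift object actually holds such cycles is an assumption here.

```python
import gc
import weakref


class Simulation:
    """Hypothetical stand-in for a large simulation object (not an OpenDrift class)."""

    def __init__(self):
        self.history = bytearray(10_000_000)  # pretend this is the trajectory history
        self.self_ref = self                  # reference cycle: refcount never drops to zero


gc.disable()  # keep the automatic collector from running in the middle of the demo

o = Simulation()
alive = weakref.ref(o)  # lets us observe whether the object still exists

del o  # removes the name only; the cycle keeps the object alive
still_alive_after_del = alive() is not None

gc.collect()  # the cycle collector finds and frees the unreachable object
freed_after_collect = alive() is None

gc.enable()
print(still_alive_after_del, freed_after_collect)
```

Note that `del` and `gc.collect()` only help once every reference is gone (including ones held by an interactive shell, such as `_`), and even then the interpreter does not necessarily hand the freed memory back to the operating system right away.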
