cal-adapt / climakitae

A Python toolkit for retrieving, visualizing, and performing scientific analyses with data from the Cal-Adapt Analytics Engine.
https://climakitae.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Data larger than 12GB cannot be loaded into memory using ck.load on HUB. #345

Open elehmer opened 6 months ago

elehmer commented 6 months ago

Need to investigate why there is a short-term memory spike when using ck.load. The practical limit for ck.load is about 12GB, at which point RAM usage briefly climbs to around 26GB while the in-memory object is being created. It seems we are creating a copy of the data somewhere, so we need to see if we can streamline the memory usage.

FYI, changing from xr.compute to xr.load in the function has no effect.
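One way to tell a transient spike from memory that is actually retained is to compare peak and final allocations around the load call. The snippet below is a minimal sketch using only the standard library's tracemalloc. The loader `load_with_hidden_copy` is a hypothetical stand-in that deliberately makes one extra copy. It is not ck.load, and tracemalloc only sees allocations made through Python's allocators, so its numbers will not match the process RAM shown on the hub exactly.

```python
import tracemalloc


def measure(func):
    """Run func() and return (result, retained_bytes, peak_bytes)."""
    tracemalloc.start()
    try:
        result = func()
        # Read the counters while the result is still referenced.
        retained, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, retained, peak


def load_with_hidden_copy(n_bytes):
    """Hypothetical loader: builds a staging buffer, then copies it."""
    staging = bytearray(n_bytes)  # temporary buffer
    loaded = bytes(staging)       # full copy, both buffers exist here
    del staging                   # temporary released before returning
    return loaded


SIZE = 10 * 1024 * 1024  # 10 MiB stand-in for a multi-GB retrieval
data, retained, peak = measure(lambda: load_with_hidden_copy(SIZE))
print(f"retained: {retained / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")
```

A peak of roughly twice the retained size, as in this toy case, is the signature of a full intermediate copy that is freed once loading finishes.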

elehmer commented 6 months ago

Investigate ck.load limitations

elehmer commented 6 months ago

Ahh, I didn't see the change to xr.load in merge #337. With that change, memory no longer shrinks back after loading, so this is definitely not the way to go.

elehmer commented 6 months ago

ProgressBar is causing the memory spike to remain after loading. Removed in #346.

elehmer commented 6 months ago

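The difference between xr.load and xr.compute is that load loads the data into the existing object in place, while compute returns a new object and leaves the original untouched. So with load you can write: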

data_to_use.load()

Since we are returning from a function, compute makes more sense. There also seems to be more overhead with xr.load: an 11.95GB retrieval briefly used 26GB of RAM with load versus 20.5GB with compute (short-term memory spike), with ~14GB being the end RAM size.
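The in-place versus new-object behavior can be checked on a tiny array. This is a minimal sketch, assuming xarray and dask are installed. The helper `load_result` is hypothetical and only mirrors the "return from a function" pattern described above. It is not the actual ck.load implementation, and an array this small says nothing about the memory overhead figures.

```python
import numpy as np
import xarray as xr

# Small dask-backed array standing in for a lazily opened retrieval.
lazy = xr.DataArray(
    np.arange(12.0).reshape(3, 4), dims=("y", "x")
).chunk({"y": 1})


def load_result(data):
    """Hypothetical helper: return an in-memory object, leave the input lazy."""
    return data.compute()


in_memory = load_result(lazy)
print("input still lazy after compute:", lazy.chunks is not None)
print("returned object is in memory:", in_memory.chunks is None)

# load() fills the same object in place and returns it.
same = lazy.load()
print("load returned the same object:", same is lazy)
print("input is now in memory:", lazy.chunks is None)
```

With compute, the caller's lazy object stays lazy and can be dropped or reused. With load, the object passed in is itself converted to an in-memory array.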

elehmer commented 6 months ago

Changing back to compute in #348