ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0
384 stars 73 forks

Memory accumulation when looping through multiple GRIB files #283

Open ecvdk2 opened 2 years ago

ecvdk2 commented 2 years ago

When I loop through multiple GRIB files in one Python script, memory keeps accumulating until my system crashes. I close every dataset I open with "datasetName.close()" after extracting what I need, and there are no leaks in my own script. I noticed that the accumulation only happens when a GRIB file is opened for the first time, i.e. when its .idx file is created. Is it possible that these index files stay open even after the dataset itself is closed? If so, is there a way to close them?
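As an aside, cfgrib exposes an indexpath option (documented in the cfgrib README); passing an empty string disables the .idx sidecar files entirely, which is a quick way to check whether the index machinery is involved in the accumulation. A minimal sketch (the helper name is mine):

```python
# Sketch: open a GRIB file without creating or reusing a .idx sidecar file.
# An empty 'indexpath' disables index files; every open then re-scans the
# GRIB metadata, which is slower but takes the .idx files out of the picture.
NO_INDEX_KWARGS = {"indexpath": ""}

def open_grib_without_index(path):
    # Deferred import so this helper loads even where xarray is not installed.
    import xarray as xr
    return xr.open_dataset(path, engine="cfgrib", backend_kwargs=NO_INDEX_KWARGS)
```

If memory still accumulates with index files disabled, the .idx files themselves are not the culprit.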

h-sharif commented 2 years ago

Hi, I experienced the exact same problem when I tried to loop over GRIB files and integrate over a boundary. I have also used del and garbage collector commands, but this doesn't release the memory.

ecvdk2 commented 2 years ago

Hi, right now I managed to circumvent the problem by writing my loop in bash and calling my python script for 1 GRIB file in the bash loop. Not ideal, but at least I can now let it run without having to restart it manually all the time. To give some context: It is a script I use to convert my GRIB files to netcdfs with some modification of the data (some integration, calculation of some statistics, etc.)
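The bash-loop workaround can also be driven from Python itself by spawning a fresh interpreter per file, so whatever cfgrib/ecCodes holds onto is returned to the OS when each child exits. A sketch, where convert_one.py stands in for whichever single-file conversion script you use:

```python
import subprocess
import sys

def process_files(paths, worker_script="convert_one.py"):
    # One fresh interpreter per GRIB file: any memory leaked inside the
    # child process is reclaimed by the OS when that process exits.
    results = []
    for path in paths:
        proc = subprocess.run([sys.executable, worker_script, path])
        results.append((path, proc.returncode))
    return results
```

The per-file interpreter startup adds overhead, but the driver process stays flat in memory no matter how many files are processed.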

emfdavid commented 1 year ago

The patches I referenced above seem to resolve the issue with tmp files I am seeing... but I still have a memory leak. (I will fix the broken tests asap)

I have confirmed that cfgrib.messages.Message.__del__ is being called as my message objects go out of scope, so I am unsure where to look next. I noticed there are other release methods in eccodes-python; maybe some of them also need to be called? https://github.com/ecmwf/eccodes-python/blob/1497a50b3d51705a1dddb538b747e3e3ab4607c3/eccodes/eccodes.py#L85 https://github.com/ecmwf/eccodes-python/blob/1497a50b3d51705a1dddb538b747e3e3ab4607c3/eccodes/eccodes.py#L103

ecvdk2 commented 1 year ago

I also haven't found a workaround within python yet. The method using bash to keep relaunching a python script does work very well so I have just been doing that.

I did run into additional problems with these .idx files, however. When there is an "old" .idx file, cfgrib reports something like "can't use this .idx file" and then takes extremely long to open the GRIB file, so I have had to delete the index files whenever I wanted to reprocess a file. But I don't think this has anything to do with the temp files.

emfdavid commented 1 year ago

I did run into additional problems with these .idx files, however. When there is an "old" .idx file, cfgrib reports something like "can't use this .idx file" and then takes extremely long to open the GRIB file, so I have had to delete the index files whenever I wanted to reprocess a file. But I don't think this has anything to do with the temp files.

Yes - I believe those are supposed to be temp files, but they are not properly cleaned up. As you have noticed, they can't be used again. When the context manager closes they are supposed to be unlinked, but they aren't. I hope to have the PRs merged soon, but until then you can patch the cfgrib code as follows:

import contextlib
import logging
import os
import typing as T  # needed for the T.Generator/T.IO type comment below

import cfgrib.messages

logger = logging.getLogger(__file__)

@contextlib.contextmanager
def compat_create_exclusive(path):
    # type: (str) -> T.Generator[T.IO[bytes], None, None]
    # Create the index file exclusively, then always unlink it on exit,
    # so the index behaves like a true temp file instead of lingering on disk.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    with open(fd, mode="wb") as file:
        try:
            yield file
        finally:
            os.unlink(path)

def patch_cfgrib():
    logger.warning(
        "Patching: %s method %s",
        cfgrib.messages,
        cfgrib.messages.compat_create_exclusive,
    )
    cfgrib.messages.compat_create_exclusive = compat_create_exclusive

Then import that file and call the patch method somewhere in your script before you start opening GRIB files.

iainrussell commented 1 year ago

Just to try to clarify, these are not intended to be temporary files, they are index files that prevent re-scanning of the GRIB meta-data. However, their format can change over time (rarely, but it happens). In this case, an old, incompatible index file will no longer be used, and a new one created. The old one would need to be deleted manually (in theory it could still be useful if you opened the GRIB file with an older version of cfgrib).
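Where an old, incompatible index file does get in the way, a small helper can clear the .idx files next to a GRIB file before reprocessing. A sketch, assuming the default naming where the index file name starts with the GRIB file name (e.g. data.grib.<hash>.idx; adjust the pattern if your cfgrib version names them differently):

```python
import glob
import os

def remove_stale_indexes(grib_path):
    # Delete any .idx sidecar files belonging to this GRIB file so the
    # next open rebuilds the index from scratch.
    removed = []
    for idx in glob.glob(glob.escape(grib_path) + "*.idx"):
        os.remove(idx)
        removed.append(idx)
    return removed
```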

emfdavid commented 1 year ago

@iainrussell So this is a feature not a bug? Only unlink (remove) on error?

emfdavid commented 1 year ago

Okay @ecvdk2, @iainrussell has a better solution for the tmp/index files: https://github.com/ecmwf/cfgrib/pull/306#issuecomment-1191711945. I don't recommend the patching technique unless absolutely necessary. Thanks

emfdavid commented 1 year ago

I have confirmed that cfgrib.messages.Message.__del__ is being called as my message objects go out of scope. I am unsure where to look next. I noticed there are other release methods, maybe some of them are required? https://github.com/ecmwf/eccodes-python/blob/1497a50b3d51705a1dddb538b747e3e3ab4607c3/eccodes/eccodes.py#L85 https://github.com/ecmwf/eccodes-python/blob/1497a50b3d51705a1dddb538b747e3e3ab4607c3/eccodes/eccodes.py#L103

Still an open issue on the memory leak. Are there other release methods from eccodes that should be called @iainrussell ?

iainrussell commented 1 year ago

The master branch contains a fix that largely solves the memory leak. We will release a new version soon. The last bits may require some changes in ecCodes.

PaoloMarconi95 commented 1 month ago

Still experiencing this issue.

I'll provide a quick snippet to reproduce my issue:

import cfgrib
import gc

class Reader:
    def __init__(self):
        self.data = None

    def read(self, file):
        self.data = cfgrib.open_datasets(file)

    def close_data(self):
        for ds in self.data:
            ds.close()
        self.data = None

if __name__ == "__main__":
    file = "PATH/TO/GRIB/FILE"
    r = Reader()
    r.read(file)
    r.close_data()
    r = None
    gc.collect()

In this case, the operations after r.read(file) have no effect on memory usage at all.
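Until the leak is fully fixed, a process-isolation variant of the snippet above sidesteps the accumulation: do the per-file work in a short-lived child process so the leaked memory dies with it. A sketch, where work() is a placeholder for the open/compute/close steps:

```python
import multiprocessing as mp

def work(path):
    # Placeholder: open the file with cfgrib here, compute what you need,
    # close the datasets, and return only small plain-Python results.
    return len(path)

def process_in_child(path):
    # 'fork' keeps this runnable without an import guard (POSIX only);
    # maxtasksperchild=1 makes the worker exit after one file, releasing
    # everything it allocated back to the OS.
    ctx = mp.get_context("fork")
    with ctx.Pool(1, maxtasksperchild=1) as pool:
        return pool.apply(work, (path,))
```

This keeps the long-running parent process flat in memory, at the cost of one process spawn per file.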