actris-cloudnet / cloudnetpy

Python package for Cloudnet data processing
MIT License

Timeout for cloudnet.fmi.fi / utils.fetch_cloudnet_model_types / _find_model_type #85

Closed (spirrobe closed 1 year ago)

spirrobe commented 1 year ago

Hi

When rerunning our local cloudnetpy, I recently ran into issues related to the Cloudnet hub. Namely, a timeout occurs when we run the framework for 3 devices, which likely means too many server requests, or rather a too crappy connection from our field site server :-). I could fix this locally by pausing in between, but I had a look at the call stack.

The categorize step in cloudnetpy/categorize/categorize.py calls cloudnetpy/categorize/model.py to prepare the inputs, or rather to build the Model class. This in turn calls _find_model_type(model_file), which uses utils.fetch_cloudnet_model_types(). The latter is a simple call to the Cloudnet hub API to get the model types, returning 1.4 kB. In the next step, a simple lookup loop checks the filename:

def _find_model_type(file_name: str) -> str:
    """Finds model type from the model filename."""
    possible_keys = utils.fetch_cloudnet_model_types()
    for key in possible_keys:
        if key in file_name:
            return key
    raise ValueError("Unknown model type")

where

def fetch_cloudnet_model_types() -> list:
    """Finds different model types."""
    url = "https://cloudnet.fmi.fi/api/models"
    data = requests.get(url=url, timeout=60).json()
    models = [model["id"] for model in data]
    model_types = [model.split("-")[0] for model in models]
    return list(set(model_types))

Ultimately, this approach means extra load on the Cloudnet hub for each categorize call, only to end up with the main model families 'harmonie', 'icon', 'era5', 'ecmwf', 'gdas1'. The result is added to the Model class/instance via L45 in cloudnetpy/categorize/model.py:

self.type = _find_model_type(model_file)

However, this information is not passed on to the netCDF model file (the source from the model netCDF is passed on, but the model type found here is not).

It also means that the pipeline may break when there is a timeout, internet outage, or server issue. The only place where I found this information to be relevant was in cloudnetpy/categorize/melting.py, where only gdas1 has different boundaries than the rest of the models:

def _find_model_temperature_range(model_type: str) -> tuple[float, float]:
    """Returns temperature range around 0C for given model type."""
    if "gdas1" in model_type.lower():
        return -8, 6
    return -4, 3
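
To illustrate the timeout concern, here is a minimal sketch of one way the lookup could be made tolerant of a failed request: query the hub at most once per process and fall back to a static list if the request fails. This is not cloudnetpy code. `ModelTypeCache`, `fetch_from_hub` and `FALLBACK_MODEL_TYPES` are hypothetical names, the fallback list is simply the five families named above (assumed to be complete), and `urllib` stands in for `requests` to keep the sketch stdlib-only.

```python
import json
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

# Assumption: the families named in this issue are the complete set.
FALLBACK_MODEL_TYPES = ("harmonie", "icon", "era5", "ecmwf", "gdas1")


def fetch_from_hub(
    url: str = "https://cloudnet.fmi.fi/api/models", timeout: float = 60
) -> list:
    """Fetch model families from the hub (stdlib stand-in for requests.get)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = json.load(response)
    # "ecmwf-open" -> "ecmwf"; the set removes duplicates.
    return sorted({model["id"].split("-")[0] for model in data})


class ModelTypeCache:
    """Hypothetical helper: one hub request per process, static fallback."""

    def __init__(self, fetch: Callable[[], Iterable[str]] = fetch_from_hub) -> None:
        self._fetch = fetch
        self._types: Optional[Tuple[str, ...]] = None

    def get(self) -> Tuple[str, ...]:
        if self._types is None:
            try:
                self._types = tuple(self._fetch())
            except (OSError, ValueError, KeyError):
                # Network errors and timeouts are OSError subclasses;
                # malformed JSON raises ValueError; a missing "id" raises KeyError.
                self._types = FALLBACK_MODEL_TYPES
        return self._types
```

Note that the fallback is cached as well, so a process that starts offline never retries the hub. Whether that is acceptable depends on how the pipeline is run.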

Overall, I'd prefer to get rid of this server call unless there is a good reason to keep it. :-) But I can see several ways to do so:

What do you think?

pablosaa commented 1 year ago

I agree with your suggestion; that way cloudnetpy could also be used locally offline.

siiptuo commented 1 year ago

Thanks for raising this issue! It's best to hardcode the model families as these should be more or less static today.
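
For reference, a minimal sketch of what hardcoding could look like, assuming the five families listed earlier in this thread are the complete set. The constant name `MODEL_TYPES` and the file name in the usage example are illustrative only, and the actual change in cloudnetpy may differ.

```python
# Assumption: these are the model families named in the issue above.
MODEL_TYPES = ("harmonie", "icon", "era5", "ecmwf", "gdas1")


def find_model_type(file_name: str) -> str:
    """Finds model type from the model filename without any network call."""
    for key in MODEL_TYPES:
        if key in file_name:
            return key
    raise ValueError(f"Unknown model type in file name: {file_name}")


# Hypothetical file name, for illustration only.
model_type = find_model_type("20230101_site_ecmwf.nc")
```

The lookup logic is unchanged from `_find_model_type`; only the source of the candidate keys differs, so categorize no longer depends on the hub being reachable.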