Hi,
When rerunning our local cloudnetpy I recently ran into issues related to the cloudnet hub. Namely, a timeout occurs when we run the framework for 3 devices, which likely means too many server requests or, rather, too crappy a connection from our field site server :-). I could fix this locally by pausing in between, but I also had a look at the call stack.
The categorize step, cloudnetpy/categorize/categorize.py, calls cloudnetpy/categorize/model.py to prepare the inputs, or rather to build the class. This in turn calls _find_model_type(model_file), which uses utils.fetch_cloudnet_model_types(). The latter is a simple call to the cloudnet hub API to get the model types, returning 1.4 kB. In the next step, a simple lookup loop checks the filename:
def _find_model_type(file_name: str) -> str:
    """Finds model type from the model filename."""
    possible_keys = utils.fetch_cloudnet_model_types()
    for key in possible_keys:
        if key in file_name:
            return key
    raise ValueError("Unknown model type")
where
def fetch_cloudnet_model_types() -> list:
    """Finds different model types."""
    url = "https://cloudnet.fmi.fi/api/models"
    data = requests.get(url=url, timeout=60).json()
    models = [model["id"] for model in data]
    model_types = [model.split("-")[0] for model in models]
    return list(set(model_types))
Ultimately, this approach means an extra load on the cloudnet hub for each categorize call, only to end up with the main model families 'harmonie', 'icon', 'era5', 'ecmwf' and 'gdas1'. The result is added to the Model class/instance via L45 in cloudnetpy/categorize/model.py:
self.type = _find_model_type(model_file)
However, this information is not passed on to the netCDF model file (the source from the model netCDF is passed on, but the "modeltype" in this sense is not).
It also means that the pipeline may break when there is a timeout, an internet outage or a server issue. The only place where I found this information to be relevant is cloudnetpy/categorize/melting.py, where only gdas1 has different boundaries than the rest of the models:
def _find_model_temperature_range(model_type: str) -> tuple[float, float]:
    """Returns temperature range around 0C for given model type."""
    if "gdas1" in model_type.lower():
        return -8, 6
    return -4, 3
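For comparison, if the server call were kept, both problems above (one request per categorize call, and a crash on timeout) could in principle be softened by caching the result and falling back to a fixed list. This is only a rough sketch, not how cloudnetpy does it: make_cached_lookup and FALLBACK_MODEL_TYPES are hypothetical names, and the fallback list is simply the five families mentioned in this issue.

```python
from functools import lru_cache
from typing import Callable

# Assumption: fallback list copied from the families named in this issue.
FALLBACK_MODEL_TYPES = ("harmonie", "icon", "era5", "ecmwf", "gdas1")


def make_cached_lookup(fetch: Callable[[], list[str]]) -> Callable[[], tuple[str, ...]]:
    """Wrap a fetch function so it is called at most once per process.

    Any exception raised by ``fetch`` (timeout, outage, bad JSON, ...) is
    swallowed and the hardcoded fallback is returned instead. Note that the
    fallback is cached as well, so a failed fetch is not retried.
    """

    @lru_cache(maxsize=1)
    def lookup() -> tuple[str, ...]:
        try:
            return tuple(fetch())
        except Exception:
            return FALLBACK_MODEL_TYPES

    return lookup
```

It could be wired up as get_model_types = make_cached_lookup(utils.fetch_cloudnet_model_types), with _find_model_type then iterating over get_model_types().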
Overall, I'd prefer to get rid of this server call unless there is a good reason to keep it. :-)
But I can see several ways to do so:
1. Hardcode the model families in _find_model_type, as there are only 5 options right now; no need to make it more complicated than it has to be, especially given that the type is "only" used for the melting t_range.
2. Introduce a config file containing the JSON response from the API call.
3. Require the modeltype as meta input, similar to the calibration for the ceilometer meta dict.
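To make the first option concrete (with a hint of the third), here is a minimal sketch. It assumes the five families listed above are the complete set; KNOWN_MODEL_TYPES and the optional model_type argument are hypothetical additions, not part of the current cloudnetpy code.

```python
from typing import Optional

# Assumption: the five model families mentioned in this issue are all there is.
KNOWN_MODEL_TYPES = ("harmonie", "icon", "era5", "ecmwf", "gdas1")


def find_model_type(file_name: str, model_type: Optional[str] = None) -> str:
    """Finds model type without any server call.

    If ``model_type`` is given explicitly (e.g. from a meta dict), it is
    validated and returned; otherwise the filename is searched as before.
    """
    if model_type is not None:
        if model_type not in KNOWN_MODEL_TYPES:
            raise ValueError(f"Unknown model type: {model_type}")
        return model_type
    for key in KNOWN_MODEL_TYPES:
        if key in file_name:
            return key
    raise ValueError("Unknown model type")
```

With a made-up filename such as "20230101_mysite_ecmwf.nc" this returns "ecmwf", and a filename without any known family still raises ValueError as it does today.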
What do you think?