CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0

Possible bug or problem in climada.hazard.tc_tracks_synth.calc_perturbed_trajectories when parallel processing is active #932

Closed michignolo closed 3 months ago

michignolo commented 3 months ago

When downloading several TC tracks (from before the year 2000), Numba emits a compilation warning that points at the definition of time_step:

/home/mick/anaconda3/envs/climada_env/lib/python3.11/site-packages/climada/hazard/tc_tracks.py:1508: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "_one_interp_data" failed type inference due to: non-precise type pyobject
During: typing of argument at /home/mick/anaconda3/envs/climada_env/lib/python3.11/site-packages/climada/hazard/tc_tracks.py (1542)

File "../../home/mick/anaconda3/envs/climada_env/lib/python3.11/site-packages/climada/hazard/tc_tracks.py", line 1542:
    def _one_interp_data(track, time_step_h, land_geom=None):

            time_step = pd.tseries.frequencies.to_offset(pd.Timedelta(hours=time_step_h)).freqstr

This in turn results in perturbed trajectories whose max_sustained_wind and similar variables are zero-filled arrays.

The bug appears mostly with old tracks, but also with several recent ones.

For instance, after downloading the track with track_id = 2000364N07130, calling equal_timestep and then computing the perturbed trajectories, I get an empty list of perturbed tracks.

Important: the track generation code runs in parallel with a pool. Could this be a source of the error?

spjuhel commented 3 months ago

Could you provide:

  • a full minimal example
  • the version of climada you are using

I encountered a similar warning, but the tracks did not seem to have anything wrong.

Note from the docstring of from_tracks() that, by default, centroids that are "far from coast" are ignored, which leads TCs that are nowhere near the coast to consist of zero-filled arrays:

    ignore_distance_to_coast : boolean, optional
        If True, centroids far from coast are not ignored.
        If False, the centroids' distances to the coast are calculated with the
        `Centroids.get_dist_coast()` method (unless there is "dist_coast" column in
        the centroids' GeoDataFrame) and centroids far from coast are ignored.
        Default: False.
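
To illustrate why ignored centroids show up as zeros, here is a toy sketch in plain Python. It is not CLIMADA code: the function name, the list-based inputs and the max_dist_km threshold are all hypothetical, and the real from_tracks() logic may differ in detail.

```python
from typing import List


def mask_far_centroids(
    intensity: List[float],
    dist_coast_km: List[float],
    max_dist_km: float,
    ignore_distance_to_coast: bool = False,
) -> List[float]:
    """Zero out values at centroids farther from the coast than max_dist_km.

    Hypothetical helper that only mimics the documented effect of the flag.
    """
    if ignore_distance_to_coast:
        # Flag set: no centroid is dropped, all values are kept.
        return list(intensity)
    return [
        value if dist <= max_dist_km else 0.0
        for value, dist in zip(intensity, dist_coast_km)
    ]


# Made-up numbers: the second centroid is 2000 km from the coast.
print(mask_far_centroids([30.0, 40.0], [50.0, 2000.0], max_dist_km=1000.0))
print(mask_far_centroids([30.0, 40.0], [50.0, 2000.0], 1000.0, ignore_distance_to_coast=True))
```

With the default flag, the far centroid ends up with a zero, which is the pattern described above for storms that never come near a coast.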
michignolo commented 3 months ago

The CLIMADA version is 4.1.0 with Python 3.11 (conda on Ubuntu 22.04).

Minimal working code

import logging
import sys
import time

from climada.hazard import TCTracks
from pathos.pools import ProcessPool as Pool

# log to stdout so that errors and timings are visible
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger(__name__)


def syn_track(tr_cyclon_id, ncpus=5, nsynth=100):
    """Download one IBTrACS track and generate synthetic tracks in parallel."""
    try:
        tr_c = TCTracks.from_ibtracs_netcdf(storm_id=tr_cyclon_id)
    except Exception as e:
        logger.error("Error on %s : %s", tr_cyclon_id, e)
        return None

    pool = Pool(ncpus=ncpus)  # start a pathos pool
    time.sleep(1)
    try:
        tr_c.equal_timestep(pool=pool)
        t0 = time.time()
        tr_c.calc_perturbed_trajectories(nb_synth_tracks=nsynth, pool=pool)
        logger.info("Elapsed time on %s : %s", tr_cyclon_id, time.time() - t0)
    except Exception as e:
        logger.error("Error on %s : %s", tr_cyclon_id, e)
        return None
    finally:
        # always release the worker processes, also after an error
        pool.close()
        pool.join()

    return tr_c


empty_track = "1994087S10115"
syn_track(empty_track)

The missing (all-empty) variable is radius_max_wind, while the sustained wind is not empty.
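
A quick way to spot such variables is sketched below. It assumes the values of each variable have first been copied into plain Python lists (for an xarray Dataset, as used for the entries of TCTracks.data, something like track[name].values.tolist() should work). The helper name and the example numbers are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


def empty_variables(track_vars: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of variables whose values are all zero or NaN."""
    empty = []
    for name, values in track_vars.items():
        if all(v == 0 or math.isnan(v) for v in values):
            empty.append(name)
    return empty


# Made-up numbers for illustration only, not real IBTrACS data.
example = {
    "max_sustained_wind": [25.0, 30.0, 35.0],
    "radius_max_wind": [0.0, 0.0, 0.0],
    "central_pressure": [float("nan"), 1002.0, 998.0],
}
print(empty_variables(example))
```

A variable with only some gaps (central_pressure here) is not reported, only those without a single usable value.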

PS: for my purposes this is already fine. But if the Numba warning could be resolved, the code would likely be even faster.

Thanks


spjuhel commented 3 months ago

OK, so I checked the IBTrACS dataset itself for this specific track: there are no records of radius_max_wind (and only very few of the sustained wind).

So the missing data is not related to the parallel processing at all; the data simply does not exist. The default behaviour is to fill in gaps by interpolating, but only if at least some data exists. You can use estimate_missing=True in from_ibtracs_netcdf() to estimate missing physical variables from the ones that are available. But beware that this relies on coarse assumptions. From the documentation:

    estimate_missing : bool, optional
        For each fixed time step, estimate missing pressure, wind speed and radius using
        other variables that are available at that time step. The relationships between
        the variables are purely statistical. In comparison to interpolate_missing, this
        procedure is able to estimate values for variables that haven't been reported by
        any agency at any time step, as long as other variables are available.
        A typical example are storms before 1950, for which there are often no reported
        values for pressure, but for wind speed. In this case, a rough statistical
        pressure-wind relationship is applied to estimate the missing pressure values
        from the available wind-speed values.
        Make sure to set rescale_windspeeds=True when using this option because the
        statistical relationships are calibrated using rescaled wind speeds.
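
Why interpolation alone cannot help for this track can be seen in a small sketch. This is plain Python, not the CLIMADA implementation: the function name is hypothetical, and holding the first and last known value constant at the edges is an assumption made only for this example.

```python
import math
from typing import List


def interpolate_gaps(values: List[float]) -> List[float]:
    """Linearly fill NaN gaps between known values; hold edge values constant.

    If no value is known at all, the series is returned unchanged.
    """
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        # Nothing to interpolate from: the gap cannot be filled.
        return list(values)
    filled = list(values)
    for i in range(len(values)):
        if not math.isnan(values[i]):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


nan = float("nan")
print(interpolate_gaps([10.0, nan, 20.0]))  # gap between two known values
print(interpolate_gaps([nan, nan, nan]))    # nothing known, stays NaN
```

A series without a single reported value stays empty, which matches the radius_max_wind case here and is the situation estimate_missing is meant for.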

Concerning the warning, I no longer get it when using estimate_missing=True, so I guess that solves your problem, but there is indeed something to look into otherwise.
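
As a minimal sketch of the suggestion above, the call could look as follows. The two helper names are made up for this example; the keyword arguments are the ones named in this thread and in the quoted documentation. CLIMADA is imported inside the function so that the snippet can be loaded without it installed.

```python
from typing import Any, Dict


def ibtracs_kwargs(storm_id: str) -> Dict[str, Any]:
    """Keyword arguments for TCTracks.from_ibtracs_netcdf with estimation enabled."""
    return {
        "storm_id": storm_id,
        "estimate_missing": True,
        # the quoted documentation asks for rescaled wind speeds with this option
        "rescale_windspeeds": True,
    }


def load_track_with_estimates(storm_id: str):
    """Load one IBTrACS track, estimating variables that no agency reported."""
    # Imported lazily: requires CLIMADA and access to the IBTrACS file.
    from climada.hazard import TCTracks

    return TCTracks.from_ibtracs_netcdf(**ibtracs_kwargs(storm_id))


print(ibtracs_kwargs("1994087S10115"))
```

Calling load_track_with_estimates("1994087S10115") should then return tracks whose missing variables are filled by the statistical estimates, with the caveats from the documentation.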

michignolo commented 3 months ago

Ok fantastic.

Thanks a lot for having investigated the issue

Michelangelo
