Closed michignolo closed 3 months ago
Could you provide:
- a full minimal example
- the version of climada you are using
I encountered a similar warning, but nothing seemed to be wrong with the tracks.
Note from the docstring of from_tracks() that by default centroids that are "far from coast" are ignored, which leads to TCs that are nowhere near the coast consisting of zero-filled arrays:
ignore_distance_to_coast : boolean, optional
If True, centroids far from coast are not ignored.
If False, the centroids' distances to the coast are calculated with the
`Centroids.get_dist_coast()` method (unless there is "dist_coast" column in
the centroids' GeoDataFrame) and centroids far from coast are ignored.
Default: False.
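As a rough illustration of what the docstring describes, here is a minimal sketch of the masking idea. It is not CLIMADA's implementation: the function name `wind_at_centroids` and the `max_dist_km` threshold are hypothetical and only stand in for "far from coast".

```python
def wind_at_centroids(raw_wind, dist_coast_km,
                      ignore_distance_to_coast=False, max_dist_km=1000.0):
    """Zero out wind at centroids far from the coast unless the flag is set.

    max_dist_km is an assumed, illustrative threshold, not a CLIMADA value.
    """
    if ignore_distance_to_coast:
        # Flag set: keep every centroid, however far from the coast it is.
        return list(raw_wind)
    # Default: centroids beyond the threshold are ignored (left at zero).
    return [w if d <= max_dist_km else 0.0
            for w, d in zip(raw_wind, dist_coast_km)]
```

With the default, a storm that stays far offshore ends up with all-zero values, which matches the symptom described above. In CLIMADA itself, the equivalent would be passing ignore_distance_to_coast=True to from_tracks().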
The climada version is 4.1.0 with Python 3.11 (conda on Ubuntu 22.04).
Minimal working code
import logging
import time

from climada.hazard import TCTracks
from pathos.pools import ProcessPool as Pool

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def syn_track(tr_cyclon_id, ncpus=5, nsynth=100):
    """Load one IBTrACS track and generate nsynth perturbed trajectories."""
    pool = Pool(ncpus=ncpus)  # start a pathos pool
    time.sleep(1)
    try:
        tr_c = TCTracks.from_ibtracs_netcdf(storm_id=tr_cyclon_id)
        tr_c.equal_timestep(pool=pool)
        t0 = time.time()
        tr_c.calc_perturbed_trajectories(nb_synth_tracks=nsynth, pool=pool)
        t1 = time.time()
    except Exception as e:
        logger.error("Error on %s : %s", tr_cyclon_id, e)
        return None
    finally:
        # always release the worker processes, on success and on failure
        pool.close()
        pool.join()
    logger.info("Elapsed time on %s : %s", tr_cyclon_id, t1 - t0)
    return tr_c


empty_track = "1994087S10115"
syn_track(empty_track)
The missing variable (all values empty) is radius_max_wind, while the sustained wind is not empty.
PS: For my purposes this is already fine, but if we could resolve the numba warning, the code would likely be even faster.
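A quick way to spot this kind of gap is to scan each variable of a track for all-empty values. The sketch below assumes the track variables have already been pulled out into plain lists keyed by name; the helper `empty_variables` is hypothetical and not part of CLIMADA.

```python
import math


def empty_variables(track_vars):
    """Return the names of variables whose values are all NaN or all zero."""
    empty = []
    for name, values in track_vars.items():
        # A variable counts as empty if no entry carries a usable value.
        if all(math.isnan(v) or v == 0 for v in values):
            empty.append(name)
    return empty


track = {
    "radius_max_wind": [0.0, 0.0, float("nan")],
    "max_sustained_wind": [35.0, 40.0, float("nan")],
}
print(empty_variables(track))
```

Here only radius_max_wind is reported, since max_sustained_wind has at least some values.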
Thanks
On Tue, Aug 6, 2024 at 2:28 PM Samuel Juhel @.***> wrote:
Could you provide:
- a full minimal example
- the version of climada you are using
Ok so I checked the IBTrACS dataset itself for this specific track, and there are no records of radius_max_wind (and only very few of the sustained wind).
So the missing data is not related to the parallel processing at all; the data simply does not exist.
The default behaviour is to fill in the gaps by interpolating, but only if at least some data exists.
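To make that last point concrete, here is a minimal sketch of gap filling by linear interpolation. It is only an illustration of the principle, not CLIMADA's code: the function `fill_gaps` is hypothetical, and holding the nearest known value at the edges is an assumption.

```python
import math


def fill_gaps(values):
    """Linearly interpolate NaN gaps; leave the series alone if nothing is known."""
    known = [(i, v) for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        # No reported value at any time step: nothing to interpolate from.
        return list(values)
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        before = [(j, w) for j, w in known if j < i]
        after = [(j, w) for j, w in known if j > i]
        if before and after:
            (j0, w0), (j1, w1) = before[-1], after[0]
            filled[i] = w0 + (w1 - w0) * (i - j0) / (j1 - j0)
        elif before:
            filled[i] = before[-1][1]  # assumed: hold last known value
        else:
            filled[i] = after[0][1]  # assumed: hold first known value
    return filled
```

A series such as [10.0, nan, 30.0] gets its gap filled, while a series with no reported values at all comes back unchanged, which is the situation for radius_max_wind in this track.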
You can use estimate_missing=True in from_ibtracs_netcdf() to estimate missing physical variables from the ones that are available. But beware that this relies on coarse assumptions. From the documentation:
estimate_missing : bool, optional
For each fixed time step, estimate missing pressure, wind speed and radius using other variables that are available at that time step. The relationships between the variables are purely statistical. In comparison to interpolate_missing, this procedure is able to estimate values for variables that haven't been reported by any agency at any time step, as long as other variables are available. A typical example are storms before 1950, for which there are often no reported values for pressure, but for wind speed. In this case, a rough statistical pressure-wind relationship is applied to estimate the missing pressure values from the available wind-speed values. Make sure to set rescale_windspeeds=True when using this option because the statistical relationships are calibrated using rescaled wind speeds.
Concerning the warning, I no longer get it when using estimate_missing=True, so I guess that solves your problem, but there is indeed something to look into otherwise.
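For reference, a minimal sketch of how the two options named in the docstring could be passed together. The helper names `ibtracs_load_options` and `load_track_with_estimates` are hypothetical; only from_ibtracs_netcdf(), storm_id, estimate_missing and rescale_windspeeds come from this thread. CLIMADA is imported lazily, so the option helper can be used without it installed.

```python
def ibtracs_load_options(storm_id):
    """Keyword arguments that enable statistical estimation of missing values."""
    return {
        "storm_id": storm_id,
        "estimate_missing": True,
        # The docstring asks for this whenever estimate_missing is used.
        "rescale_windspeeds": True,
    }


def load_track_with_estimates(storm_id):
    """Load one IBTrACS track with estimate_missing switched on."""
    # Imported here so that the helper above works without CLIMADA.
    from climada.hazard import TCTracks

    return TCTracks.from_ibtracs_netcdf(**ibtracs_load_options(storm_id))
```

Calling load_track_with_estimates("1994087S10115") would then load the track from the minimal example with the estimated variables filled in, subject to the coarse assumptions noted above.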
Ok fantastic.
Thanks a lot for having investigated the issue
Michelangelo
On Wed, Aug 7, 2024 at 12:10, Samuel Juhel @.***> wrote:
Ok so I checked the IBTrACS dataset itself for this specific track, and there are no records of radius_max_wind (and only very few of the sustained wind).
When downloading several TC tracks (from before the year 2000), the numba compilation warns about the definition of time_step with error:
This in turn causes the simulated perturbed trajectories to contain zero-filled arrays for max_sustained_wind and similar variables.
The bug appears when downloading tracks, mostly old ones (but also several recent ones).
For instance, when downloading the track with track_id = 2000364N07130 and calling the perturbed trajectories, I get an empty perturbed list after having called the function equal_timestep.
Important: the code for track generation runs in parallel with a pool. Is this a possible source of the error?