Interpolate TC track timesteps to hourly when calculating footprints?

ChrisFairless commented 2 years ago

Not many users know about the TCTracks.equal_timestep method for adding extra, interpolated points to TC tracks.

The reasons for doing this are:

The resulting wind field is much smoother and more natural. Longer times between track points, and therefore larger gaps between wind field calculations, lead to a very uneven footprint, with high winds only present around the eye's location at the reporting times in the track data. For example, the eye of this storm at landfall is located right over Panama City, Florida, resulting in falsely low hazard values at landfall:

(thanks to Evelyn for this plot)
Our impact functions were (I assume) calibrated against these interpolated tracks. Since the hazard is the maximum across all frames, adding interpolated frames to a track can only increase it or leave it unchanged, so uninterpolated tracks will have a low bias in the hazard and therefore in the impacts.
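
To illustrate the "maximum across all frames" argument, here is a minimal toy sketch. It does not use CLIMADA: the Gaussian wind profile, the 30 km/h translation speed and the function name `footprint` are assumptions made up for the example. It only shows that a footprint built from a superset of frames can never be lower than one built from the 3-hourly frames alone.

```python
import numpy as np

def footprint(eye_positions_km, centroids_km, v_max=50.0, radius_km=30.0):
    """Maximum wind over all frames for a toy Gaussian wind profile (assumed)."""
    # Distance of every centroid from the eye in every frame: (frames, centroids)
    dist = np.abs(centroids_km[None, :] - eye_positions_km[:, None])
    winds = v_max * np.exp(-0.5 * (dist / radius_km) ** 2)
    # The hazard footprint is the per-centroid maximum across frames
    return winds.max(axis=0)

centroids = np.arange(0.0, 271.0, 10.0)  # points every 10 km along the track
speed_kmh = 30.0                         # assumed translation speed

eye_3h = speed_kmh * np.arange(0, 10, 3)  # eye positions at hours 0, 3, 6, 9
eye_1h = speed_kmh * np.arange(0, 10, 1)  # eye positions at hours 0, 1, ..., 9

coarse = footprint(eye_3h, centroids)
fine = footprint(eye_1h, centroids)

# The hourly frames contain the 3-hourly ones, so the maximum cannot decrease
print("never lower:", bool(np.all(fine >= coarse)))
print("lowest wind along track (3 h / 1 h):", coarse.min(), fine.min())
```

In this toy setup the points halfway between two 3-hourly eye positions see much weaker winds than their neighbours, which is the uneven footprint described above.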

Following a conversation on Slack we thought it was worth automatically running this method as part of TropCyclone.from_tracks to make sure that points on storm tracks are no more than one hour (half an hour?) apart.

We'd do this by adding a max_timestep = 1 or max_timestep = 0.5 parameter to the method, so that the user can disable it if they don't want to smooth the track (e.g. when you want the plot to look like a series of snapshots).

Note: this will increase the computational cost of hazard calculations by about a factor of three (as there will be three times as many timesteps).

tovogt commented 2 years ago

Good idea, I totally support this!

> Note: this will increase the computational cost of hazard calculations by about a factor of three (as there will be three times as many timesteps).

This is only true when compared to the situation where users deliberately choose to compute the wind field for tracks at 3 hour resolution. However, I would say that this is really rare. Almost all use cases I can think of require a higher resolution. So, if somebody currently runs the wind field computation at 3 hour resolution, it's most probably by mistake anyway.

Now about the implementation: I'm not sure that it would be a good idea to have TropCyclone.from_tracks call equal_timestep because that would silently change the input data in place. Instead, I would suggest that the interpolation functionality is added at the very beginning of the function compute_windfields. A temporary copy of the track is created and this copy is interpolated to the desired resolution. By default, the temporary copy is removed afterwards. As an option, the user may choose to retain the copy, replacing the data of the TCTracks object.
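
A minimal sketch of this proposal, under stated assumptions: `Track`, `interpolate_track` and the `keep_interpolated` flag are hypothetical names invented for illustration, and the toy `compute_windfields` below is not CLIMADA's function. It only counts frames where the real code would compute wind fields. Interpolation is plain linear interpolation in time, which ignores longitude wrap-around at the antimeridian.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Track:
    """Hypothetical stand-in for one storm track (not the CLIMADA data model)."""
    time_h: np.ndarray  # hours since the first record
    lat: np.ndarray
    lon: np.ndarray

def interpolate_track(track, max_timestep_h):
    """Return a NEW Track whose points are at most max_timestep_h apart."""
    duration = track.time_h[-1] - track.time_h[0]
    n_steps = max(int(np.ceil(duration / max_timestep_h)), 1)
    new_time = np.linspace(track.time_h[0], track.time_h[-1], n_steps + 1)
    return Track(
        time_h=new_time,
        lat=np.interp(new_time, track.time_h, track.lat),
        lon=np.interp(new_time, track.time_h, track.lon),
    )

def compute_windfields(track, max_timestep_h=1.0, keep_interpolated=False):
    """Toy version: work on a temporary interpolated copy of the track."""
    # max_timestep_h=None disables the interpolation entirely
    work = track if max_timestep_h is None else interpolate_track(track, max_timestep_h)
    n_frames = work.time_h.size  # placeholder for the actual wind field computation
    if keep_interpolated and work is not track:
        # Opt-in only: replace the caller's data with the interpolated copy
        track.time_h, track.lat, track.lon = work.time_h, work.lat, work.lon
    return n_frames

track = Track(
    time_h=np.array([0.0, 3.0, 6.0]),
    lat=np.array([25.0, 26.5, 28.0]),
    lon=np.array([-88.0, -87.0, -86.0]),
)
n_default = compute_windfields(track)    # hourly frames on a temporary copy
n_points_after = track.time_h.size       # the input track is left untouched
```

With this layout the caller's data changes only when `keep_interpolated=True` is passed explicitly, which avoids the silent in-place modification.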

Now about the default value of max_timestep: I used 1 hour resolution so far because all others seem to use it. Do you have any references or considerations that are in favor of a higher resolution?

bguillod commented 2 years ago

I also totally support the idea, and I like @tovogt's proposal for the implementation. However, in my experience, 1 hour is not high enough. We modeled wind on a 5 km grid and went for 5 minutes. This might be overkill, but I'd strongly recommend that someone does a comparison on a fine grid such as 5 km and plots the wind field difference for a few temporal resolutions between 3 hours and 5 minutes, so there is a basis for the default interpolation.

tovogt commented 2 years ago

What do you mean when you say "not high enough"? What is the criterion? Can we define something like the acceptable error in wind speed? Let's say, we take 1-minute resolution as the "ground truth". On a 1 km grid, how large is the average (or maximum) deviation of wind speeds in each grid cell from that benchmark when choosing a lower temporal resolution than 1 minute? How much deviation would be acceptable? 0.1 m/s? 1 m/s? 5 m/s? Or a relative deviation of 1% / 5% / 10%?

I think it's a very bad solution to just produce some plots at increasing resolutions and then choose the one that our gut feeling tells us looks visually acceptable.
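
The criterion sketched above could be computed roughly as follows. This is a minimal sketch: the function name `deviation_from_benchmark` and the example numbers are made up, and the two inputs are assumed to be per-grid-cell maximum wind speeds (m/s) on the same grid, one at the candidate resolution and one at the benchmark resolution.

```python
import numpy as np

def deviation_from_benchmark(footprint, benchmark):
    """Summarise how far a footprint deviates from a high-resolution benchmark."""
    footprint = np.asarray(footprint, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    abs_dev = np.abs(benchmark - footprint)
    # Relative deviation is only defined where the benchmark wind is non-zero
    rel_dev = np.divide(
        abs_dev, benchmark, out=np.zeros_like(abs_dev), where=benchmark > 0
    )
    return {
        "mean_abs": float(abs_dev.mean()),  # average deviation in m/s
        "max_abs": float(abs_dev.max()),    # worst grid cell in m/s
        "max_rel": float(rel_dev.max()),    # worst grid cell as a fraction
    }

# Made-up example values, not results of any real comparison
benchmark = [40.0, 50.0, 30.0, 0.0]  # e.g. 1-minute resolution
candidate = [38.0, 50.0, 27.0, 0.0]  # e.g. 1-hour resolution
stats = deviation_from_benchmark(candidate, benchmark)
print(stats)
```

Whichever thresholds are chosen (absolute or relative) can then be checked against these numbers for each candidate timestep.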

I would say that the "acceptable" error very much depends on your application. We clearly won't find a default value that will be acceptable for all purposes. Most probably, we won't even find a default value that is acceptable for "most" applications. But there are things that we can do:

We can choose a "common" value (like the 1 hour that is used in all CLIMADA tutorials and in many CLIMADA-based studies, e.g. for damage functions) as the default and then add a note in the docstring where we explain that it might be important to choose a different value, depending on the application.
Alternatively, we can make this parameter a positional argument so that users are forced to make a choice.
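
For illustration, the two options differ only in the function signature. The function names below are hypothetical and the bodies are placeholders, not CLIMADA code.

```python
def from_tracks_default(tracks, centroids, max_timestep_h=1.0):
    """Option 1 (sketch): a common default that the docstring warns about.

    max_timestep_h : float
        Maximum time between track points in hours. The default of 1 hour may
        be too coarse or too fine depending on the application.
    """
    return {"n_tracks": len(tracks), "max_timestep_h": max_timestep_h}

def from_tracks_required(tracks, centroids, max_timestep_h):
    """Option 2 (sketch): no default, so every caller has to pick a value."""
    return {"n_tracks": len(tracks), "max_timestep_h": max_timestep_h}

# Option 1 works without a choice; option 2 raises TypeError if it is omitted
result_default = from_tracks_default(["track_a"], [0])
try:
    from_tracks_required(["track_a"], [0])
    forced_choice = False
except TypeError:
    forced_choice = True
```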

bguillod commented 2 years ago

> What do you mean when you say "not high enough"? What is the criterion? Can we define something like the acceptable error in wind speed? Let's say, we take 1-minute resolution as the "ground truth". On a 1 km grid, how large is the average (or maximum) deviation of wind speeds in each grid cell from that benchmark when choosing a lower temporal resolution than 1 minute? How much deviation would be acceptable? 0.1 m/s? 1 m/s? 5 m/s? Or a relative deviation of 1% / 5% / 10%?
>
> I think it's a very bad solution to just produce some plots at increasing resolutions and then choose the one that our gut feeling tells us looks visually acceptable.

Yes, sure, this is also the way I would do it to get a robust default value. I just started with a simple proposal, since I prefer that to some randomly chosen default value (which would definitely be worse than the first 'very bad solution' I suggested), but I'm more than happy to go with the more robust approach if anyone has time to do the analysis. On the other hand, I obviously cannot say what the acceptable deviation should be. I suggest we see the results first, and then find a compromise between accuracy and performance.

> I would say that the "acceptable" error very much depends on your application. We clearly won't find a default value that will be acceptable for all purposes. Most probably, we won't even find a default value that is acceptable for "most" applications.

No question. It should also not be the aim to find a default that suits all.

> But there are things that we can do:
>
> We can choose a "common" value (like the 1 hour that is used in all CLIMADA tutorials and in many CLIMADA-based studies, e.g. for damage functions) as the default and then add a note in the docstring where we explain that it might be important to choose a different value, depending on the application.
>
> Alternatively, we can make this parameter a positional argument so that users are forced to make a choice.

I prefer the first option, as long as this default value is based on the analysis you propose above.

ChrisFairless commented 2 years ago

I agree with all of the above. And Thomas's implementation sounds good.

The default timestep should balance computation costs with differences from 'ground truth'. When this is implemented it's worth testing on a couple of fast-moving, strong storms as Benoit suggests. My instinct also says that 1 hr is too coarse, so let's play around with it.

If anyone wants to make this happen I'll be super happy. I don't have loads of time in the next couple of months :)

CLIMADA-project / climada_python

Interpolate TC track timesteps to hourly when calculating footprints? #362