AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0
20 stars 7 forks source link

Default setting of min_area_km2 #132

Closed semvijverberg closed 1 year ago

semvijverberg commented 1 year ago

As a new user, it can be unexpected that some regions are removed by default. The min_area_km2 argument is now 3000**2 by default. You can get region 1 and 3 and think, where is region 2?

I also noticed there is a small error in the calculation of the area. I was testing at which area_in_km2 a small region of two grid cells of 5x5 would be removed (demonstrated in tutorial Notebook). A back on the envelop calculation would be that this should roughly occur at the following value:

1 degree at equator is ~111 km, at 47,5 degree north this would be: 111 (km) np.radians(47,5) ~ 92km . So a 5 by 5 grid cell would roughly have an area of (592)**2=211.600 km2.

However, the current calculation of a single grid cell is 1.91e8 km2.

I checked the area calculation here, and the answer should be 208763 km2:

The correct way of calculating the area, following this formula.:

lat = data.latitude
dlat = np.abs(data.latitude.values[1] - data.latitude.values[0])
dlon = np.abs(data.longitude.values[1] - data.longitude.values[0])

lat_s = np.radians(data.latitude - dlat/2)
lat_n = np.radians(data.latitude + dlat/2)

spherical_area = (np.pi/180) * RADIUS_EARTH_KM**2 * np.abs(np.sin(lat_s) - np.sin(lat_n)) * np.abs(dlon)

Which renders and area of 208763 km2 (exactly matching the calculation of the tool above).

Hence, this issue highlights two things.

Could these changes be incorporated in #111 or do we need to raise a new PR?

semvijverberg commented 1 year ago

Ps: please let me know if you have any feedback on the clarity of this issue :)

geek-yang commented 1 year ago

@semvijverberg Thanks for finding these bugs and adding a clear explanation. @BSchilperoort and @jannesvaningen could you please inspect the calculation in our package? I think we shall raise a new PR and add this into the unit tests related to RGDR.

Peter9192 commented 1 year ago

Ps: please let me know if you have any feedback on the clarity of this issue :)

Overall it is clear to me 👍 . Personally, I'd open two different issue for the bug and the suggestion. That keeps the discussion focused. In the description of the bug, it would be very helpful if you could include 1 or 2 lines of code to show how you obtained "the current calculation of a single grid cell is 1.91e8 km2".

BSchilperoort commented 1 year ago

Hi Sem, thanks for noticing this. I found the bug in the calculation. It's on the following line: https://github.com/AI4S2S/s2spy/blob/40475a6d7a36280d262ce5a8c055b85a351ea266/s2spy/rgdr/rgdr.py#L41

This should instead be:

    spherical_area = h * dlon / np.pi / 4 

This will then give the correct answer.

For example, the 5x5 degree gridcell centered at 47.5 N:

>>> spherical_area(lat=45, dlat=5, dlon=5)

218506

There can be slight differences depending on how you approximate the earth to be spherical. The (corrected) code calculates what the relative area of a square gridcell on a unit sphere would be, and scales this to the actual surface area of earth.


Could these changes be incorporated in https://github.com/AI4S2S/s2spy/pull/111 or do we need to raise a new PR?

I will open a new PR to fix the accuracy of the area calculation.

The default setting can be fixed in a separate PR, as we will have to change some of the notebooks/tests for this. The default value was mostly there to ease testing and development of the RGDR module.