CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0
291 stars 115 forks source link

util.coordinates.get_land_geometry can't handle the antimeridian unless the extent is (-180, 180) #860

Open ChrisFairless opened 3 months ago

ChrisFairless commented 3 months ago

I've run into an issue with invalid geometries from Natural Earth when antimeridians are involved.

While this works fine:

import climada.util.coordinates as u_coord

u_coord.get_land_geometry(extent = (-180, 180, -90, 90))

all of the following fail:

u_coord.get_land_geometry(extent = (0, 360, -90, 90))
u_coord.get_land_geometry(extent = (-360, 0, -90, 90))
u_coord.get_land_geometry(extent = (-179.99, 180.01, -90, 90))
u_coord.get_land_geometry(extent = (355, 365, -90, 90))

with ValueError: No Shapely geometry can be created from null value

I haven't tested exhaustively but I think it fails when

The error occurs at line 687:

    geom = geom.geometry.unary_union

The country geometries in geom are invalid. Running tests on them explains this:

from shapely.validation import explain_validity

extent = (0, 360, -90, 90)
geom = u_coord.get_country_geometries(extent = extent)
geom['isvalid'] = [g.is_valid for g in geom.geometry]
geom['explanation'] = [explain_validity(g) for g in geom.geometry]

print(geom.loc[~geom['isvalid'], ['SOVEREIGNT', 'isvalid', 'explanation']])

giving

         SOVEREIGNT  isvalid                                        explanation
21           France    False  Self-intersection[359.66897839894 49.320066455...
66            Spain    False  Self-intersection[352.72686094429 38.624422971...
78   United Kingdom    False  Self-intersection[356.900186613464 53.57008278...
97             Mali    False  Self-intersection[353.776510761588 21.85799045...
121         Algeria    False  Self-intersection[359.765069992554 35.81331337...
149    Burkina Faso    False  Self-intersection[359.694191513363 11.11413807...
150            Togo    False  Self-intersection[359.86660798613 11.098570785...
151           Ghana    False  Self-intersection[359.86660798613 11.098570785...
175      Antarctica    False  Self-intersection[359.049391620862 -71.6219411...

i.e. every country that crosses the 0 degrees line.

I haven't investigated in enough detail to figure out exactly meridian causes problems based in the input extent.

The issue comes from lines 846-850 in climada.util.coordinates.get_country_geometries. It's dealing with the case when the requested countries crossed the antimeridian. For some reason it needs to reset the CRS, and this invalidates the geometries.

Fixes that didn't work:

Failure 1: force to the (-180, 180) range

I thought about adjusting the longitude extent to always be in the (-180, 180) range. But since the method is called in a lot of places and the extent parameter is often generated automatically (e.g. during TC Track generation), it's possible that sometimes the returned geometries' longitudes will be offset by 360 degrees from the source data that the extent was calculated from.

Failure 2: fix the geometries

I tried making the geometries valid after the CRS changed (line 850), but it didn't work:

   out.geometry = [shapely.validation.make_valid(g) for g in out.geometry]
image

Fixes that work but aren't good

Option 1: nuclear

Throw an error if the extent longitudes aren't in the (-180, 180) range. Then slowly update any other parts of CLIMADA that fall over because of this.

That's as far as I've got!

ChrisFairless commented 3 months ago

@tovogt you understand globes, if you have time I'd appreciate any insight! N.B: It's not urgent (for me)

tovogt commented 3 months ago

All polygons in Natural Earth are contained in [-180, 180] longitude. If your extent is not contained within that, the code needs to split the extent up and intersect with two separate bounding boxes, and then translate the lon-coordinates to the lon-range consistent with your original extent. Here is how I would fix it:

def get_country_geometries(country_names=None, extent=None, resolution=10):
    """Natural Earth country boundaries within given extent

    If no arguments are given, simply returns the whole natural earth dataset.

    Take heed: we assume WGS84 as the CRS unless the Natural Earth download utility from cartopy
    starts including the projection information. (They are saving a whopping 147 bytes by omitting
    it.) Same goes for UTF.

    If extent is provided, longitude values in 'geom' will all lie within 'extent' longitude
    range. Therefore setting extent to e.g. [160, 200, -20, 20] will provide longitude values
    between 160 and 200 degrees.

    Parameters
    ----------
    country_names : list, optional
        list with ISO 3166 alpha-3 codes of countries, e.g ['ZWE', 'GBR', 'VNM', 'UZB']
    extent : tuple, optional
        (min_lon, max_lon, min_lat, max_lat)
        Extent, assumed to be in the same CRS as the natural earth data.
    resolution : float, optional
        10, 50 or 110. Resolution in m. Default: 10m

    Returns
    -------
    geom : GeoDataFrame
        Natural Earth multipolygons of the specified countries, resp. the countries that lie
        within the specified extent.
    """
    resolution = nat_earth_resolution(resolution)
    shp_file = shapereader.natural_earth(resolution=resolution,
                                         category='cultural',
                                         name='admin_0_countries')
    nat_earth = gpd.read_file(shp_file, encoding='UTF-8')

    if not nat_earth.crs:
        nat_earth.crs = NE_CRS

    # fill gaps in nat_earth
    gap_mask = (nat_earth['ISO_A3'] == '-99')
    nat_earth.loc[gap_mask, 'ISO_A3'] = nat_earth.loc[gap_mask, 'ADM0_A3']

    gap_mask = (nat_earth['ISO_N3'] == '-99')
    for idx, country in nat_earth[gap_mask].iterrows():
        nat_earth.loc[idx, "ISO_N3"] = f"{natearth_country_to_int(country):03d}"

    out = nat_earth
    if country_names:
        if isinstance(country_names, str):
            country_names = [country_names]
        country_mask = np.isin(
            nat_earth[['ISO_A3', 'WB_A3', 'ADM0_A3']].values,
            country_names,
        ).any(axis=1)
        out = out[country_mask]

    if extent:
        if extent[1] - extent[0] > 360:
            raise ValueError(
                f"longitude extent range is greater than 360: {extent[0]} to {extent[1]}"
            )

        if extent[1] < extent[0]:
            raise ValueError(
                f"longitude extent at the left ({extent[0]}) is larger "
                f"than longitude extent at the right ({extent[1]})"
            )

        # translate the lon range to start in [-180, 180] (but it might exceed to the right)
        lon_left = extent[0]
        if lon_left != -180:
            [lon_left] = lon_normalize(np.array(extent[:1]))
        lon_right = lon_left + extent[1] - extent[0]

        # split the extent box unless longitude extent is contained within [-180, +180]
        extents = [(lon_left, lon_right, extent[2], extent[3])]
        if lon_left < -180 or lon_right > 180:
            extents = [
                (lon_left, 180, extent[2], extent[3]),
                (-180, lon_right, extent[2], extent[3]),
            ]
        bboxes = [box(*toggle_extent_bounds(e)) for e in extents]
        intersects = []
        for bbox in bboxes:
            bbox = gpd.GeoSeries(bbox, crs=DEF_CRS)
            bbox = gpd.GeoDataFrame({'geometry': bbox}, crs=DEF_CRS)
            intersects.append(gpd.overlay(out, bbox, how="intersection"))

        # take union of the parts if necessary
        if len(intersects) == 1:
            [out] = intersects
        else:
            out_left, out_right = intersects
            out = gpd.overlay(out_left, out_right[["geometry"]], how="union")

        # rewrap so that all lon coordinates actually lie within specified extent
        if extent[0] < -180 or extent[1] > 180:
            lon_mid = 0.5 * (extent[0] + extent[1])
            # reset the CRS attribute after rewrapping (we don't really change the CRS)
            out = (
                out
                .to_crs({"proj": "longlat", "lon_wrap": lon_mid})
                .set_crs(DEF_CRS, allow_override=True)
            )

    return out

Note however that this might still produce invalid polygons in edge cases. Fixing those would probably require further testing...

peanutfun commented 2 weeks ago

@ChrisFairless How to continue here? The fix by @tovogt seems rather complicated. Maybe it would suffice to update the documentation of get_land_geometry, and even throw an error if the extents exceed the supported range?