SpatioTemporal / STAREPandas

STAREpandas adds SpatioTemporal Adaptive Resolution Encoding (STARE) support to pandas DataFrames. https://starepandas.readthedocs.io/en/latest/
MIT License
4 stars 1 forks source link

Potential issue forming cover/trixels from a polygon #152

Closed mbauer288 closed 1 year ago

mbauer288 commented 1 year ago

Possible issue with forming covers/trixels with a polygon (specifically, HMS-GASP smoke polygons).

Below is the basic code for reading such a file (a zipped shapefile) into a DF and then making SIDs, cover and trixels.

shp_file_gdf = gpd.read_file(sfile)
shp_file_sdf = spd.STAREDataFrame(shp_file_gdf)
shp_file_sids = shp_file_sdf.make_sids(10, n_partitions=1
shp_file_sdf.set_sids(shp_file_sids, inplace=True))

shp_file_sdf['index_column'] = shp_file_sdf.index
cover_sdf = shp_file_sdf.stare_dissolve(by='index_column', num_workers=1)
cover = cover_sdf["sids"].to_numpy()
shp_file_sdf['cover'] = cover
shp_file_sdf.drop('index_column', inplace=True, axis=1)

trixels = shp_file_sdf.make_trixels(wrap_lon=True, n_partitions=1, num_workers=1)
shp_file_sdf.set_trixels(trixels, inplace=True)
shp_file_sdf.rename(columns={'geometry': 'poly'}, inplace=True)

shp_file_sdf.to_pickle(pfile)

This leads to a STAREPandas DF with the follow row:

    Satellite                                            GOES-EAST
    Start                                             2021040 2150
    End                                               2021040 2150
    Density                                                  Light
    sids         [3409787867872886789 3398247393827749894 3398388131316105222]
    poly         POLYGON ((-95.280389 22.402781, -95.265837 23.537831, -95.105766 25.385924, -94.902039 26.477317, -94.669209 27.263121, -93.868853 28.078028, -91.758826 28.485481, -90.740192 28.485481, -89.590591 28.689208, -88.397334 28.892935, -88.004432 29.067558, -87.596979 29.809706, -87.553323 30.188055, -87.713394 30.420886, -88.266367 30.44999, -89.023066 30.391782, -89.648798 30.37723, -90.507361 30.362678, -91.977105 30.231711, -93.490504 30.100744, -94.634564 29.734036, -95.898182 29.173752, -97.042591 28.315445, -97.912819 27.004143, -98.121435 26.020667, -98.246605 24.560353, -98.407537 23.535153, -98.336012 22.575519, -97.829372 20.596644, -97.141311 19.439006, -97.290323 19.254232, -96.765802 17.781997, -96.187637 17.478014, -95.776365 16.524339, -96.151874 16.166712, -97.010181 16.250158, -101.56398 17.925049, -102.30307 17.501856, -103.26047 15.98915, -103.28143 14.722551, -102.81577 12.999604, -100.74357 10.193995, -96.683408 9.8908344, -93.700447 10.531599, -89.893666 12.859906, -89.649194 13.407058, -91.954217 14.862249, -94.003127 16.503705, -93.828504 16.99265, -93.351201 17.295329, -91.616613 17.341896, -91.885543 18.034806, -93.428046 18.849713, -94.548544 19.591861, -94.941445 20.450424, -95.280389 22.402781))
    cover        [3398247393827749894 3398388131316105222 3409787867872886789]
    trixels      MULTIPOLYGON (((-102.0061261489767 12.55845166731699, -98.16967315951149 13.096208737107288, -100.26113111522477 16.3979598138818, -102.0061261489767 12.55845166731699)), ((-96.53941578566531 24.02744279156111, -97.50556930753118 22.127736382916943, -95.4413181434046 22.42557821322879, -96.53941578566531 24.02744279156111)), ((-95.4413181434046 22.42557821322879, -97.50556930753118 22.127736382916943, -96.41110951189182 20.531393245035204, -95.4413181434046 22.42557821322879)))

The following plot show the result, in which the trixels make 3 large triangles, but they hardly cover the polygon. The polygon seems fine (e.g., closed, CCW etc).

(https://github.com/SpatioTemporal/STAREPandas/files/11900430/smoke_poly_17.pdf) smoke_poly_17

Any Ideas? Perhaps something with the locally convex and concave nature of the polygon?

NiklasPhabian commented 1 year ago

@mbauer288, do you have the link to the actual file you are reading?

mbauer288 commented 1 year ago

Sure do.

Here is how I read it.

# Standard Imports
import os

# Third-Party Imports
import shapefile
from shapely.geometry import Polygon
import shapely.wkt
from shapely.geometry import shape

##
# Converts longitude to +/1 180
LON_180 = lambda x: ((x + 180.0) % 360.0) - 180.0

##
# Converts longitude to 0-360
LON_360 = lambda x: ((x + 360.0) % 360.0)

sfile = "hms_smoke20210209.zip"

##
# Read shapefile
sf = shapefile.Reader(sfile)

fields = sf.fields
fieldnames = [f[0] for f in sf.fields[1:]]
shapes = sf.shapes()
skip_idx = []
for fidx, feature in enumerate(shapes):
    # Limit to the polygon at issue
    if fidx != 17:
        continue

    geo = feature.__geo_interface__  # (GeoJSON format)
    spoly = Polygon([tuple(l) for l in geo['coordinates'][0]])
    # The polygon at issue
    #   >>>> Possible issue encountered when converting Shape #17 to GeoJSON: Shapefile format requires that polygons contain at least one exterior ring, but the Shape was entirely made up of interior holes (defined by counter-clockwise orientation in the shapefile format). The rings were still included but were encoded as GeoJSON exterior rings instead of holes.
    #   >>>> 017: spoly = <POLYGON ((-95.28 22.403, -94.941 20.45, -94.549 19.592, -93.428 18.85, -91....>
    print(f"\t{fidx:03d}: {spoly = }")

    ##
    # Check if Polygon CCW (interpreted as a hole, so reverse)
    print(f"\tIs CCW = {spoly.exterior.is_ccw}")
    if spoly.exterior.is_ccw:
        spoly = shapely.geometry.polygon.orient(spoly, sign=-1.0)
    print(f"\tIs CCW = {spoly.exterior.is_ccw}")

    tmp = list(zip(*spoly.exterior.coords.xy))
    spoly_lons = [LON_360(_[0]) for _ in tmp]
    spoly_lats = [_[1] for _ in tmp]

    print(f"\tspoly_lons ({len(spoly_lons)}): [{', '.join(f'{v_:+.5f}[{x_:02d}]' for x_, v_ in enumerate(spoly_lons))}]")
    print(f"\tspoly_lats ({len(spoly_lats)}): [{', '.join(f'{v_:+.5f}[{x_:02d}]' for x_, v_ in enumerate(spoly_lats))}]")
    print(f"\tspoly BBox ({min(spoly_lons)}, {min(spoly_lats)}, {max(spoly_lons)}, {max(spoly_lats)}) or ({LON_180(min(spoly_lons))}, {min(spoly_lats)}, {LON_180(max(spoly_lons))}, {max(spoly_lats)})")
    print(f"{'Lons':>10s}{'Lats':>10s}")
    for idx_ in tmp:
        print(f"{idx_[0]:10.5f} {idx_[1]:9.5f}")

    """
    017: spoly = <POLYGON ((-95.28 22.403, -95.266 23.538, -95.106 25.386, -94.902 26.477, -9...>

    Is CCW = True
    Is CCW = False

    spoly_lons (56): [+264.71961[00], +265.05856[01], +265.45146[02], +266.57195[03], +268.11446[04], +268.38339[05], +266.64880[06], +266.17150[07], +265.99687[08], +268.04578[09], +270.35081[10], +270.10633[11], +266.29955[12], +263.31659[13], +259.25643[14], +257.18423[15], +256.71857[16], +256.73953[17], +257.69693[18], +258.43602[19], +262.98982[20], +263.84813[21], +264.22364[22], +263.81236[23], +263.23420[24], +262.70968[25], +262.85869[26], +262.17063[27], +261.66399[28], +261.59246[29], +261.75340[30], +261.87856[31], +262.08718[32], +262.95741[33], +264.10182[34], +265.36544[35], +266.50950[36], +268.02290[37], +269.49264[38], +270.35120[39], +270.97693[40], +271.73363[41], +272.28661[42], +272.44668[43], +272.40302[44], +271.99557[45], +271.60267[46], +270.40941[47], +269.25981[48], +268.24117[49], +266.13115[50], +265.33079[51], +265.09796[52], +264.89423[53], +264.73416[54], +264.71961[55]]
    spoly_lats (56): [+22.40278[00], +20.45042[01], +19.59186[02], +18.84971[03], +18.03481[04], +17.34190[05], +17.29533[06], +16.99265[07], +16.50371[08], +14.86225[09], +13.40706[10], +12.85991[11], +10.53160[12], +9.89083[13], +10.19399[14], +12.99960[15], +14.72255[16], +15.98915[17], +17.50186[18], +17.92505[19], +16.25016[20], +16.16671[21], +16.52434[22], +17.47801[23], +17.78200[24], +19.25423[25], +19.43901[26], +20.59664[27], +22.57552[28], +23.53515[29], +24.56035[30], +26.02067[31], +27.00414[32], +28.31545[33], +29.17375[34], +29.73404[35], +30.10074[36], +30.23171[37], +30.36268[38], +30.37723[39], +30.39178[40], +30.44999[41], +30.42089[42], +30.18805[43], +29.80971[44], +29.06756[45], +28.89294[46], +28.68921[47], +28.48548[48], +28.48548[49], +28.07803[50], +27.26312[51], +26.47732[52], +25.38592[53], +23.53783[54], +22.40278[55]]

    spoly BBox (256.71857, 9.8908344, 272.446677, 30.44999) or (-103.28143, 9.8908344, -87.55332299999998, 30.44999)

          Lons      Lats
     -95.28039  22.40278
     -94.94145  20.45042
     -94.54854  19.59186
     -93.42805  18.84971
     -91.88554  18.03481
     -91.61661  17.34190
     -93.35120  17.29533
     -93.82850  16.99265
     -94.00313  16.50371
     -91.95422  14.86225
     -89.64919  13.40706
     -89.89367  12.85991
     -93.70045  10.53160
     -96.68341   9.89083
    -100.74357  10.19399
    -102.81577  12.99960
    -103.28143  14.72255
    -103.26047  15.98915
    -102.30307  17.50186
    -101.56398  17.92505
     -97.01018  16.25016
     -96.15187  16.16671
     -95.77636  16.52434
     -96.18764  17.47801
     -96.76580  17.78200
     -97.29032  19.25423
     -97.14131  19.43901
     -97.82937  20.59664
     -98.33601  22.57552
     -98.40754  23.53515
     -98.24661  24.56035
     -98.12144  26.02067
     -97.91282  27.00414
     -97.04259  28.31545
     -95.89818  29.17375
     -94.63456  29.73404
     -93.49050  30.10074
     -91.97710  30.23171
     -90.50736  30.36268
     -89.64880  30.37723
     -89.02307  30.39178
     -88.26637  30.44999
     -87.71339  30.42089
     -87.55332  30.18805
     -87.59698  29.80971
     -88.00443  29.06756
     -88.39733  28.89294
     -89.59059  28.68921
     -90.74019  28.48548
     -91.75883  28.48548
     -93.86885  28.07803
     -94.66921  27.26312
     -94.90204  26.47732
     -95.10577  25.38592
     -95.26584  23.53783
     -95.28039  22.40278
    """

hms_smoke20210209.zip

NiklasPhabian commented 1 year ago

Hey @mbauer288,

I cannot reproduce the issue just yet. But also noted some problems:

In the code you posted:

  1. the first paragraph reads the shapefile
  2. In the second paragraph, we dissolve by the index; i.e. we dissolve each row with itself. Generally speaking, this step may in some edge-cases be useful, however most of the time it isn't. The idea behind dissolving is that we merge multiple rows/polygons/stare covers subject to a common attribute and then reduce the stare representation by eliminating superfluous trixels (i.e. duplicates or merging 4 children into parent). Since we have just created the SIDs and do not do any merging, the dissolve step here should yield exactly the same SIDs as its input.
  3. In the last paragraph, you create trixel polygons. Those trixel polygons are going to be created from the 'sids' column (produced in the first paragraph), not from the 'cover' column.

Maybe something could have gone wrong during geoJSON conversion? If I do something like the following, I cannot confirm the observation that "the Shape was entirely made up of interior holes"

import geopandas as pgd

sfile = 'hms_smoke20210209.zip'
sdf = gpd.read_file(sfile)
print(sdf.iloc[17].geometry.exterior)
NiklasPhabian commented 1 year ago

to follow up: could you post the full code that produces the figure with the 3 trixels and the polygon?

mbauer288 commented 1 year ago

Argh, after an update to Shapely, GeoPandas etc. and the problem has disappeared. I assume this is/was a issue with the shapely 2.0 change when reading a legacy shapely dataset.