SpatioTemporal / STAREPandas

STAREpandas adds SpatioTemporal Adaptive Resolution Encoding (STARE) support to pandas DataFrames. https://starepandas.readthedocs.io/en/latest/
MIT License

how to do many-to-many operations on STAREPandas DFs #151

Open mbauer288 opened 1 year ago

mbauer288 commented 1 year ago

I have two STAREPandas DFs that I restrict to the same timestamp (i.e., contemporaneous via temporal intersection).

From DFs with all timestamps:

import numpy as np

imerg_ts = np.sort(imerg_sdf['timestamp'].unique())
mcms_ts = np.sort(mcms_sdf['timestamp'].unique())
# Merged, unique datetimes
merged_ts = np.union1d(imerg_ts, mcms_ts)

##
# Sort by TimeStamp
imerg_sdf_by_ts = imerg_sdf.sort_values(by=["timestamp"])
mcms_sdf_by_ts = mcms_sdf.sort_values(by=["timestamp"])

for aidx, a_time in enumerate(merged_ts):
   ##
   # MCMS subset with just this DTime (empty if MCMS has no rows at a_time).
   # reset_index returns a new DF, which avoids modifying a filtered subset in place.
   mcms_sdf_now = mcms_sdf_by_ts[mcms_sdf_by_ts.timestamp == a_time].reset_index(drop=True)

   ##
   # IMERG subset with just this DTime (empty if IMERG has no rows at a_time).
   imerg_sdf_now = imerg_sdf_by_ts[imerg_sdf_by_ts.timestamp == a_time].reset_index(drop=True)
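
The loop above walks the union of timestamps, so some iterations will have an empty subset on one side. A minimal sketch of an alternative that only visits timestamps present in both DFs is below. It uses plain pandas on placeholder data; the helper name `contemporaneous_pairs` is hypothetical, and whether `groupby` hands back STAREPandas DFs rather than plain pandas DFs is an assumption I have not checked.

```python
import pandas as pd


def contemporaneous_pairs(left, right, on="timestamp"):
    """Yield (time, left_subset, right_subset) for each time found in both DFs.

    Hypothetical helper: groups each DF once instead of filtering it
    again on every pass through the loop.
    """
    left_groups = {key: grp for key, grp in left.groupby(on)}
    right_groups = {key: grp for key, grp in right.groupby(on)}
    # Only timestamps that occur in both DFs are contemporaneous.
    for a_time in sorted(left_groups.keys() & right_groups.keys()):
        yield (
            a_time,
            left_groups[a_time].reset_index(drop=True),
            right_groups[a_time].reset_index(drop=True),
        )


# Placeholder data standing in for imerg_sdf and mcms_sdf.
imerg_demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-10", "2021-01-10", "2021-01-11"]),
    "label": [87, 91, 5],
})
mcms_demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-10", "2021-01-12"]),
    "uci": ["uci-a", "uci-b"],
})

pairs = list(contemporaneous_pairs(imerg_demo, mcms_demo))
```

With this placeholder data only 2021-01-10 appears in both DFs, so `pairs` holds a single entry whose IMERG subset has labels 87 and 91.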

The loop above gives something like this for each a_time:

 imerg_sdf_now   
        label timestamp   itivs                x y cell_areas tot_area precips tot_precip  sids cover trixels
     0  87    2021-01-10  2275465702582262897  ...                                                        ...
     1  91    2021-01-10  2275465702582262897  ...                                                        ...

 mcms_sdf_now  
        usi                   uci                   timestamp  tivs30               lon lat cslp ctype cinten tinten depth sarea  sa_fill vert_poly_geo verts sids cover trixels
     0  20210109150539835085  20210110000540035625 2021-01-10  2275465702582262897  ...                                                                                      ...
     1  20210109030280028312  20210110000255028312 2021-01-10  2275465702582262897  ...                                                                                      ...
     2  20210109150515029437  20210110000500029687 2021-01-10  2275465702582262897  ...                                                                                      ...
     3  20210109150530500800  20210110000525001062 2021-01-10  2275465702582262897  ...                                                                                      ...
     4  20210108030470001937  20210110000500002687 2021-01-10  2275465702582262897  ...                                                                                      ...
     5  20210107180425030375  20210110000370031000 2021-01-10  2275465702582262897  ...                                                                                      ...
     6  20210106150557234913  20210110000495000125 2021-01-10  2275465702582262897  ...                                                                                      ...
     7  20210109210595025312  20210110000605025437 2021-01-10  2275465702582262897  ...                                                                                      ...

The problem

Property differences:

The spatial relationship between contemporaneous ETC centers and IMERG features is thus Many-to-Many:

Example solution using placeholder data.

import pandas

mcms_data = {'uci': ["uci-a", "uci-b"], 'vert_poly_geo': ["poly-a", "poly-b"], 'sids': ["sids-a", "sids-b"], 'cover': ["cover-a", "cover-b"], 'trixels': ["trixels-a", "trixels-b"]}
mcms_sdf_now = pandas.DataFrame.from_dict(mcms_data)

imerg_data = {"label": [87, 91], "sids": ["sids-87", "sids-91"], "cover": ["cover-87", "cover-91"], "trixels": ["trixels-87", "trixels-91"]}
imerg_sdf_now = pandas.DataFrame.from_dict(imerg_data)

# Merge so info about each IMERG feature is available for each ETC center (uci)
combined = mcms_sdf_now.merge(imerg_sdf_now, how='cross', suffixes=('_mcms', '_imerg'))

 mcms_sdf_now
     uci vert_poly_geo    sids    cover    trixels
0  uci-a        poly-a  sids-a  cover-a  trixels-a
1  uci-b        poly-b  sids-b  cover-b  trixels-b

 imerg_sdf_now
   label     sids     cover     trixels
0     87  sids-87  cover-87  trixels-87
1     91  sids-91  cover-91  trixels-91

 combined
     uci vert_poly_geo sids_mcms cover_mcms trixels_mcms  label sids_imerg cover_imerg trixels_imerg
0  uci-a        poly-a    sids-a    cover-a    trixels-a     87    sids-87    cover-87    trixels-87
1  uci-a        poly-a    sids-a    cover-a    trixels-a     91    sids-91    cover-91    trixels-91
2  uci-b        poly-b    sids-b    cover-b    trixels-b     87    sids-87    cover-87    trixels-87
3  uci-b        poly-b    sids-b    cover-b    trixels-b     91    sids-91    cover-91    trixels-91

Now I can check the combined DF for spatial intersection between the columns "sids_mcms" and "sids_imerg" for each row, storing the intersecting SIDs (if any) in a new column "sids_st". I guess I could also make a "cover_st" and "trixels_st" column based on "sids_st" as well.

Then I can plot the ETC trixels, the full IMERG trixels or the space-time intersecting (st) IMERG trixels as required.

I know how to brute force this using loops and starepandas [stare_intersection(), to_trixels() and stare_dissolve()], but is there a simple set of DF operations to do this last part?
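
For reference, the row-wise step described above could be sketched in plain pandas as follows. The intersection function is passed in as a parameter, and the default `exact_intersection` is a hypothetical placeholder that only matches identical integer values. It is not STARE-aware, since real SIDs also encode resolution, so a STARE-aware intersection would need to be swapped in. The helper name `add_intersection_column` and the extra "intersects" column are assumptions for illustration, not part of starepandas.

```python
import numpy as np
import pandas as pd


def exact_intersection(left_sids, right_sids):
    """Placeholder: values found in both inputs. NOT a STARE intersection."""
    return np.intersect1d(left_sids, right_sids)


def add_intersection_column(combined, intersect=exact_intersection,
                            left="sids_mcms", right="sids_imerg",
                            out="sids_st"):
    """Return a copy of `combined` with per-row intersections stored in `out`."""
    result = combined.copy()
    # Fill an object array element by element so each cell holds one array.
    values = np.empty(len(result), dtype=object)
    for i, (l_sids, r_sids) in enumerate(zip(result[left], result[right])):
        values[i] = intersect(l_sids, r_sids)
    result[out] = values
    # Flag the rows where the ETC center and the IMERG feature overlap.
    result["intersects"] = result[out].map(len) > 0
    return result


# Placeholder integer "SIDs"; real STARE index values would go here.
mcms_demo = pd.DataFrame({"uci": ["uci-a", "uci-b"],
                          "sids": [[1, 2, 3], [7, 8]]})
imerg_demo = pd.DataFrame({"label": [87, 91],
                           "sids": [[2, 3, 4], [8, 9]]})

combined_demo = mcms_demo.merge(imerg_demo, how="cross",
                                suffixes=("_mcms", "_imerg"))
combined_demo = add_intersection_column(combined_demo)
hits = combined_demo[combined_demo["intersects"]]
```

Filtering on "intersects" keeps only the (uci, label) pairs that overlap, and "cover_st" or "trixels_st" columns could be filled the same way by passing a different function. This is still a Python-level loop over rows, so it tidies the bookkeeping rather than vectorizing the intersection itself.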