SpatioTemporal / STAREPandas

STAREpandas adds SpatioTemporal Adaptive Resolution Encoding (STARE) support to pandas DataFrames. https://starepandas.readthedocs.io/en/latest/
MIT License

MB question #124

Closed NiklasPhabian closed 1 year ago

NiklasPhabian commented 1 year ago

Background

XCAL Event-17

Based on STARECookbook example "999-H0-00-IMERG-Analyze-1.py" with

features == pickles/featuredb.pickle
intersects = features.stare_intersects(roi_sids)
features[intersects].label.unique()
xcal_event_sdf == event_17 = features[features.label==17]

 type(xcal_event_sdf) = <class 'starepandas.staredataframe.STAREDataFrame'>
+-----+---------+---------------------+---------------+-----------------+---------------------------+------------+-------------------+--------------+-----------------------------------------------+-----------------------------------------------+-------------------------------------------------------------+
|     |   label | timestamp           |             x |               y |                cell_areas |   tot_area |           precips |   tot_precip |                                          sids |                                         cover |                                                     trixels |
+-----+---------+---------------------+---------------+-----------------+---------------------------+------------+-------------------+--------------+-----------------------------------------------+-----------------------------------------------+-------------------------------------------------------------+
|   0 |      17 | 2021-01-24 20:30:00 |     [573 574] |       [846 846] |     [1.041e+08 1.042e+08] |  2.083e+08 |     [1.245 1.237] |       129320 |     [3433966733257179305 3433961230857396137] |     [3433959531497914377 3433966128567681033] | MULTIPOLYGON (((-95.3812972675803 32.68791448501074, ...))) |
.                                                                                                                                                                                                                                                                                                                  .
.                                                                                                                                                                                                                                                                                                                  .
.                                                                                                                                                                                                                                                                                                                  .
| 121 |      17 | 2021-01-27 09:00:00 | [592 592 ...] | [1227 1228 ...] | [1.062e+08 1.062e+08 ...] | 12.765e+08 | [1.058 1.044 ...] |       874589 | [2460426819372382985 2460426089384640041 ...] | [2460381567520866313 2460383766544121865 ...] | MULTIPOLYGON (((-56.9078791935255 30.61816398173722, ...))) |
+-----+---------+---------------------+---------------+-----------------+---------------------------+------------+-------------------+--------------+-----------------------------------------------+-----------------------------------------------+-------------------------------------------------------------+
[122 rows x 11 columns]

Row Indices (122):
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
     21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
     41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
     61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
     81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
     101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
     121]

Original Row Indices (122):
    [2543, 2544, 2545, 2546, 2547, 2548, 2549, 2550, 2551, 2552, 2553, 2554, 2555, 2556, 2557, 2558, 2559,
     2560, 2561, 2562, 2563, 2564, 2565, 2566, 2567, 2568, 2569, 2570, 2571, 2572, 2573, 2574, 2575, 2576,
     2577, 2578, 2579, 2580, 2581, 2582, 2583, 2584, 2585, 2586, 2587, 2588, 2589, 2590, 2591, 2592, 2593,
     2594, 2595, 2596, 2597, 2598, 2599, 2600, 2601, 2602, 2603, 2604, 2605, 2606, 2607, 2608, 2609, 2610,
     2611, 2612, 2613, 2614, 2615, 2616, 2617, 2618, 2619, 2620, 2621, 2622, 2623, 2624, 2625, 2626, 2627,
     2628, 2629, 2630, 2631, 2632, 2633, 2634, 2635, 2636, 2637, 2638, 2639, 2640, 2641, 2642, 2643, 2644,
     2645, 2646, 2647, 2648, 2649, 2650, 2651, 2652, 2653, 2654, 2655, 2656, 2657, 2658, 2659, 2660, 2661,
     2662, 2663, 2664]

Time Stamps (122):
    ['2021/01/24 20:30:00', '2021/01/24 21:00:00', '2021/01/24 21:30:00', '2021/01/24 22:00:00', '2021/01/24 22:30:00', '2021/01/24 23:00:00', '2021/01/24 23:30:00',
     '2021/01/25 00:00:00', '2021/01/25 00:30:00', '2021/01/25 01:00:00', '2021/01/25 01:30:00', '2021/01/25 02:00:00', '2021/01/25 02:30:00', '2021/01/25 03:00:00', '2021/01/25 03:30:00', '2021/01/25 04:00:00', '2021/01/25 04:30:00', '2021/01/25 05:00:00', '2021/01/25 05:30:00', '2021/01/25 06:00:00', '2021/01/25 06:30:00', '2021/01/25 07:00:00', '2021/01/25 07:30:00', '2021/01/25 08:00:00', '2021/01/25 08:30:00', '2021/01/25 09:00:00', '2021/01/25 09:30:00', '2021/01/25 10:00:00', '2021/01/25 10:30:00', '2021/01/25 11:00:00', '2021/01/25 11:30:00', '2021/01/25 12:00:00', '2021/01/25 12:30:00', '2021/01/25 13:00:00', '2021/01/25 13:30:00', '2021/01/25 14:00:00', '2021/01/25 14:30:00', '2021/01/25 15:00:00', '2021/01/25 15:30:00', '2021/01/25 16:00:00', '2021/01/25 16:30:00', '2021/01/25 17:00:00', '2021/01/25 17:30:00', '2021/01/25 18:00:00', '2021/01/25 18:30:00', '2021/01/25 19:00:00', '2021/01/25 19:30:00', '2021/01/25 20:00:00', '2021/01/25 20:30:00', '2021/01/25 21:00:00', '2021/01/25 21:30:00', '2021/01/25 22:00:00', '2021/01/25 22:30:00', '2021/01/25 23:00:00', '2021/01/25 23:30:00',
     '2021/01/26 00:00:00', '2021/01/26 00:30:00', '2021/01/26 01:00:00', '2021/01/26 01:30:00', '2021/01/26 02:00:00', '2021/01/26 02:30:00', '2021/01/26 03:00:00', '2021/01/26 03:30:00', '2021/01/26 04:00:00', '2021/01/26 04:30:00', '2021/01/26 05:00:00', '2021/01/26 05:30:00', '2021/01/26 06:00:00', '2021/01/26 06:30:00', '2021/01/26 07:00:00', '2021/01/26 07:30:00', '2021/01/26 08:00:00', '2021/01/26 08:30:00', '2021/01/26 09:00:00', '2021/01/26 09:30:00', '2021/01/26 10:00:00', '2021/01/26 10:30:00', '2021/01/26 11:00:00', '2021/01/26 11:30:00', '2021/01/26 12:00:00', '2021/01/26 12:30:00', '2021/01/26 13:00:00', '2021/01/26 13:30:00', '2021/01/26 14:00:00', '2021/01/26 14:30:00', '2021/01/26 15:00:00', '2021/01/26 15:30:00', '2021/01/26 16:00:00', '2021/01/26 16:30:00', '2021/01/26 17:00:00', '2021/01/26 17:30:00', '2021/01/26 18:00:00', '2021/01/26 18:30:00', '2021/01/26 19:00:00', '2021/01/26 19:30:00', '2021/01/26 20:00:00', '2021/01/26 20:30:00', '2021/01/26 21:00:00', '2021/01/26 21:30:00', '2021/01/26 22:00:00', '2021/01/26 22:30:00', '2021/01/26 23:00:00', '2021/01/26 23:30:00',
     '2021/01/27 00:00:00', '2021/01/27 00:30:00', '2021/01/27 01:00:00', '2021/01/27 01:30:00', '2021/01/27 02:00:00', '2021/01/27 02:30:00', '2021/01/27 03:00:00', '2021/01/27 03:30:00', '2021/01/27 04:00:00', '2021/01/27 04:30:00', '2021/01/27 05:00:00', '2021/01/27 05:30:00', '2021/01/27 06:00:00', '2021/01/27 06:30:00', '2021/01/27 07:00:00', '2021/01/27 07:30:00', '2021/01/27 08:00:00', '2021/01/27 08:30:00', '2021/01/27 09:00:00']

Start TStamp : Jan 24, 2021 20:30:00 UTC
End   TStamp : Jan 27, 2021 09:00:00 UTC
Duration     : 2d 12h 30m 0s == 60.50h

Columns 'x', 'y', ('cell_areas', 'precips', 'sids') are the same dimensionally:

nitems           = 276183
n_nested_lists   = 122 (same as # of DF rows)
len_nested_lists = [2, 15, 52, 85, 217, 305, 515, 532, 675, 1047, 1363, 1618, 1748, 1862, 2395, 2923, 3816, 3988, 4025, 4023, 3774, 3720,
                    3856, 3699, 3981, 4161, 4140, 4070, 4408, 4704, 4754, 4709, 4292, 4450, 5178, 5093, 5035, 4909, 4316, 4692, 4356, 4402,
                    4213, 3267, 2960, 3108, 2833, 2723, 2624, 2469, 1859, 2318, 2476, 2144, 2431, 2466, 2422, 2432, 2512, 2621, 2625, 3231,
                    3298, 3367, 3532, 3529, 3655, 3518, 3190, 3069, 2556, 2851, 2880, 2817, 2256, 2134, 2367, 2572, 3497, 3444, 3102, 2989,
                    2670, 2182, 1797, 1641, 1652, 1729, 1778, 1554, 1538, 1319, 1207, 1415, 1361, 1378, 1209, 994, 766, 715, 620, 663, 754,
                    731, 745, 635, 507, 521, 341, 262, 290, 272, 267, 191, 61, 32, 21, 22, 25, 27, 17, 12]

Column 'cover' differs a bit in dimensionality:

nitems           = 54006
n_nested_lists   = 122
len_nested_lists = [2, 7, 16, 31, 67, 96, 116, 143, 126, 236, 249, 306, 309, 318, 387, 422, 423, 422, 496, 460, 419, 402, 468, 451, 530, 547,
                    531, 612, 580, 666, 615, 662, 796, 901, 901, 898, 909, 907, 893, 803, 764, 759, 766, 626, 559, 505, 466, 475, 463, 476,
                    387, 482, 536, 504, 508, 516, 612, 641, 646, 601, 683, 752, 777, 719, 698, 736, 698, 737, 648, 606, 604, 565, 525, 531,
                    466, 463, 502, 508, 667, 728, 726, 704, 679, 621, 550, 517, 477, 497, 510, 420, 439, 391, 372, 431, 407, 382, 360, 327,
                    278, 263, 225, 217, 245, 251, 266, 227, 182, 202, 145, 112, 123, 119, 114, 78, 29, 23, 12, 13, 14, 12, 9, 9]

Columns 'tot_area', 'tot_precip' and 'trixels' have a single entry for each of 122 rows.

Questions

I think I correctly calculate the whole event statistics.

Overall XCAL Event:
    Based on the Event-17 DF, e.g.,
        xcal_event_total_precip = xcal_event_sdf.tot_precip.sum()
        xcal_event_total_area_m2 = xcal_event_sdf.tot_area.sum()
        ...

    Start                      : Jan 24, 2021 20:30:00 UTC
    End                        : Jan 27, 2021 09:00:00 UTC
    Duration                   : 2d 12h 30m 0s == 60.50h
    ROI Surface Area           : 1.11e+11 m^2
    Surface Area               : 2.77e+13 m^2
        Relative to ROI        : 24967.618%
    Cumulative Precipitation   : 6.20e+10 m^3 or 6.20e+13 liters == 2.24 mm/m^2 or liters/m^2
    Uniform Precipitation Rate : 0.0370 mm/(m^2 hr) or liters/(m^2 hr)

See MCMS_event_Virginia_XCAL_fullstare_VAprj.png
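The depth and rate figures in the summary above can be reproduced from the rounded volume, area and duration. This is only a sketch of the arithmetic; the variable names are made up for illustration. Read this way, the "mm/m^2" figure is a depth in mm, which is the same number as liters per m^2.

```python
# Figures copied (rounded) from the event summary above.
total_precip_m3 = 6.20e10   # cumulative precipitation volume
total_area_m2 = 2.77e13     # surface area summed over all 122 time samples
duration_h = 60.5           # event duration in hours

# Volume / area is a depth in metres; 1 m = 1000 mm,
# and 1 mm of water over 1 m^2 is 1 liter.
depth_mm = total_precip_m3 / total_area_m2 * 1000.0
rate_mm_per_h = depth_mm / duration_h

print(f"depth = {depth_mm:.2f} mm, rate = {rate_mm_per_h:.4f} mm/hr")
```

Because the area is summed over all time samples, the depth is an average over the time-integrated area, not over a single footprint.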

Q1:

What are columns 'x' and 'y'? I assume they are a collection/cell (i.e., CCL) for the event?

For example, Index 2543, timestamp 2021-01-24 20:30:00 has two cells [573 574]?

These correspond to 'sids' [3433966733257179305 3433961230857396137]. Thus, two locations at the same time.

The corresponding 'cover' [3433959531497914377 3433966128567681033] has the same dimensionality (in this case), but different SIDs than the 'sids' column.

Q2:

What is the difference between the 'sids' and 'cover' columns in the IMERG Event-17 DF?

I see that they both have SIDs, but the number of SIDs corresponding to each cell sometimes differs (with 'cover' always having the same or fewer SIDs).

Which should I use for spatial intersection?

XCAL Event-17 ROI Intersection

Basic derivation:
    xcal_event_intersects_roi = xcal_event_sdf.stare_intersects(roi_sids)
    xcal_event_roi_sdf = xcal_event_sdf.reset_index()[xcal_event_intersects_roi]
    xcal_event_roi_sdf.reset_index(inplace=True, drop=True)
    xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))
    xcal_event_roi_sdf['precip_in_roi'] = xcal_event_roi_sdf.apply(lambda row: _calc_tot_precip(row), axis=1)

Only 58 intersecting rows of original 122

 type(xcal_event_roi_sdf) = <class 'starepandas.staredataframe.STAREDataFrame'>
    +----+---------+---------+---------------------+---------------+----------------+----------------------------+------------+-------------------+--------------+---------------------------+---------------------------+-----------------------------------------+-------------------+-----------------+
    |    |   index |   label |           timestamp |             x |              y |                 cell_areas |   tot_area |           precips |   tot_precip |                      sids |                     cover |                                 trixels |                in |   precip_in_roi |
    +----+---------+---------+---------------------+---------------+----------------+----------------------------+------------+-------------------+--------------+---------------------------+---------------------------+-----------------------------------------+-------------------+-----------------+
    |  0 |      25 |      17 | 2021-01-25 09:00:00 | [504 505 ...] |  [840 826 ...] |   [9.533e+07 9.547e+07 ...]| 4.1124e+11 |  [1.74 1.187 ...] |    9.875e+08 | [3333734495306796713 ...] | [3071490130238767112 ...] |  MULTIPOLYGON (((-89.534 36.344, ...))) | [False False ...] |          177905 |
    .                                                                                                                                                                                                                                                                                                    .
    .                                                                                                                                                                                                                                                                                                    .
    .                                                                                                                                                                                                                                                                                                    .
    | 57 |      82 |      17 | 2021-01-26 13:30:00 | [526 526 ...] | [1074 1075 ...] | [9.828e+07 9.828e+07 ...] |  2.729e+11 | [1.011 1.019 ...] |   44.355e+08 | [3166622033877575241 ...] | [2460100092544155657 ...] | MULTIPOLYGON (((-59.3895 31.681, ...))) | [False False ...] |         8030826 |
    +----+---------+---------+---------------------+---------------+-----------------+---------------------------+------------+-------------------+--------------+---------------------------+---------------------------+-----------------------------------------+-------------------+-----------------+
    [58 rows x 14 columns]

Row Indices (58):
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
     26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
     51, 52, 53, 54, 55, 56, 57]

Original Row Indices [still stored as new 'index' column] (58):
    [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
     51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
     76, 77, 78, 79, 80, 81, 82]

Time Stamps (58):
    ['2021/01/25 09:00:00', '2021/01/25 09:30:00', '2021/01/25 10:00:00', '2021/01/25 10:30:00', '2021/01/25 11:00:00', '2021/01/25 11:30:00', '2021/01/25 12:00:00', '2021/01/25 12:30:00', '2021/01/25 13:00:00', '2021/01/25 13:30:00', '2021/01/25 14:00:00', '2021/01/25 14:30:00', '2021/01/25 15:00:00', '2021/01/25 15:30:00', '2021/01/25 16:00:00', '2021/01/25 16:30:00', '2021/01/25 17:00:00', '2021/01/25 17:30:00', '2021/01/25 18:00:00', '2021/01/25 18:30:00', '2021/01/25 19:00:00', '2021/01/25 19:30:00', '2021/01/25 20:00:00', '2021/01/25 20:30:00', '2021/01/25 21:00:00', '2021/01/25 21:30:00', '2021/01/25 22:00:00', '2021/01/25 22:30:00', '2021/01/25 23:00:00', '2021/01/25 23:30:00',
    '2021/01/26 00:00:00', '2021/01/26 00:30:00', '2021/01/26 01:00:00', '2021/01/26 01:30:00', '2021/01/26 02:00:00', '2021/01/26 02:30:00', '2021/01/26 03:00:00', '2021/01/26 03:30:00', '2021/01/26 04:00:00', '2021/01/26 04:30:00', '2021/01/26 05:00:00', '2021/01/26 05:30:00', '2021/01/26 06:00:00', '2021/01/26 06:30:00', '2021/01/26 07:00:00', '2021/01/26 07:30:00', '2021/01/26 08:00:00', '2021/01/26 08:30:00', '2021/01/26 09:00:00', '2021/01/26 09:30:00', '2021/01/26 10:00:00', '2021/01/26 10:30:00', '2021/01/26 11:00:00', '2021/01/26 11:30:00', '2021/01/26 12:00:00', '2021/01/26 12:30:00', '2021/01/26 13:00:00', '2021/01/26 13:30:00']

Start TStamp : Jan 25, 2021 09:00:00 UTC
End   TStamp : Jan 26, 2021 13:30:00 UTC
Duration     : 1d 4h 30m 0s == 28.50h

Columns 'x', 'y', ('cell_areas', 'precips', 'sids', 'in') are the same dimensionally:
    nitems           = 194696
    n_nested_lists   = 58 (same as # of DF rows)
    len_nested_lists = [4161, 4140, 4070, 4408, 4704, 4754, 4709, 4292, 4450, 5178, 5093, 5035, 4909, 4316, 4692, 4356, 4402, 4213, 3267, 2960,
                        3108, 2833, 2723, 2624, 2469, 1859, 2318, 2476, 2144, 2431, 2466, 2422, 2432, 2512, 2621, 2625, 3231, 3298, 3367, 3532,
                        3529, 3655, 3518, 3190, 3069, 2556, 2851, 2880, 2817, 2256, 2134, 2367, 2572, 3497, 3444, 3102, 2989, 2670]

Column 'in', when tested for True values (SIDs that intersect the ROI), reveals the following counts:
n_intersecting_sids   = 24367 v. 194696 in the source
len_intersecting_sids = [3, 19, 38, 93, 133, 177, 202, 226, 304, 347, 343, 359, 359, 319, 373, 404, 503, 470, 387, 421, 606, 664, 671, 698, 774,
                         782, 915, 936, 686, 627, 603, 404, 287, 213, 113, 228, 408, 449, 511, 592, 661, 774, 843, 749, 772, 620, 611, 585, 567,
                         239, 207, 222, 195, 171, 152, 134, 118, 100]
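The per-row counts above amount to counting the True entries in each boolean array of the 'in' column. A minimal sketch on toy data, assuming 'in' holds one NumPy boolean array per row as described; the values here are made up:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'in' column: one boolean array per row (time sample).
in_col = pd.Series([
    np.array([False, True, True]),
    np.array([False, True]),
    np.array([True, False, False, True]),
])

# Number of intersecting SIDs per row, then summed over all rows.
len_intersecting_sids = in_col.apply(np.count_nonzero).tolist()
n_intersecting_sids = sum(len_intersecting_sids)

# Total number of SIDs in the source, for comparison.
n_source_sids = int(in_col.apply(len).sum())

print(len_intersecting_sids, n_intersecting_sids, "v.", n_source_sids)
```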

Column 'cover' differs a bit in dimensionality:
    nitems           = 37235
    n_nested_lists   = 58
    len_nested_lists = [547, 531, 612, 580, 666, 615, 662, 796, 901, 901, 898, 909, 907, 893, 803, 764, 759, 766, 626, 559, 505, 466, 475, 463,
                        476, 387, 482, 536, 504, 508, 516, 612, 641, 646, 601, 683, 752, 777, 719, 698, 736, 698, 737, 648, 606, 604, 565, 525,
                        531, 466, 463, 502, 508, 667, 728, 726, 704, 679]

Touches

Here I use the intersection and the 'in' column. Am I correct that this does not limit 'sids' or 'precip_in_roi' to the ROI, but rather gives those values for any time sample (row) that touches the ROI to some degree?

XCAL Event-17 Touches ROI:
    Based on the Event-17 DF intersected with ROI, e.g.

        xcal_event_touches_roi_precip = roi_xcal_sdf.tot_precip.sum()
        xcal_event_touches_roi_total_area_m2 = roi_xcal_sdf.tot_area.sum()
        ...

    These are the whole set of SIDs for any time-sample/row from xcal_event_sdf that has some spatial intersection
    with the ROI. That is, the SIDs are not strictly limited to the ROI.

    Start                        : Jan 25, 2021 09:00:00 UTC
    End                          : Jan 26, 2021 13:30:00 UTC
    Duration                     : 1d 4h 30m 0s == 28.50h
        Share of Event           : 47.11%
    ROI Surface Area             : 1.11e+11 m^2
    Surface Area                 : 1.94e+13 m^2
        Share of ROI             : 17497.854%
        Share of Event           : 70.082%
    Cumulative Precipitation     : 4.45e+10 m^3 or 4.45e+13 liters == 2.30 mm/m^2 or liters/m^2
        Share of Event           : 71.83%
    Uniform Precipitation Rate   : 0.0806 mm/(m^2 hr) or liters/(m^2 hr)
        Difference (ROI - Event) : +0.04 mm/(m^2 hr) or liters/(m^2 hr)
        Relative % Difference    : +117.58%

See MCMS_event_Virginia_XCAL_fullstare_unionROI_touches_VAprj.png

Question

Here is where I'm less sure; what if I want only the precip that falls inside the ROI, not just the precip from features touching the ROI?

XCAL Event-17 Over ROI:
    Based on the same Event-17 DF intersected with ROI, but limited to just within the ROI.
        Rather than 
            xcal_event_touches_roi_precip = roi_xcal_sdf.tot_precip.sum()
            xcal_event_touches_roi_total_area_m2 = roi_xcal_sdf.tot_area.sum()
        use                
            xcal_event_over_roi_precip = roi_xcal_sdf['precip_in_roi'].sum()
            xcal_event_over_roi_total_area_m2 = roi_xcal_sdf['tot_area_in_roi'].sum()

    These are the limited set of SIDs for any time-sample/row from xcal_event_sdf that has some spatial intersection with the ROI, limited to the SIDs that intersect with the ROI. 

    Start                        : Jan 25, 2021 09:00:00 UTC
    End                          : Jan 26, 2021 13:30:00 UTC
    Duration                     : 1d 4h 30m 0s == 28.50h
        Share of Event           : 47.11%
    ROI Surface Area             : 1.11e+11 m^2
    Surface Area                 : 2.40e+12 m^2    
        Share of ROI             : 2167.548%
        Share of Whole Event     : 8.681%
        Share of Touch ROI       : 12.388%
    Cumulative Precipitation     : 4.89e+09 m^3 or 4.89e+12 liters == 2.04 mm/m^2 or liters/m^2
        Share of Whole Event     : 7.89%
        Share of Touch ROI       : 10.99%
    Uniform Precipitation Rate   : 0.0715 mm/(m^2 hr) or liters/(m^2 hr)
        Difference (ROI - Event) : +0.03 mm/(m^2 hr) or liters/(m^2 hr)
        Relative % Difference    : +92.98%

See /MCMS_event_Virginia_XCAL_fullstare_unionROI_unionROI_over_VAprj.png
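The thread does not show how 'precip_in_roi' and 'tot_area_in_roi' are computed (the `_calc_tot_precip` helper is not included). The sketch below is one hypothetical way to restrict the per-row sums to the cells flagged in 'in'. The function name `totals_in_roi` and the plain `precips * cell_areas` weighting are assumptions, and any unit or time-step conversion is left out.

```python
import numpy as np
import pandas as pd

def totals_in_roi(row):
    """Sum area and area-weighted precip over the cells flagged in row['in'].

    Hypothetical helper: the weighting used in the original analysis is unknown.
    """
    mask = np.asarray(row["in"], dtype=bool)
    areas = np.asarray(row["cell_areas"], dtype=float)[mask]
    precips = np.asarray(row["precips"], dtype=float)[mask]
    return pd.Series({
        "tot_area_in_roi": areas.sum(),
        "precip_in_roi": (precips * areas).sum(),
    })

# Toy frame with two time samples; all values are made up for illustration.
df = pd.DataFrame({
    "cell_areas": [np.array([1.0e8, 1.0e8, 2.0e8]), np.array([1.0e8, 3.0e8])],
    "precips": [np.array([1.0, 2.0, 0.5]), np.array([4.0, 1.0])],
    "in": [np.array([True, False, True]), np.array([False, True])],
})

# One pair of in-ROI totals per row, joined back onto the frame.
df = df.join(df.apply(totals_in_roi, axis=1))
print(df[["tot_area_in_roi", "precip_in_roi"]])
```

Summing these two columns over all rows then gives time-integrated totals limited to the flagged cells.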

I assume the reason that the event surface area is more than ~20x the ROI-area itself is the integration of event-area over 58 time-samples (i.e., roi_xcal_sdf['tot_area_in_roi'].sum()).

    xcal_event_over_roi_total_area_m2 = 2.40e+12 m^2
    ROI Surface Area                  = 1.11e+11 m^2

If I loop over each row and accumulate the number of SIDs (n_read_sids), and then filter to the set of unique SIDs (unique_sids), the difference is indeed large.

    n_unique_sids = 21221
    n_read_sids   = 194696 (~9x n_unique_sids)
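The comparison between the number of SIDs read and the number of unique SIDs can be sketched without an explicit loop. This toy example assumes the 'sids' column holds one integer NumPy array per row, and it compares raw integer values only:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'sids' column: the same cells recur across time samples.
sids_col = pd.Series([
    np.array([11, 12, 13], dtype=np.int64),
    np.array([12, 13, 14], dtype=np.int64),
    np.array([13, 14], dtype=np.int64),
])

# Flatten all rows into one array, then count everything versus distinct values.
all_sids = np.concatenate(sids_col.to_list())
n_read_sids = all_sids.size
n_unique_sids = np.unique(all_sids).size

print(n_read_sids, n_unique_sids)
```

A large ratio between the two counts is what one would expect when the same grid cells are covered in many consecutive time samples.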

So, I hope this all shows that I'm doing things correctly.

NiklasPhabian commented 1 year ago

What are columns 'x' and 'y'? I assume they are a collection/cell (i.e., CCL) for the event?

They are the x and y coordinates of the IMERG grid.

What is the difference between the 'sids' and 'cover' columns in the IMERG Event-17 DF?

Oi. This is a bit tricky. The 'sids' column holds an array containing the sids of every IMERG cell that belongs to this timestamp/event row. The array has the same length as cell_areas, precips, x, y. The cover contains the dissolved sids. Dissolving means that 4 sids sharing the same ancestor get replaced with the ancestor. This reduces the number of sids and thus makes intersects tests way faster. So you really want to run your intersects tests on the cover column.
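To illustrate the idea of dissolving, here is a toy sketch on a generic quadtree. It does not use the real STARE encoding or any starepandas/pystare call: cells are written as strings of child digits (so "00" to "03" are the four children of "0"), and `dissolve` is a made-up name.

```python
from collections import defaultdict

def dissolve(cells):
    """Repeatedly replace any complete set of four sibling cells with their parent."""
    cells = set(cells)
    changed = True
    while changed:
        changed = False
        # Group cells by parent (the cell id without its last digit).
        by_parent = defaultdict(set)
        for cell in cells:
            if len(cell) > 1:  # single-digit cells are roots and have no parent
                by_parent[cell[:-1]].add(cell)
        for parent, children in by_parent.items():
            if len(children) == 4:
                cells -= children
                cells.add(parent)
                changed = True
    return sorted(cells)

fine = ["000", "001", "002", "003", "01", "02", "03", "10"]
cover = dissolve(fine)
print(cover)  # ['0', '10']
```

The dissolved list never has more entries than the input, which matches the observation that 'cover' always has the same number of SIDs as 'sids' or fewer.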

xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))

This is totally fine, but there is also a stare_intersects() function of a dataframe, doing the same thing

not just the precip from features touching the ROI

Careful with terminology. 'touching' would mean that they don't have overlap. Currently, we really only can do intersects, which includes overlap and touching.

I assume the reason that the event surface area is more than ~20x the ROI-area itself is the integration of event-area over 58 time-samples (i.e., roi_xcal_sdf['tot_area_in_roi'].sum()).

That sounds right.

I am missing a bit of your code here. We need to put this into a notebook and look at this together.

NiklasPhabian commented 1 year ago

overall, take a look here: https://github.com/SpatioTemporal/featureDB/blob/main/analyze.ipynb

mbauer288 commented 1 year ago

Thank you! I'll take a close look at the notebook.

xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))

This is totally fine, but there is also a stare_intersects() function of a dataframe, doing the same thing

Hmm, not sure about this last bit; doesn't stare_intersects() produce one boolean per row for the intersection of the two objects? In this case there are 122 rows:

>>>> xcal_event_sdf.shape = (122, 11)

xcal_event_intersects_roi = xcal_event_sdf.stare_intersects(roi_sids)
    >>>> type(xcal_event_intersects_roi) = <class 'pandas.core.series.Series'>
    >>>> xcal_event_intersects_roi.shape = (122,) == len(rows in xcal_event_sdf)

Whereas the following gives a point-by-point intersection for each row/object.

# Getting the XCAL Event SIDS which intersect the ROI
#   Store intersect status in new 'in' column
xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))

>>>> type(xcal_event_roi_sdf['in']) = <bound method Series>
    xcal_event_roi_sdf['in'].shape = (58,) == len(rows intersecting )
    xcal_event_roi_sdf['in'].iloc[0].shape = (4161,) == len(column 'x')
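The difference in shape between the two results can be shown on toy data. Here `np.isin` is only a stand-in for an intersects test (exact membership, not STARE's hierarchical comparison), and the cell ids are made up:

```python
import numpy as np
import pandas as pd

roi = np.array([3, 4])  # toy ROI cell ids

sids_col = pd.Series([
    np.array([1, 2]),
    np.array([2, 3, 4]),
    np.array([4, 5, 6, 7]),
])

# Element level: one boolean per cell, so each row holds an array.
per_cell = sids_col.apply(lambda row: np.isin(row, roi))

# Row level: one boolean per row (does any cell in the row hit the ROI?).
per_row = per_cell.apply(np.any)

print(per_cell.shape, per_cell.iloc[2].shape)  # (3,) (4,)
print(per_row.tolist())                        # [False, True, True]
```

The row-level result is suited to selecting rows, while the element-level result is what allows sums to be restricted to the cells inside the ROI.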

Perhaps, I just misunderstood your comment. Either way, thank you for the clarifications. And congratulations on your defense. One could say it is the end of a long road, but I prefer to welcome you to the beginning of an amazing journey.

Mike