NiklasPhabian closed 1 year ago
What are columns 'x' and 'y'? I assume they are a collection/cell (i.e., CCL) for the event?
They are the x and y coordinates of the IMERG grid.
What is the difference between the 'sids' and 'cover' columns in the IMERG Event-17 DF?
Oi. This is a bit tricky. The 'sids' column holds an array containing the sids of every IMERG cell that belongs to this timestamp/event row. The array has the same length as cell_areas, precips, x, and y.
The cover column contains the dissolved sids. Dissolving means that 4 sids sharing the same ancestor get replaced with the ancestor. This reduces the number of sids and thus makes intersects tests way faster. So you really want to run your intersects tests on the cover column.
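To make the dissolve step concrete, here is a minimal toy sketch. It does not use the real STARE encoding: each cell is modeled as a tuple of child indices (its path down a hypothetical quadtree), and the `dissolve` helper is an illustrative name, not a pystare or starepandas function.

```python
from collections import defaultdict

def dissolve(cells):
    """Repeatedly replace every complete set of 4 sibling cells with their parent."""
    cells = set(cells)
    changed = True
    while changed:
        changed = False
        # Group cells by their parent path (the path minus its last digit).
        siblings = defaultdict(set)
        for path in cells:
            if path:  # the root has no parent
                siblings[path[:-1]].add(path)
        for parent, children in siblings.items():
            if len(children) == 4:
                cells -= children
                cells.add(parent)
                changed = True
    return cells

# Toy 'sids': four siblings under parent (0,) plus one unrelated cell.
sids = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
cover = dissolve(sids)
print(sorted(cover))  # [(0,), (1, 0)]
```

The toy cover has 2 entries where the toy sids had 5, which is why an intersects test against the cover has less work to do.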
xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))
This is totally fine, but there is also a stare_intersects() function of a dataframe, doing the same thing
not just the precip from features touching the ROI
Careful with terminology. 'touching' would mean that they don't have overlap. Currently, we can really only do intersects, which includes overlap and touching.
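As a rough sketch of what "only the precip inside the ROI" can mean with an intersects-only test: assuming a per-cell boolean mask is available and aligned with 'precips' and 'cell_areas', the per-cell values can be filtered with it. All numbers below are made up for illustration, and cells that merely touch the ROI edge would still be counted.

```python
# Hypothetical per-cell values for a single timestamp/event row.
precips = [1.5, 0.0, 2.0, 4.0]
cell_areas = [10.0, 10.0, 12.0, 11.0]

# Hypothetical per-cell mask: True where the cell intersects the ROI.
in_roi = [True, False, True, False]

# Keep only the cells flagged by the mask before summing.
precip_in_roi = sum(p for p, keep in zip(precips, in_roi) if keep)
area_in_roi = sum(a for a, keep in zip(cell_areas, in_roi) if keep)

print(precip_in_roi, area_in_roi)  # 3.5 22.0
```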
I assume the reason that the event surface area is more than ~20x the ROI-area itself is the integration of event-area over 58 time-samples (i.e., roi_xcal_sdf['tot_area_in_roi'].sum()).
That sounds right.
I am missing a bit of your code here. We need to put this into a notebook and look at this together.
overall, take a look here: https://github.com/SpatioTemporal/featureDB/blob/main/analyze.ipynb
Thank you! I'll take a close look at the notebook.
xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))
This is totally fine, but there is also a stare_intersects() function of a dataframe, doing the same thing
Hmm, not sure about this last bit; doesn't stare_intersects() produce one boolean per row for the intersection of the two objects? In this case there are 122 rows:
>>>> xcal_event_sdf.shape = (122, 11)
xcal_event_intersects_roi = xcal_event_sdf.stare_intersects(roi_sids)
>>>> type(xcal_event_intersects_roi) = <class 'pandas.core.series.Series'>
>>>> xcal_event_intersects_roi.shape = (122,) == len(rows in xcal_event_sdf)
Whereas the following gives a point-by-point intersection for each row/object.
# Getting the XCAL Event SIDS which intersect the ROI
# Store intersect status in new 'in' column
xcal_event_roi_sdf['in'] = xcal_event_roi_sdf['sids'].apply(lambda row: pystare.intersects(roi_sids, row))
>>>> type(xcal_event_roi_sdf['in']) = <bound method Series>
xcal_event_roi_sdf['in'].shape = (58,) == len(rows intersecting )
xcal_event_roi_sdf['in'].iloc[0].shape = (4161,) == len(column 'x')
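The two result shapes can be mimicked with toy data. This is only a sketch: `cells_in_roi` is a hypothetical stand-in that uses exact membership rather than STARE's hierarchical intersects test, and the row labels and ids are invented. Judging from the shapes reported above, the per-row result is what you get by collapsing each per-cell mask with `any`.

```python
def cells_in_roi(roi_ids, row_ids):
    """Toy per-cell test: one boolean per entry of row_ids."""
    roi = set(roi_ids)
    return [cell in roi for cell in row_ids]

# Hypothetical event rows, each holding the ids of its cells.
rows = {"t0": [1, 2, 3], "t1": [7, 8], "t2": [3, 9]}
roi_ids = [3, 4, 5]

# Per-cell view: one mask per row, as long as that row's cell list.
per_cell = {t: cells_in_roi(roi_ids, ids) for t, ids in rows.items()}

# Per-row view: a single boolean per row.
per_row = {t: any(mask) for t, mask in per_cell.items()}

print(per_cell)  # {'t0': [False, False, True], 't1': [False, False], 't2': [True, False]}
print(per_row)   # {'t0': True, 't1': False, 't2': True}
```

The per-row view is enough to select the rows that intersect the ROI. The per-cell view is needed to pick out individual cells within a row.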
Perhaps, I just misunderstood your comment. Either way, thank you for the clarifications. And congratulations on your defense. One could say it is the end of a long road, but I prefer to welcome you to the beginning of an amazing journey.
Mike
Background
XCAL Event-17
Based on STARECookbook example "999-H0-00-IMERG-Analyze-1.py" with
Columns 'x', 'y', ('cell_areas', 'precips', 'sids') are the same dimensionally:
Column 'cover' differs a bit in dimensionality:
Columns 'tot_area', 'tot_precip' and 'trixels' have a single entry for each of 122 rows.
Questions
I think I correctly calculate the whole event statistics.
Q1:
What are columns 'x' and 'y'? I assume they are a collection/cell (i.e., CCL) for the event?
For example, Index 2543, timestamp 2021-01-24 20:30:00 has two cells [573 574]?
These correspond to 'sids' [3433966733257179305 3433961230857396137]. Thus, two locations at the same time.
The corresponding 'cover' [3433959531497914377 3433966128567681033] has the same dimensionality (in this case), but different SIDs than the 'sids' column.
Q2:
What is the difference between the 'sids' and 'cover' columns in the IMERG Event-17 DF?
I see that they both have SIDs, but the number of SIDs corresponding to each cell sometimes differs (with 'cover' always having the same or fewer SIDs).
Which should I use for spatial intersection?
XCAL Event-17 ROI Intersection
Only 58 intersecting rows of original 122
Touches
Here I use the intersection and the 'in' column. Am I correct that this does not limit 'sids' or 'precip_in_roi' to the ROI, but rather gives those values for any time step that touches the ROI to some degree?
Question
Here is where I'm less sure; what if I want only the precip that falls inside the ROI, not just the precip from features touching the ROI?
I assume the reason that the event surface area is more than ~20x the ROI-area itself is the integration of event-area over 58 time-samples (i.e., roi_xcal_sdf['tot_area_in_roi'].sum()).
xcal_event_over_roi_total_area_m2 = 2.40e+12 m^2
ROI Surface Area = 1.11e+11 m^2
If I loop over each row and accumulate the number of SIDs (n_read_sids), and then filter to a set of unique SIDs (unique_sids), I see that the difference is indeed large:
n_unique_sids = 21221
n_read_sids = 194696 (~9x n_unique_sids)
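A quick arithmetic check of the figures quoted above, as a sketch only. The variable names are mine; the values are the ones reported in this issue.

```python
# Values as reported in this issue.
event_area_sum_m2 = 2.40e12   # tot_area_in_roi summed over all intersecting rows
roi_area_m2 = 1.11e11         # ROI surface area
n_steps = 58                  # intersecting time-samples
n_read_sids = 194696
n_unique_sids = 21221

# Summed event area relative to the ROI area.
area_ratio = event_area_sum_m2 / roi_area_m2

# Mean per-time-sample area as a fraction of the ROI area.
mean_step_fraction = area_ratio / n_steps

# How often, on average, each unique SID is read across the rows.
sid_ratio = n_read_sids / n_unique_sids

print(round(area_ratio, 1), round(mean_step_fraction, 2), round(sid_ratio, 1))
# 21.6 0.37 9.2
```

A mean per-sample fraction of roughly 0.37 is consistent with the ~20x total coming from summing over time rather than from a single footprint larger than the ROI.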
So, I hope this all shows that I'm doing things correctly.