mbauer288 opened this issue 1 year ago
This is likely because STARE tids cover intervals with a finite, irregular resolution. The forward and reverse resolutions each point to a bit in the representation. When a tid is constructed from a triple (t0, t1, t2), t1 is used to set the temporal location bits. The reverse resolution is then set so that a decrement at that resolution's bit position lands below t0 (for the case where a cover is desired). Similarly, the forward resolution is set so that an increment at its bit position lands above t2. Therefore, the actual lower and upper bounds that one can calculate from the resulting tid (a single 64-bit integer) are generally not t0 and t2. Generally, we've set the tid to cover the time interval of the granule, which leads to "overestimates" in joins or search queries.
One can check this by "round-tripping" from a temporal triple to a tid/tiv and back again.
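To make the round-trip idea concrete, here is a toy model in Python. It is not STARE's actual encoding: it assumes, hypothetically, that each resolution is a power-of-two step in milliseconds, whereas real STARE tids use calendar-based bit fields with irregular steps. The helper names `pow2_step`, `encode_triple`, and `decode_tid` are illustrative only.

```python
# Toy model of a tid cover, NOT the real STARE bit layout.
# Assumption: a resolution is a power-of-two step in milliseconds.

def pow2_step(span_ms: int) -> int:
    """Smallest power-of-two step (in ms) that is >= span_ms."""
    if span_ms <= 1:
        return 1
    return 1 << (span_ms - 1).bit_length()


def encode_triple(t0: int, t1: int, t2: int) -> tuple:
    """Keep the location t1 plus reverse/forward steps wide enough to reach t0 and t2."""
    if not t0 <= t1 <= t2:
        raise ValueError("expected t0 <= t1 <= t2")
    return (t1, pow2_step(t1 - t0), pow2_step(t2 - t1))


def decode_tid(tid: tuple) -> tuple:
    """Recover the (widened) bounds implied by the toy tid."""
    t1, rev_step, fwd_step = tid
    return (t1 - rev_step, t1, t1 + fwd_step)


# Round trip: the recovered bounds enclose, but generally differ from, t0 and t2.
original = (0, 1000, 1900)
recovered = decode_tid(encode_triple(*original))
print(original, "->", recovered)  # (0, 1000, 1900) -> (-24, 1000, 2024)
```

Only when both spans happen to be exactly representable (here, powers of two) does the round trip return the original triple unchanged.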
This also means that when doing searches, the results using only the tid will be approximate, and there will need to be another step or pass for a more accurate comparison.
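The two-pass idea can be sketched as a cheap filter on the widened cover bounds, which may return false positives, followed by an exact test on the true interval endpoints. This assumes the exact endpoints are kept alongside each tid (for example, in `ts_start`/`ts_end`-style columns); the `Granule` record and its field names are hypothetical, and times are plain integer milliseconds.

```python
from typing import List, NamedTuple


class Granule(NamedTuple):
    """Hypothetical record: exact half-open interval plus the widened tid cover."""
    name: str
    start_ms: int  # exact lower bound (excluded)
    end_ms: int    # exact upper bound (included)
    cover_lo: int  # lower bound recovered from the tid (<= start_ms)
    cover_hi: int  # upper bound recovered from the tid (>= end_ms)


def search(granules: List[Granule], q_start: int, q_end: int) -> List[Granule]:
    """Return granules whose exact interval (start, end] meets the query (q_start, q_end]."""
    # Pass 1: approximate test on the covers (treated as closed); may over-select.
    candidates = [g for g in granules if g.cover_lo <= q_end and q_start <= g.cover_hi]
    # Pass 2: exact test for two half-open intervals (a, b] and (c, d]: a < d and c < b.
    return [g for g in candidates if g.start_ms < q_end and q_start < g.end_ms]
```

For two adjacent granules whose covers both reach past their shared boundary, pass 1 keeps both, and pass 2 discards the one whose exact interval ends at the boundary.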
Part of this is due to putting three temporal instants into a single 64-bit integer. (If I'm gauging your issue correctly.)
We could adopt a different naming schema that includes more precise interval-endpoint information, if warranted, but then you start to move away from the tiv-cover idea.
You probably want to take a look at `write_pods_granule` in `staredataframe.py`. It assumes `ts_start` and `ts_end` columns in the granule data, from which it calculates a tid cover for the chunk name. This is aligned with the way starepandas searches for chunks.
Output of test.py
Oddities with these particular IMERG files
A model-simulated dataset that has been interpolated to the IMERG grid (0.1° × 0.1°), provided by Jiun-Dar Chern on Discover.
From today's Zoom meeting, I see that, unlike IMERG data, these are instantaneous values rather than values spanning a time interval.
Well, my general question still stands regarding how to encode IMERG files in terms of time. For example, should consecutive time samples overlap? If not, how, and by how much, do I force a separation in terms of STARE time covers? As I have done it below, they are in principle separated by 1 ms. However, they still report as intersecting, so I must be doing something wrong there.
Mike
IMERG Time Standards
For IMERG a single day consists of 48 half-open intervals centered on the hour or half-hour:
['00:00', '00:30', ... '23:00', '23:30']
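A quick way to enumerate the 48 interval centers for a given day, using only the Python standard library (the function name `imerg_centers` is illustrative):

```python
from datetime import datetime, timedelta
from typing import List


def imerg_centers(day: datetime) -> List[datetime]:
    """Return the 48 half-hourly interval centers (00:00, 00:30, ..., 23:30) of a day."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [midnight + timedelta(minutes=30 * i) for i in range(48)]


labels = [c.strftime("%H:%M") for c in imerg_centers(datetime(2021, 1, 1))]
print(labels[:2], labels[-2:])  # ['00:00', '00:30'] ['23:00', '23:30']
```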
The half-open intervals are such that the Lower Bound (LB), Center (CR), and Upper Bound (UB) of the interval are written as
(LB, CR, UB]
, meaning the interval excludes LB and includes UB: a time t belongs to the interval when LB < t <= UB.
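The half-open convention means an instant on a shared boundary belongs to exactly one interval. A minimal check, assuming each half-hour interval extends 15 minutes on either side of its center (`interval_bounds` and `contains` are illustrative names):

```python
from datetime import datetime, timedelta

HALF_WIDTH = timedelta(minutes=15)


def interval_bounds(center: datetime) -> tuple:
    """(LB, UB) of the IMERG-style interval centered on `center`."""
    return center - HALF_WIDTH, center + HALF_WIDTH


def contains(center: datetime, t: datetime) -> bool:
    """True if t lies in the half-open interval (LB, UB], i.e. LB < t <= UB."""
    lb, ub = interval_bounds(center)
    return lb < t <= ub


boundary = datetime(2021, 1, 1, 0, 15)  # shared by the 00:00 and 00:30 intervals
print(contains(datetime(2021, 1, 1, 0, 0), boundary))   # True  (it is the UB)
print(contains(datetime(2021, 1, 1, 0, 30), boundary))  # False (it is the LB)
```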
For example (using format `HH:MM:SS.ms`):

Reference: Integrated Multi-satellitE Retrievals for GPM (IMERG) Technical Documentation
In practice, I apply a 1 millisecond forward offset to the LB so that the lower end stays open, giving the triple
(CR-15m+1ms, CR, CR+15m]
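The offset triple can be computed directly. This sketch works in integer milliseconds since the Unix epoch, which is intended to match numpy `datetime64[ms]` values viewed as int64; the helper names `to_epoch_ms` and `imerg_triple_ms` are hypothetical. It also confirms that, as exact closed intervals, consecutive triples are separated by exactly 1 ms.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)
HALF_WIDTH = timedelta(minutes=15)


def to_epoch_ms(t: datetime) -> int:
    """Integer milliseconds since 1970-01-01T00:00Z (t must be timezone-aware)."""
    return (t - EPOCH) // ONE_MS


def imerg_triple_ms(center: datetime) -> tuple:
    """(CR - 15m + 1ms, CR, CR + 15m) as epoch milliseconds."""
    return (
        to_epoch_ms(center - HALF_WIDTH + ONE_MS),
        to_epoch_ms(center),
        to_epoch_ms(center + HALF_WIDTH),
    )


c0 = datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc)
first = imerg_triple_ms(c0)
second = imerg_triple_ms(c0 + timedelta(minutes=30))
print(second[0] - first[2])  # 1 (ms gap between consecutive exact intervals)
```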
For example,
Thus, IMERG intervals defined this way are non-overlapping. However, when I use `pystare.temporal_overlap` to test this on the TIV covers, I find that they do overlap.

Usage Summary:
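One plausible explanation for the reported overlap is the cover widening described earlier in this thread: once each endpoint is rounded outward to a representable step, a 1 ms gap disappears. The toy model below illustrates this under the hypothetical assumption that steps are powers of two in milliseconds. It is not STARE's real calendar-based encoding, so the actual amount of overlap will differ, and `pystare.temporal_overlap` is not called here.

```python
# Toy model only: assumes cover steps are powers of two in ms (not STARE's layout).
HALF_MS = 15 * 60 * 1000  # 15 minutes in ms
STEP_MS = 30 * 60 * 1000  # spacing between consecutive IMERG centers


def pow2_step(span_ms: int) -> int:
    """Smallest power-of-two step (ms) that is >= span_ms."""
    return 1 if span_ms <= 1 else 1 << (span_ms - 1).bit_length()


def toy_cover(lb: int, cr: int, ub: int) -> tuple:
    """Widen the exact closed interval [lb, ub] outward from cr to toy steps."""
    return cr - pow2_step(cr - lb), cr + pow2_step(ub - cr)


def overlaps(a: tuple, b: tuple) -> bool:
    """Closed-interval overlap test."""
    return a[0] <= b[1] and b[0] <= a[1]


cr = 0
exact_a = (cr - HALF_MS + 1, cr + HALF_MS)
exact_b = (cr + STEP_MS - HALF_MS + 1, cr + STEP_MS + HALF_MS)
cover_a = toy_cover(exact_a[0], cr, exact_a[1])
cover_b = toy_cover(exact_b[0], cr + STEP_MS, exact_b[1])

print(overlaps(exact_a, exact_b))  # False: separated by 1 ms
print(overlaps(cover_a, cover_b))  # True: 15 min widens to 2**20 ms (~17.5 min)
```

If this is what is happening, no small offset between exact intervals will separate the covers; only the exact endpoints (kept outside the tid) can.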
Basically, I:

1. convert the interval times to numpy `datetime64` int64 values,
2. apply `pystare.from_utc_variable()` to get a TIV triplet for the interval,
3. apply `pystare.from_temporal_triple()` to this to get a TIV cover for the interval,
4. use `hex()` or `pystare.hex16()` (they seemingly give the same answer) to form the `<tcover>` part of the `chunk_name`.

I also see `pystare.format_tpod()` and `pystare.make_tpod_tuple()`, so should I be using these or something similar instead?

Does this seem correct?
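On the chunk-name formatting: Python's built-in `hex()` adds a `0x` prefix and drops leading zeros, so it matches a fixed-width 16-digit field only when the top hex digit of the tid is nonzero. If a fixed-width `<tcover>` field is wanted, a plain formatter like the hypothetical `tcover_hex16` below gives one; whether it matches `pystare.hex16()` exactly is an assumption, not something confirmed here.

```python
def tcover_hex16(tid: int) -> str:
    """Format a 64-bit tid as exactly 16 lowercase hex digits.

    Negative int64 values map to their two's-complement bit pattern.
    For numpy int64 scalars, pass int(tid) first.
    """
    return format(tid & 0xFFFF_FFFF_FFFF_FFFF, "016x")


small = 0x1F
large = 0x7ABC_DEF0_1234_5678
print(hex(small), tcover_hex16(small))  # 0x1f 000000000000001f
print(hex(large), tcover_hex16(large))  # 0x7abcdef012345678 7abcdef012345678
```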
Example based on three consecutive files from the PMOD Discover data directory:
Screen dump
Example code