JaerongA closed this 5 months ago
Thanks for figuring out the root cause. Should we consider casting `previous_block_start` and `chunk_end` into nanosecond precision (padding with `000`)? Using `between` is much slower as far as I know (though not a big deal in the grand scheme of things). A bigger issue is: do we need to switch to this `between` strategy everywhere?
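A minimal sketch of the casting idea, assuming pandas >= 2.0 (where `Timestamp` objects carry a resolution unit); the timestamps here are made up for illustration:

```python
import pandas as pd

# Hypothetical bounds at microsecond precision (as in the reported issue)
previous_block_start = pd.Timestamp("2024-01-01 00:00:00.123456").as_unit("us")
chunk_end = pd.Timestamp("2024-01-01 00:00:01").as_unit("us")

# as_unit("ns") keeps the same instant, zero-padding the extra digits,
# so the bounds match a nanosecond-precision DatetimeIndex
previous_block_start = previous_block_start.as_unit("ns")
chunk_end = chunk_end.as_unit("ns")
print(previous_block_start.unit)  # "ns"
```

The conversion is lossless in this direction (microseconds to nanoseconds), which is why it is a cheap alternative to a `between` mask.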
It's also somewhat unexpected that pandas doesn't handle a mismatch in timestamp precision. I was afraid this was due to a non-monotonic index.

Would this work?

```python
block_df = fetch_stream(block_query).sort_index()
block_df = block_df[previous_block_start:chunk_end]
```
@ttngu207 I think you're right. I was focused on the KeyError, so I assumed that was the cause, but it turns out it was just because the timestamps were not sorted after `explode`. I'll apply your suggestion.
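A minimal reproduction of the underlying behavior, using made-up timestamps rather than the project's actual stream data: pandas only allows value-based slicing with labels that are not exact index members when the `DatetimeIndex` is monotonic, which is why `sort_index()` makes the slice work.

```python
import pandas as pd

# Hypothetical illustration: explode() can leave a DatetimeIndex out of order
ts = pd.to_datetime(["2024-01-01 00:00:02",
                     "2024-01-01 00:00:00",
                     "2024-01-01 00:00:01"])
df = pd.DataFrame({"v": [1, 2, 3]}, index=ts)

start = pd.Timestamp("2024-01-01 00:00:00.500")  # not an exact index label
end = pd.Timestamp("2024-01-01 00:00:02")

# On a non-monotonic DatetimeIndex, slicing with a missing label raises KeyError
try:
    df[start:end]
except KeyError:
    print("KeyError on non-monotonic index")

# After sort_index(), value-based slicing with missing labels is allowed
sliced = df.sort_index()[start:end]
print(len(sliced))  # rows at 00:00:01 and 00:00:02
```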
This is to address the following KeyError: accessing/slicing a DataFrame with a timestamp index yielded a KeyError due to a mismatch in timestamp precision (the timestamps in the DataFrame were in nanosecond precision, whereas `previous_block_start` was in microsecond precision). This fix uses the `between` method instead of looking up the exact timestamp index value.
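A sketch of the `between`-based approach described above, with hypothetical data (the actual fetch and column layout in the project will differ): a boolean mask built from `Series.between` is robust to both index order and timestamp-precision mismatches, since it compares values rather than looking up labels.

```python
import pandas as pd

# Hypothetical, unsorted nanosecond-precision index
ts = pd.to_datetime(["2024-01-01 00:00:02",
                     "2024-01-01 00:00:00",
                     "2024-01-01 00:00:01"])
df = pd.DataFrame({"v": [1, 2, 3]}, index=ts)

# Microsecond-precision bounds, as in the reported issue
previous_block_start = pd.Timestamp("2024-01-01 00:00:00.500")
chunk_end = pd.Timestamp("2024-01-01 00:00:02")

# Inclusive value comparison on both ends; no exact index lookup involved
mask = df.index.to_series().between(previous_block_start, chunk_end)
print(df[mask])
```

Unlike label slicing, this never raises a KeyError, at the cost of scanning the whole index, which is the speed trade-off noted earlier in the thread.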