Closed zaneselvans closed 2 years ago
This will likely be taken care of by virtue of the new harvesting process, which will differentiate between different time frequencies in the column name ('report_year
, report_month
etc.) and the addition of multiple time frequencies to enable merges / FK relationships between tables of different frequencies. See #1247
Our slightly off-label use of
pd.merge_asof()
to merge dataframes of different temporal resolution is slow and has some weird edge cases related to thetolerance
parameter (how far afield should it look to find a "matching" datetime). A simpler, probably faster, option would be to simply explode a specified left and right datetime columns into a specified subset their component parts (year, month, day, etc.) and merge on those as integer keys. This would allow all the different kinds of normal merges to work, and leave it up to the user to choose to do reasonable things...Right now
clean_merge_asof()
works for the year-to-year and year-to-month merges that broadcast in one direction (many-to-one where the many side is the left dataframe), but there are a couple of cases in which we need to do a many to many merge and that has to be done as a one-off right now.This functionality is currently used in the mcoe and eia923 output functions.