I was trying to match dates to intervals, first as a toy example with integers, then with actual dates. With integer ranges in place of date intervals, I have to store the ranges in a list-column (promo_interval in df_int), and the joined dataframe contains no rows. With actual dates and lubridate intervals (i.e., without a list-column), everything works as expected and the joined dataframe is not empty.
Is the list-column causing this behavior, or is my code flawed? If it is the list-column, it would be great if list-columns were either supported or triggered a warning.
library(tidyverse)
library(lubridate, warn.conflicts = FALSE)
library(fuzzyjoin)
# set up with list column
df_int <- tibble(
  video = c("A", "B", "B"),
  promo_interval = list(1:4, 3:5, 7:9)
)
df_dates <- tibble(
  video = c("A", "A", "A", "B"),
  views = c(234, 235, 166, 435),
  date = c(2, 3, 7, 7)
)
# join yields empty dataframe
fuzzy_inner_join(
  df_dates,
  df_int,
  by = c("video", "date" = "promo_interval"),
  match_fun = list(`==`, `%in%`)
)
#> # A tibble: 0 × 5
#> # … with 5 variables: video.x <chr>, views <dbl>, date <dbl>, video.y <chr>,
#> # promo_interval <list>
# with dates/intervals (no list-column)
df_int <- tibble(
  video = c("A", "B", "B"),
  promo_interval = c(interval("20220501", "20220504"),
                     interval("20220503", "20220505"),
                     interval("20220507", "20220509"))
)
df_dates <- tibble(
  video = c("A", "A", "A", "B"),
  views = c(234, 235, 166, 435),
  date = c(ymd("20220502"),
           ymd("20220503"),
           ymd("20220507"),
           ymd("20220507"))
)
# join results as expected
fuzzy_inner_join(
  df_dates,
  df_int,
  by = c("video", "date" = "promo_interval"),
  match_fun = list(`==`, `%within%`)
)
#> # A tibble: 3 × 5
#>   video.x views date       video.y promo_interval
#>   <chr>   <dbl> <date>     <chr>   <Interval>
#> 1 A         234 2022-05-02 A       2022-05-01 UTC--2022-05-04 UTC
#> 2 A         235 2022-05-03 A       2022-05-01 UTC--2022-05-04 UTC
#> 3 B         435 2022-05-07 B       2022-05-07 UTC--2022-05-09 UTC
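One possible explanation for the integer example, sketched here without relying on fuzzyjoin's internals: `%in%` is not a pairwise comparison the way `%within%` is. It looks up each left-hand value in the right-hand side as a whole, so a number is never found "in" a list whose elements are ranges. The helper `in_range` below is a hypothetical name of my own, not part of fuzzyjoin, and only illustrates what a pairwise version could look like.

```r
library(purrr)

dates  <- c(2, 3)
ranges <- list(1:4, 3:5)

# `%in%` is built on match(), which converts list elements to character
# strings before comparing, so the range 1:4 never equals the value 2.
dates %in% ranges
#> [1] FALSE FALSE

# Hypothetical helper (not a fuzzyjoin function): test each date against
# the range at the same position.
in_range <- function(x, y) map2_lgl(x, y, function(d, r) d %in% r)
in_range(dates, ranges)
#> [1] TRUE TRUE
```

Passing a pairwise function like this as the second element of match_fun might make the integer example behave like the date example, but I have not verified that fuzzyjoin handles list-columns this way.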
Created on 2022-05-28 by the reprex package (v2.0.1)