Open era127 opened 5 months ago
Thank you for bringing this up!
We can definitely get this implemented based on the docs you shared.
right now, my thought is to make it's its own macro and support the additional arguments, but I'm open to making it part of left_join as well and use an as_of argument to specify the syntax.
If we do this, we should do it within the joins (not as separate macros). Happy to talk offline about what the syntax should be on our end.
Tidyverse actually supports asof joins so it would be nice to keep this consistent with that syntax, which is incidentally very similar to DuckDB syntax.
Here's the DuckDB syntax: https://duckdb.org/docs/guides/sql_features/asof_join.html
Here's the tidyverse syntax (see inequality joins and closest()
): https://dplyr.tidyverse.org/reference/join_by.html
We don't have to use join_by()
in TidierDB but we should think about how this would work.
Oh wow. I did not know that. I'll take a look at these docs as well and then reach out soon about what might be the best way to add this with tidyverse as our guide, @kdpsingh
Here is an older reference from before duckdb supported that join natively and the query was quite complex.
This is just one potential syntax option. I am happy to explore other styles as well that might be more in line with Tidyverse style joins (changes to underlying code were minimal).
import TidierDB as DB
db = DB.duckdb_open(":memory")
con = DB.duckdb_connect(db)
DB.copy_to(con, "https://duckdb.org/data/prices.csv", "prices")
DB.copy_to(con, "https://duckdb.org/data/holdings.csv", "holdings")
@chain DB.db_table(con, :holdings) begin
DB.@left_join(:prices, ticker, ticker, as_of = true, and_clause = "holdings.when >= prices.when")
DB.@select(holdings.ticker, holdings.when)
DB.@mutate(value = price * shares)
DB.@show_query
# @collect
end
WITH cte_2 AS (
SELECT holdings.ticker, holdings.when, price * shares AS value
FROM holdings ASOF
LEFT JOIN prices ON prices.ticker = holdings.ticker AND holdings.when >= prices.when)
SELECT *
FROM cte_2
i tried using the parenthesis method on the DuckDB website but it would not run locally.
Nice work on this. I don't like this syntax. I'll share an alternative goal syntax.
I also want to remove the need for symbols from the joins. I know it's doable I think I know why you were running into issues.
sounds good, i revisited the duckdb code that I could not get to run yesterday and I wanted to clarify my fixed issue. the reason it did not run was because the column name is when. When i put it made it 'when' , the below code ran. However, the using keyword does not support table.name conventions (my initial approach). Looking forward to your syntax goal.
DuckDB.execute(con, "
SELECT ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p USING (ticker, 'when');") |> DataFrame
I was playing around with other possible approaches. here is another, albeit verbose, possible syntax. i also removed the need for symbols in joins. (the quotes are needed to avoid the parsing error for when
in duckdb). This could surely be shifted to something even closer to join_by
and inequality joins in tidyverse.
DB.@chain DB.db_table(con, :holdings) begin
DB.@left_join(prices, as_of = true, using_columns = [ticker, "\"when\""])
DB.@select(holdings.ticker, holdings.when)
DB.@mutate(value = price * shares)
DB.@show_query
# DB.@collect
end
WITH cte_2 AS (
SELECT holdings.ticker, holdings.when, price * shares AS value
FROM holdings ASOF
LEFT JOIN prices USING (ticker, 'when'))
SELECT *
FROM cte_2
Appreciate you looking at this, @drizk1. Sorry have been slow on my end -- will share ideal syntax shortly.
No rush at all. I just like to tinker in my free time
It would be great if the joins supported the ASOF (specifically the left asof) joins as it is supported in duckdb. I know this is not standard sql but it’s quite useful for time series analysis.