alexdesousa / ayesql

Library for using raw SQL in Elixir
MIT License
138 stars 14 forks source link

Duckdb Runner help #36

Closed bhougland18 closed 1 month ago

bhougland18 commented 1 month ago

Hello,

I want to start by expressing my gratitude for developing this library; I'm eager to begin using it! However, I could use some assistance in creating a Runner for DuckDB, specifically with the DuckDBex library. In DuckDB, you can utilize a connection once you open it, and you can use the connection function like this:

{:ok, db} = Duckdbex.open(db_name)
{:ok, conn} = Duckdbex.connection(db)

As a newcomer to Elixir, I did my best to create the appropriate runner file, but I believe the "conn" in the Duckdbex.query function is incorrect. I think I should use "pool", but I'm unsure how that reference is set (an if it can be done with how duckdb works). Any assistance you can provide would be greatly appreciated. I believe an analytical database like DuckDB could really benefit from this type of library.

defmodule MyApp.Runner.Duckdb do
  use AyeSQL.Runner

  @impl AyeSQL
  def run(%AyeSQL.Query{statement: stmt, arguments: args}, options) do
    query_options = Keyword.drop(options, [:pool, :into])
    stmt = transform_stmt(stmt)

    with {:ok, ref} <- Duckdbex.query(conn, stmt) do
      columns = Duckdbex.columns(ref)
      rows = Duckdbex.fetch_all(ref)
      result = %{columns: columns, rows: rows}
      result = AyeSQL.Runner.handle_result(result)
      {:ok, result}
    end
  end

  @spec transform_stmt(AyeSQL.Query.statement()) :: AyeSQL.Query.statement()
  defp transform_stmt(stmt) do
    Regex.replace(~r/\$(\d+)/, stmt, "?")
  end
end

Thanks!

alexdesousa commented 1 month ago

@bhougland18 Thanks for your nice words about this project! :heart: Do you have a minimal project already setup where I can test this? This way I can pull it and test it locally :grin:

bhougland18 commented 1 month ago

@alexdesousa

Hey! I just added the repo for DuckCaller here. Here's how you can start testing it:

Compile the project. Run the following:

{:ok, conn} = DuckCaller.create!("TestDB.duckdb")
DuckCaller.IO.from_excel!(conn, "./data/test_data.xlsx")

Now you'll have a DuckDB database with tables populated from a sample Excel file. You can connect to the DuckDB using a compatible SQL editor, or use the DuckCaller.query!() function (which I plan to replace with your library). You can also display results in the terminal using DuckCaller.Print.to_term, but keep it to a few columns (5-8 max) for best results on smaller screens.

Here's a sample query you can try out:

r = DuckCaller.query!(conn, "Select * From Buyer;")

Let me know if you run into any issues!

I apologize for the sloppy code, I am new to Elixir and have only read half of the book "Joy of Elixir". Enter at your own risk, LOL!

alexdesousa commented 1 month ago

@bhougland18 Hello there! I had some time to test this Yesterday and I managed to have a DuckDB runner 😁 You were very close to the solution though!

TL;DR

The following runner should work:

defmodule AyeSQL.Runner.DuckDB do
  use AyeSQL.Runner

  @impl AyeSQL.Runner
  def run(%AyeSQL.Query{statement: stmt, arguments: args}, options) do
    conn = options[:conn] || raise ArgumentError, message: "Connection `:conn` cannot be `nil`"

    with {:ok, res} <- Duckdbex.query(conn, stmt, args) do
      columns = Duckdbex.columns(res)
      rows = Duckdbex.fetch_all(res)
      {:ok, AyeSQL.Runner.handle_result(%{columns: columns, rows: rows}, options)}
    end
  end
end

Full Solution

I decided to test it using CSV files, as it was the simplest way to have a minimal example. I created the file duck.exs with the following contents:

#!/usr/bin/env elixir

Mix.install([
  {:duckdbex, "~> 0.3"},
  {:ayesql, "~> 1.1"}
])

################################################################################
# Prepare test data

defmodule Database do
  @url "https://raw.githubusercontent.com/duckdb-in-action/examples/refs/heads/main/ch05/atp/atp_players.csv"

  def open do
    with {:ok, db} <- Duckdbex.open(),
         {:ok, conn} <- Duckdbex.connection(db),
         {:ok, _res} <- Duckdbex.query(conn, "INSTALL httpfs"),
         {:ok, _res} <- Duckdbex.query(conn, "LOAD httpfs"),
         {:ok, _res} <- Duckdbex.query(conn, "CREATE OR REPLACE TABLE atp AS FROM '#{@url}'") do
      {:ok, {db, conn}}
    end
  end
end

################################################################################
# AyeSQL runner

defmodule AyeSQL.Runner.DuckDB do
  use AyeSQL.Runner

  @impl AyeSQL.Runner
  def run(%AyeSQL.Query{statement: stmt, arguments: args}, options) do
    conn = options[:conn] || raise ArgumentError, message: "Connection `:conn` cannot be `nil`"

    with {:ok, res} <- Duckdbex.query(conn, stmt, args) do
      columns = Duckdbex.columns(res)
      rows = Duckdbex.fetch_all(res)
      {:ok, AyeSQL.Runner.handle_result(%{columns: columns, rows: rows}, options)}
    end
  end
end

################################################################################
# Setup query

defmodule ATP do
  def by_last_name(last_name) do
    %AyeSQL.Query{
      statement: "SELECT * FROM atp WHERE name_last = $1",
      arguments: [last_name]
    }
  end
end

################################################################################
# Run query

{:ok, {_db, conn}} = Database.open()
query = ATP.by_last_name("Federer")
{:ok, result} = AyeSQL.Runner.DuckDB.run(query, conn: conn, into: %{})

IO.inspect(result)

The script:

  1. Opens a connection and prepares the database by downloading a CSV file with the ATP players,
  2. Prepares the query by hand (not using the AyeSQL parser, though it should be equivalent to any AyeSQL query) to look for a player by last name, in this case Federer.
  3. Runs the query using the database connection.

The script should have the following result:

$ ./duck.exs
[
  %{
    player_id: 103819,
    name_first: "Roger",
    name_last: "Federer",
    hand: "R",
    dob: "19810808",
    ioc: "SUI",
    height: 185,
    wikidata_id: "Q1426"
  }
]

Note: You can run it with elixir duck.exs or changing the running permissions with chmod +x duck.exs and then ./duck.exs.

Conclusion

This could be a nice addition to a future version of AyeSQL. I never used DuckDB before, but I see how useful it can be 😁

I hope this helps!

alexdesousa commented 1 month ago

@bhougland18 In the end, it seemed like a good idea to add this runner, as using SQL in CSV, XLS, etc. files looks like something we would normally do with raw SQL :grin: The newest version now has a Duckdbex runner :rocket:

bhougland18 commented 1 month ago

Awesome!, I was just testing out the updates your provided.

bhougland18 commented 4 weeks ago

Hi @alexdesousa,

I'm running into some issues, likely due to my lack of understanding on how this setup should work. Would you mind taking a look at my configuration and pointing out where I might be going wrong? You can also review the project here.

Note: I’m not using the runner.ex file in my project as I’ve updated to the latest version of AyeSQL. Don't know if I still need one or not.


Queries Module

I'm unsure if repo: DuckCaller is correct in this context.

defmodule Queries do
  import AyeSQL, only: [defqueries: 3]

  defqueries(Queries, "../resources/queries/core.sql", repo: DuckCaller)
end

SQL File

Located at: /resources/queries/core.sql

-- name: simple_update
-- docs: A parameterized SQL update statement where table, field, and value are required.
UPDATE :table
SET :field = :value

-- name: get_all_buyers
SELECT * FROM Buyer

Testing

{:ok, conn} = DuckCaller.create!("DC.duckdb")
xl = "./resources/test_data/test_data.xlsx"
# Loading the database
DuckCaller.IO.from_excel!(conn, xl) 
q = Queries.get_all_buyers()

iex = ** (UndefinedFunctionError) function Queries.get_all_buyers/0 is undefined (module Queries is not available) Queries.get_all_buyers()

Any guidance would be greatly appreciated!

alexdesousa commented 4 weeks ago

@bhougland18 You're almost there! With AyeSQL there are two ways of creating query modules. You're using both at the same time.

So, either you use:

import AyeSQL, only: [defqueries: 3]
defqueries(Queries, "../resources/queries/core.sql", runner: AyeSQL.Runner.Duckdbex)

or

defmodule Queries do
  use AyeSQL, runner: AyeSQL.Runner.Duckdbex

  defqueries "../resources/queries/core.sql"
end
bhougland18 commented 4 weeks ago

Hi @alexdesousa,

Hey, sorry to hit you up yet again, I feel like I am really close but I am getting an error when I try this in iex.

I used your second example and create a module for Queries and my SQL file really hasn't changed. I am trying the "get_all_buyers" sql script because it doesn't have any variables and I figured that would be an easy example. I compiled the app and dropped into iex and here are my two attempts:

q = DuckCaller.Queries.get_all_buyers()

** (UndefinedFunctionError) function DuckCaller.Queries.get_all_buyers/0 is undefined or private. Did you mean:

  *get_all_buyers/1
  *get_all_buyers/2

(duck_caller 0.1.0) DuckCaller.Queries.get_all_buyers()
iex:3: (file)

I tried this because I thought it would be the closest to your example of "query = ATP.by_last_name("Federer")" from above, but I figured this probably changed once you added Duckdbex to your library and wired everything up. So next I tried this:

# DC.duckdb is a duckdb database that is already populated
{:ok, conn} = DuckCaller.connect!("DC.duckdb")
res = DuckCaller.Queries.get_all_buyers(conn)

* (UndefinedFunctionError) function AyeSQL.Runner.Duckdbex.run/2 is undefined (module AyeSQL.Runner.Duckdbex is not available) AyeSQL.Runner.Duckdbex.run(%AyeSQL.Query{statement: "Select From Buyer", arguments: []}, []) iex:2: (file)

Again, I apologize for the constant questions, I think I should be good for a while once I get this working for the first time.

alexdesousa commented 4 weeks ago

@bhougland18 So, given this file:

-- name: simple_update
-- docs: A parameterized SQL update statement where table, field, and value are required.
UPDATE :table
SET :field = :value

-- name: get_all_buyers
SELECT * FROM Buyer

this module:

defmodule Queries do
  use AyeSQL, runner: AyeSQL.Runner.Duckdbex

  defqueries "../resources/queries/core.sql"
end

and this way of getting the database connection:

{:ok, conn} = DuckCaller.connect!("DC.duckdb")

I would expect to be able to call the queries like:

Queries.simple_update([table: "Buyer", field: "name", value: "John"], conn: conn)
Queries.get_all_buyers([], conn: conn)

Additionally, make sure you have the latest AyeSQL version by doing mix deps.update ayesql. You can check your mix.lock as well to make sure you're using version 1.1.3.

For more information, here you also have a DuckDB specific example.

bhougland18 commented 4 weeks ago

Hooray! I apologize, you were right, my mix lock was still at 1.1.2 even though I ran mix deps.get. Your last example worked, thank you!

alexdesousa commented 4 weeks ago

@bhougland18 You're welcome! Hope you find the library useful 😁