alexdesousa / ayesql

Library for using raw SQL in Elixir
MIT License

Duckdb Runner help #36

Open bhougland18 opened 1 week ago

bhougland18 commented 1 week ago

Hello,

I want to start by thanking you for developing this library; I'm eager to begin using it! However, I could use some help creating a Runner for DuckDB, specifically with the Duckdbex library. With Duckdbex, you open a database, get a connection from it, and then reuse that connection for queries:

{:ok, db} = Duckdbex.open(db_name)
{:ok, conn} = Duckdbex.connection(db)

As a newcomer to Elixir, I did my best to create the appropriate runner file, but I believe the "conn" in the Duckdbex.query call is incorrect. I think I should use "pool", but I'm unsure how that reference is set (and whether it can be done with how DuckDB works). Any assistance you can provide would be greatly appreciated. I believe an analytical database like DuckDB could really benefit from this type of library.

defmodule MyApp.Runner.Duckdb do
  use AyeSQL.Runner

  @impl AyeSQL
  def run(%AyeSQL.Query{statement: stmt, arguments: args}, options) do
    query_options = Keyword.drop(options, [:pool, :into])
    stmt = transform_stmt(stmt)

    with {:ok, ref} <- Duckdbex.query(conn, stmt) do
      columns = Duckdbex.columns(ref)
      rows = Duckdbex.fetch_all(ref)
      result = %{columns: columns, rows: rows}
      result = AyeSQL.Runner.handle_result(result)
      {:ok, result}
    end
  end

  @spec transform_stmt(AyeSQL.Query.statement()) :: AyeSQL.Query.statement()
  defp transform_stmt(stmt) do
    Regex.replace(~r/\$(\d+)/, stmt, "?")
  end
end

Thanks!

alexdesousa commented 1 week ago

@bhougland18 Thanks for your nice words about this project! :heart: Do you have a minimal project already set up that I can pull and test locally? :grin:

bhougland18 commented 1 week ago

@alexdesousa

Hey! I just added the repo for DuckCaller here. Here's how you can start testing it:

Compile the project. Run the following:

{:ok, conn} = DuckCaller.create!("TestDB.duckdb")
DuckCaller.IO.from_excel!(conn, "./data/test_data.xlsx")

Now you'll have a DuckDB database with tables populated from a sample Excel file. You can connect to the DuckDB using a compatible SQL editor, or use the DuckCaller.query!() function (which I plan to replace with your library). You can also display results in the terminal using DuckCaller.Print.to_term, but keep it to a few columns (5-8 max) for best results on smaller screens.

Here's a sample query you can try out:

r = DuckCaller.query!(conn, "Select * From Buyer;")

Let me know if you run into any issues!

I apologize for the sloppy code. I am new to Elixir and have only read half of the book "Joy of Elixir". Enter at your own risk, LOL!

alexdesousa commented 2 days ago

@bhougland18 Hello there! I had some time to test this yesterday and I managed to get a DuckDB runner working 😁 You were very close to the solution though!

TL;DR

The following runner should work. It takes the connection from the `:conn` option passed by the caller, so no pool is needed:

defmodule AyeSQL.Runner.DuckDB do
  use AyeSQL.Runner

  @impl AyeSQL.Runner
  def run(%AyeSQL.Query{statement: stmt, arguments: args}, options) do
    conn = options[:conn] || raise ArgumentError, message: "Connection `:conn` cannot be `nil`"

    with {:ok, res} <- Duckdbex.query(conn, stmt, args) do
      columns = Duckdbex.columns(res)
      rows = Duckdbex.fetch_all(res)
      {:ok, AyeSQL.Runner.handle_result(%{columns: columns, rows: rows}, options)}
    end
  end
end
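
A side note on the transform_stmt/1 helper from the original attempt: the runner above passes the statement through untouched, and the `$1` query in the full script runs as written, so rewriting `$n` to `?` does not seem to be needed. The rewrite can also go wrong for a hand-written statement that reuses a parameter, because the numbering is lost. A small sketch of that problem (the module name `PlaceholderSketch` and the sample statement are made up for illustration):

```elixir
defmodule PlaceholderSketch do
  # Same regex as the original attempt: every `$n` becomes a bare `?`.
  def to_question_marks(stmt), do: Regex.replace(~r/\$(\d+)/, stmt, "?")

  # Counts the `?` placeholders left in a statement.
  def count_placeholders(stmt), do: length(Regex.scan(~r/\?/, stmt))
end

# Hypothetical statement that uses the same positional parameter twice.
stmt = "SELECT * FROM atp WHERE name_first = $1 OR name_last = $1"
args = ["Roger"]

rewritten = PlaceholderSketch.to_question_marks(stmt)
IO.puts(rewritten)

# Two placeholders after the rewrite, but still only one argument.
IO.inspect({PlaceholderSketch.count_placeholders(rewritten), length(args)})
```

After the rewrite the statement has two placeholders but only one argument, which is why keeping the `$1` form is the safer choice here.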

Full Solution

I decided to test it using CSV files, as it was the simplest way to have a minimal example. I created the file duck.exs with the following contents:

#!/usr/bin/env elixir

Mix.install([
  {:duckdbex, "~> 0.3"},
  {:ayesql, "~> 1.1"}
])

################################################################################
# Prepare test data

defmodule Database do
  @url "https://raw.githubusercontent.com/duckdb-in-action/examples/refs/heads/main/ch05/atp/atp_players.csv"

  def open do
    with {:ok, db} <- Duckdbex.open(),
         {:ok, conn} <- Duckdbex.connection(db),
         {:ok, _res} <- Duckdbex.query(conn, "INSTALL httpfs"),
         {:ok, _res} <- Duckdbex.query(conn, "LOAD httpfs"),
         {:ok, _res} <- Duckdbex.query(conn, "CREATE OR REPLACE TABLE atp AS FROM '#{@url}'") do
      {:ok, {db, conn}}
    end
  end
end

################################################################################
# AyeSQL runner

defmodule AyeSQL.Runner.DuckDB do
  use AyeSQL.Runner

  @impl AyeSQL.Runner
  def run(%AyeSQL.Query{statement: stmt, arguments: args}, options) do
    conn = options[:conn] || raise ArgumentError, message: "Connection `:conn` cannot be `nil`"

    with {:ok, res} <- Duckdbex.query(conn, stmt, args) do
      columns = Duckdbex.columns(res)
      rows = Duckdbex.fetch_all(res)
      {:ok, AyeSQL.Runner.handle_result(%{columns: columns, rows: rows}, options)}
    end
  end
end

################################################################################
# Setup query

defmodule ATP do
  def by_last_name(last_name) do
    %AyeSQL.Query{
      statement: "SELECT * FROM atp WHERE name_last = $1",
      arguments: [last_name]
    }
  end
end

################################################################################
# Run query

{:ok, {_db, conn}} = Database.open()
query = ATP.by_last_name("Federer")
{:ok, result} = AyeSQL.Runner.DuckDB.run(query, conn: conn, into: %{})

IO.inspect(result)

The script:

  1. Opens a connection and prepares the database by loading a CSV file with the ATP players from a URL (using the httpfs extension).
  2. Prepares the query by hand (not using the AyeSQL parser, though it should be equivalent to any AyeSQL query) to look for a player by last name, in this case Federer.
  3. Runs the query using the database connection.

The script should print the following result:

$ ./duck.exs
[
  %{
    player_id: 103819,
    name_first: "Roger",
    name_last: "Federer",
    hand: "R",
    dob: "19810808",
    ioc: "SUI",
    height: 185,
    wikidata_id: "Q1426"
  }
]

Note: You can run it with elixir duck.exs, or make it executable with chmod +x duck.exs and then run ./duck.exs.
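
For anyone curious how the `columns` and `rows` returned by Duckdbex can end up as the list of maps shown above when `into: %{}` is passed, here is a rough sketch of the idea. It is not AyeSQL's actual handle_result/2 implementation; `RowsSketch` is a made-up name, and it assumes the column names arrive as strings:

```elixir
defmodule RowsSketch do
  # Pairs each row with the column names and builds one map per row.
  # Column names are turned into atoms, so only use this with trusted names.
  def to_maps(columns, rows) do
    keys = Enum.map(columns, &String.to_atom/1)
    Enum.map(rows, fn row -> keys |> Enum.zip(row) |> Map.new() end)
  end
end

# Sample data shaped like the first columns of the result above.
columns = ["player_id", "name_first", "name_last"]
rows = [[103819, "Roger", "Federer"]]

IO.inspect(RowsSketch.to_maps(columns, rows))
```

Each row becomes one map keyed by column name, which matches the shape of the result printed by the script.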

Conclusion

This could be a nice addition to a future version of AyeSQL. I had never used DuckDB before, but I can see how useful it can be 😁

I hope this helps!