aritmos / crabql

Crustacean certified SQL query creation in Rust
MIT License
0 stars 0 forks source link
query-builder rust sql

✨ Features ✨

CrabQL allows you to generate SQL queries that are

🚧 CAUTION: W.I.P.

πŸ“š Use

The core of the library is a query builder that can creates the SQL queries. The current (partially implemented) use case is as follows:

let query = {
    let checker: ExprChecker::from_schema("schema.sql").unwrap();
    let reader: Reader = Reader::new(&mut checker);

    reader
        .table("marketing")
        .new_col("category", { 
            case!{                
                col("created_date").lt(Date::new("2020-04-01")) => "pre-pandemic",
                col("created_date").lt(Date::new("2024-01-01")) => "last year",
                _ => "recent",
            } 
        }) 
        .filter(col("category").neq("old") & col("completed"))
        .save_as("cost_by_month", {
            let month = col("created_date").to_char("YYYY-MM"); 
            multi![
                col("campaign_id").as("campaign"),
                month,
                col("cost").sum().as("monthly_cost")
            ]
        })
        .table("cost_by_month")
        .group_by("campaign") 
        .order_by("campaign")
        .select(() << col("campaign") << col("monthly_cost").avg().as("avg cost"))
        .unwrap()
        .to_sql()
});

println!("{}", query);

outputs

WITH cost_by_month AS (
    SELECT
        campaign_id                      AS campaign,
        TO_CHAR(created_date, 'YYYY-MM') AS created_date,
        SUM(cost)                        AS total_cost
    FROM (
        SELECT *,
            CASE
                WHEN created_date < DATE '2020-04-01' THEN 'pre-pandemic'
                WHEN created_date < DATE '2024-01-01' THEN 'last year'
                ELSE 'recent' END AS category
        FROM marketing
    ) x
    WHERE category != 'pre-pandemic' AND completed = TRUE
)

SELECT campaign, AVG(total_cost) AS 'avg cost'
FROM cost_by_month
GROUP BY campaign
ORDER BY campaign

CrabQL is built with different SQL flavours in mind, and plans to eventually support all major dialects. However during it's early stages the focus will be on generating PostgreSQL-compliant queries, as that is what I personally use.

🌿 Natural

One of the core values of CrabQL is that of writing natural Rust code. In particular this means discarding the common use of string expression parsing as seen in similar crates, e.g.

selection
    .where("version > 12 AND is_stable")
    .select("'v' + str(version) + '_x86_64-linux'")

CrabQL uses proper types that can be composed via natural methods along with the overloading of common binary operators such as BitAnd/BitOr/Neg for logical AND/OR/NOT operations, as well as all the common arithmetic operators. Sadly the nature of Rust's type system doesn't allow for the specific overloading of comparison operators that is required, and these operations are left as methods.

selection
    .where(col("version").gt(12) & col("is_stable"))
    .select_one("v" + col("version") + "_x86_64-linux")

A type system like this, unlike string parsing, allows for the use of variables, comments and spacing to write clearer, more understandable expressions. It also allows one to call functions, APIs or any other endpoint to get necessary data. As an example, the following (100% real I-saw-it-in-production-last-week) expression is exteremly ill-suited for string parsing:

SELECT *
FROM employees
WHERE (salary > 100_000 OR department = 'Engineering')
  AND (start_date >= '2023-01-01' AND start_date < '2024-01-01')
  AND (age BETWEEN 25 AND 35 AND status = 'Active')
  AND (city IN ('New York', 'San Francisco', 'Seattle') OR manager_id IS NOT NULL)
  AND (NOT (position = 'Manager' AND (department = 'HR' OR department = 'Finance')))
  AND (EXISTS (SELECT 1 FROM performance_reviews WHERE employee_id = employees.id AND rating > 4))
  AND (NOT EXISTS (SELECT 1 FROM disciplinary_actions WHERE employee_id = employees.id))

With CrabQL the expression takes on a simple and readable form

reader.table("employees").filter({
    const HIGH_SALARY = 100_000;
    let high_salary_engineers = col("salary").gt(HIGH_SALARY) & col("department").eq("Engineering"); 
    let active_and_young = col("age").between(25, 35) & col("status").eq("Active");
    let big_office = col("city").in(external_api.get_main_offices())
    let is_manager = !col("manager_id").is_null();
    ...

    high_salary_engineers & ... & (big_office | is_manager) & ...
})

πŸš‘ Safe

Type Safety

One of the benefits of foregoing string-parsing is gaining the type-safety that rust can provide.

State Safety

The other core safety-related feature, is that CrabQL guarantees query correctness within the query's compilation.

let mut checker = DerivedChecker::new(); // internally uses a `DerivedSchema`

// The checker infers that the <likes> column is numeric.
let view1 = Reader::new(&mut checker)
    .table("post_likes")
    .select(() << col("post_id") << (col("likes") + 1)) 
    .unwrap();

// The checker will throw an error when sealing the query because <likes> column is numeric but string concatenation requires a string.
let view2 = Reader::new(&mut checker)
    .table("post_likes")
    .select(() << col("post_id") << (col("likes") + "text")) 

Although the error is imediately found when an expression is used in a method, in order to allow for better type ergonomics, it is not exposed to the user until the entire query is sealed (finished)

Ordering Safety

Both the CompiledSchema and DerivedSchema update themselves to include any internal views or table modifications that the user defines. Self-consistency is a great feature to have in long read-write operations where the database's internal state is bound to change. CrabQL aims to implement a locking mechanism to ensure that any cases that depend on a specific query-ordering can only procude SQL queries that guarantee this ordering.

For example a user might create a query that defines a new column within a table, and a subsequent query might write values into that new column. The aim is to ensure that these two queries can only be parsed as a single combined SQL query ensuring their necessary order.

// TODO: API is still too-underdeveloped to showcase how the locking mechanism works
//       In particular how queries should be grouped before creating the complete SQL.
//       (Imagine the implementation yourself if you want)

🀨 Why??

All of this talk about types, and state checking, manipulation and safety seems like quite a lot of complexity for what tends to be a generally straightforward task. Is it really worth it? Yes ... maybe ... no ... who knows.

It is likely that the large majority of projects that manipulate databases through an application boundary keep their queries nice and small. In many cases it might simply suffice to use a simpler string-parsing library.

However when thinking about all of the possible implementations of a Rust-to-SQL library this version with extensive use of Rust features in the style of Rust's safety guarantees quickly became my favourite. A transpiler like this is vastly interesting to just think about how to handle the internal representations and logic, and it's complexity is something I'm hoping to push me into learning more and writing better Rust.