TidierOrg / TidierDB.jl

Tidier database analysis in Julia, modeled after the dbplyr R package.
MIT License
35 stars 3 forks source link

column name "DATA_TYPE" not found in the data frame; existing most similar names are: "data_type" #11

Closed vituri closed 3 months ago

vituri commented 3 months ago

Hi!

I'm trying to use TidierDB to connect to a MariaDB 10.11 database. I got the following error:

ERROR: ArgumentError: column name "DATA_TYPE" not found in the data frame; existing most similar names are: "data_type"

Investigating the code, I noticed that the following line is the problem:

https://github.com/TidierOrg/TidierDB.jl/blob/6efba7fcd4085aaa3236c22b11378669b3bcb158/src/TidierDB.jl#L183

If I change from "DATA_TYPE" to "data_type" it works.

My code is the following:

using MySQL
using TidierData
import TidierDB as DB

con = DBInterface.connect(
    MySQL.Connection
        , # my credentials
    )

DB.db_table(con, :table_name)
vituri commented 3 months ago

I've been playing around with the code, and there is a easier fix. Just change the query to uppercase like this:

# MySQL
function get_table_metadata(conn::MySQL.Connection, table_name::String)
    # Query to get column names and types from INFORMATION_SCHEMA
    query = """
    SELECT COLUMN_NAME, DATA_TYPE
    FROM information_schema.columns
    WHERE table_name = '$table_name'
    AND TABLE_SCHEMA = '$(conn.db)'
    ORDER BY ordinal_position;
    """

    result = DBInterface.execute(conn, query) |> DataFrame
    result[!, :DATA_TYPE] = map(x -> String(x), result.DATA_TYPE)
    result[!, :current_selxn] .= 1
    result[!, :table_name] .= table_name
    # Adjust the select statement to include the new table_name column
    return select(result, :COLUMN_NAME => :name, 2 => :type, :current_selxn, :table_name)
end

I also added the line

    AND TABLE_SCHEMA = '$(conn.db)'

because if you have the same table_name on several databases (on the same host), the columns would appear all mixed.

drizk1 commented 3 months ago

Thank you for bringing this up. I think theres a case difference between how the names are returned from MySQL vs MariaDB. but i assumed they would be the same.

Can you run this function locally? This should now work independent of name for both MySQL and MariaDB. If it works, I will go ahead and put in a commit to fix it.

function get_table_metadata(conn::MySQL.Connection, table_name::String)
    # Query to get column names and types from INFORMATION_SCHEMA
    query = """
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = '$table_name'
    AND TABLE_SCHEMA = '$(conn.db)'
    ORDER BY ordinal_position;
    """

    result = DBInterface.execute(conn, query) |> DataFrame
    result[!, 2] = map(x -> String(x), result[!, 2])
    result[!, :current_selxn] .= 1
    result[!, :table_name] .= table_name
    # Adjust the select statement to include the new table_name column
    return DataFrames.select(result, :1 => :name, 2 => :type, :current_selxn, :table_name)
end

these should be the names of the columns

name     type     current_selxn  table_name
vituri commented 3 months ago

It worked like a charm!

I also tested some data transformations like

dados = @chain DB.db_table(con, :Resumo_frota) begin
    DB.@filter(Dia >= "2024-04-10")
    DB.@group_by(Operação)
    DB.@summarize(Total = sum(Total))
    DB.@mutate(Média = mean(Total))
    DB.@filter(Média >= 0.1)
    DB.@arrange(desc(Operação))
    # DB.@show_query
    DB.@collect
end

and it worked. The @show_query also worked.

Thanks a lot!

drizk1 commented 3 months ago

Incredible ! I love seeing TidierDB.jl mixed with Portuguese too!

I fixed the underlying code (thank you again for testing with MariaDB). I will close this issue when I release v 0.1.1 shortly.