go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)
Other
2.97k stars 275 forks source link

how to get the SQLDatabase into GOTA #38

Open vikram-rawat opened 6 years ago

vikram-rawat commented 6 years ago

can you please subscribe or detail a method to get the data collected from SQL into GOTA.

Something like

package main

import ( "database/sql" "fmt" _ "github.com/lib/pq" )

const ( host = "localhost" port = 5432 user = "postgres" password = "Gurgaon@65" dbname = "vikram" )

func main() { psqlInfo := fmt.Sprintf("host=%s port=%d user=%s "+ "password=%s dbname=%s sslmode=disable", host, port, user, password, dbname) db, err := sql.Open("postgres", psqlInfo) if err != nil { panic(err) } defer db.Close()

err = db.Ping()
if err != nil {
    panic(err)
}
row,_:=db.Query("SELECT * from salesdata limit 10")

println(row)

}

kniren commented 6 years ago

Use the dev branch and load your SQL rows either as a slice of structs or a []map[string]interface and use the function LoadStructs/LoadMaps.

On Sep 9, 2017 09:54, "vikram-rawat" notifications@github.com wrote:

can you please subscribe or detail a method to get the data collected from SQL into GOTA.

Something like

package main

import ( "database/sql" "fmt" _ "github.com/lib/pq" )

const ( host = "localhost" port = 5432 user = "postgres" password = "Gurgaon@65" dbname = "vikram" )

func main() { psqlInfo := fmt.Sprintf("host=%s port=%d user=%s "+ "password=%s dbname=%s sslmode=disable", host, port, user, password, dbname) db, err := sql.Open("postgres", psqlInfo) if err != nil { panic(err) } defer db.Close()

err = db.Ping() if err != nil { panic(err) } row,_:=db.Query("SELECT * from salesdata limit 10")

println(row)

}

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kniren/gota/issues/38, or mute the thread https://github.com/notifications/unsubscribe-auth/ABm4Or5iK02G9SEUTKPJ4WsqAUeFQ2rEks5sgkQlgaJpZM4PR64f .

vikram-rawat commented 6 years ago

I did it a multiple times but the problem with this approach is i have to know beforehand the number of columns and their data type in advance.and in the phase of exploratory dara analysis this is very frustrating. Can you please create something like dataframe.csv for sql too. Which can return a data frame please

kniren commented 6 years ago

I'll include this on the roadmap for next release. Loading data directly from SQL could be useful.

vikram-rawat commented 6 years ago

Is it implemented yet???

Very very curious to work with databases through a data frame

vikram-rawat commented 6 years ago

Is it implemented yet.

I am an r user and work on databases with like 50+ columns along with time data types.

I am waiting for this library to get mature so that I can use it in real time scenarios.

Please work on it its a great project and one of a kind in go community. Analysis can not be done with for loops and in organized structs or slices of structs.

Please don't abandon it.

mittenchops commented 6 years ago

Hi @vikram-rawat , I am also an R user who likes this package a lot and trying to use it more. I have also used maps of strings to empty interfaces like this approach does to handling unknown json data: https://www.sohamkamani.com/blog/2017/10/18/parsing-json-in-golang/ What is the error you get using maps to empty interfaces that is trouble with unknown numbers of columns? Maybe we can figure out a way of intelligently guessing that's an easy to use, too.

mittenchops commented 6 years ago

Or, maybe post a sample of your data to a gist, and I'll try to help, too? Cheers.

vikram-rawat commented 6 years ago

Hi sorry It's been quite a long time. I forgot to check the link again.

What I actually meant was that say I run a query like row,_:=db.Query("SELECT * from sometable limit 10")

can I get that data directly into dataframe.without specifying column name or column data type.

because most of the queries I run get me many columns and sometime I don't even know how many will it return. Is there anyway I can do that.

vikram-rawat commented 5 years ago

Is this library no longer maintained.

danicat commented 5 years ago

It has been a while since I've last seem any updates on it. If we think this package is relevant to the community maybe we should have it forked. @kniren do you still plan on updating this package?

vikram-rawat commented 5 years ago

is it in the pipeline?? 👍

kmrx commented 3 years ago

My thoughts:

With that said, here is what you're probably looking for -

DISCLAIMER: I'm not the original author of this method. Even though it includes time formatting in For loop, it doesn't implement it. So you might still need to tinker with it. You will end up falling back into converting to string. That makes the dates look funky.

To get column names, you can do

dataColumns, err := rows.Columns()

To get column types, you can do

columnTypes, err := rows.ColumnTypes()

and then to get it into Dataframe

columnCount := len(columnTypes)
dataFrameRows := []map[string]interface{}{}

for rows.Next() {

scanArgs := make([]interface{}, columnCount )

for i, v := range columnTypes {

    switch v.DatabaseTypeName() {
    case "VARCHAR", "TEXT":
        scanArgs[i] = new(sql.NullString)
        break

    default:
        scanArgs[i] = new(sql.NullString)
    }
}

err := rows.Scan(scanArgs...)

if err != nil {
    fmt.Println(err)
}

rowData := map[string]interface{}{}

for typeIndex, typeValue := range columnTypes {

    if val, ok := (scanArgs[typeIndex]).(*sql.NullBool); ok {
        rowData[typeValue .Name()] = val.Bool
        continue
    }

    if val, ok := (scanArgs[typeIndex]).(*sql.NullTime); ok {
        rowData[typeValue .Name()] = val.Time.Format("01-02-2006")
        continue
    }

    if val, ok := (scanArgs[typeIndex]).(*sql.NullString); ok {
        rowData[v.Name()] = val.String
        continue
    }

    if val, ok := (scanArgs[typeIndex]).(*sql.NullInt64); ok {
        rowData[typeValue .Name()] = val.Int64
        continue
    }

    if val, ok := (scanArgs[typeIndex]).(*sql.NullFloat64); ok {
        rowData[typeValue .Name()] = val.Float64
        continue
    }

    if val, ok := (scanArgs[typeIndex]).(*sql.NullInt32); ok {
        rowData[typeValue .Name()] = val.Int32
        continue
    }

    rowData[typeValue.Name()] = scanArgs[typeIndex]
}

dataFrameRows = append(dataFrameRows , rowData)
}
df := dataframe.LoadMaps(dataFrameRows)

If there is interest however, we could include it in the dataframe after doing more tests on it and with ample disclaimers.

@vikram-rawat @kniren @danicat - thoughts?