go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

Is it possible to add a method to lazily read a CSV? #41

Closed mittenchops closed 6 years ago

mittenchops commented 6 years ago

I'm doing this kind of thing:

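    fmt.Println("Reading csv...")
    csv, err := os.Open(myfile) // myfile is about 200 MB and takes a while to read
    if err != nil {
        fmt.Print(err)
        os.Exit(1)
    }
    defer csv.Close()

    fmt.Println("Make it a df...")
    df := dataframe.ReadCSV(csv) // loads the entire file into memory

    fmt.Println("Sorting, filtering df...")
    // Keyed fields keep the literal valid if dataframe.F gains fields.
    fil := df.Filter(
        dataframe.F{Colname: "colA", Comparator: series.Eq, Comparando: "VARIABLE"},
    )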

It would be very cool if the filtering could start while the first lines are still being read, instead of waiting for the whole file to load.

mittenchops commented 6 years ago

I know there was a neat implementation here in R: http://illposed.net/lazy.frame.pdf

kniren commented 6 years ago

Thank you very much for the suggestion and the resource of R's implementation.

I would love to have this implemented in Gota, but the library is designed to operate directly on data held in memory. All operations are written under this assumption, so moving to a lazy implementation would mean changing the entire codebase.

One thing that could be done is to delay reading the file until the filter is actually applied, but the whole file would still be loaded at that point, so I don't see how that is better than the current method.

I will close this issue as there are currently no plans to implement this, but if anyone is inclined to work on it, we can discuss it and set up a project for it.

Thank you again!