go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

changing float to string precision #137

Open MichalBrzozowski91 opened 3 years ago

MichalBrzozowski91 commented 3 years ago

fmt.Sprintf("%f", some_float_number) converts a float to a string with a default precision of 6. The fmt package documentation says: "The default precision for %e, %f and %#g is 6; for %g it is the smallest number of digits necessary to identify the value uniquely." (source). Changing the format from "%f" to "%g" therefore gives exactly the precision needed to save floating-point numbers without loss. According to the same documentation, %g is also the default format (%v) for floating-point values.
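To make the difference concrete, here is a small standalone sketch that uses only the standard library. The helper name formatWith is made up for illustration and is not part of gota; it simply formats one sample value with each verb.

```go
package main

import "fmt"

// formatWith is an illustrative helper (not part of gota): it formats x
// with the given fmt verb.
func formatWith(verb string, x float64) string {
	return fmt.Sprintf(verb, x)
}

func main() {
	x := 0.00051124743
	fmt.Println(formatWith("%f", x)) // 0.000511 (fixed 6 decimal places)
	fmt.Println(formatWith("%g", x)) // 0.00051124743 (shortest unique form)
	fmt.Println(formatWith("%v", x)) // 0.00051124743 (%v uses %g for floats)
}
```

With "%f" the trailing digits are dropped, while "%g" and "%v" keep every digit needed to recover the original float64.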

The current precision loses data even when a CSV file is only read and written back unchanged. Minimal working example:

Running a program:

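package main

import (
    "log"
    "os"

    "github.com/go-gota/gota/dataframe"
)

func main() {
    // Read the CSV file into a DataFrame.
    csvfile, err := os.Open("dataset.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer csvfile.Close()
    df := dataframe.ReadCSV(csvfile)

    // Write the DataFrame straight back out without modifying it.
    f, err := os.Create("output.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := df.WriteCSV(f); err != nil {
        log.Fatal(err)
    }
}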

with a csv file:

index,value
0,0.00051124743

produces the output csv file:

index,value
0,0.000511

The suggested change fixes this: with "%g" the value is written back with all of its digits.
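For comparison, a lossless write can be sketched with the standard library alone. This is not gota's WriteCSV; valuesCSV is a hypothetical helper that builds the same index,value layout and formats each float with strconv.FormatFloat using precision -1, which produces the shortest string that parses back to the same float64.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// valuesCSV is a hypothetical helper, not gota's WriteCSV. It renders an
// index,value CSV and formats every float with the shortest representation
// that round-trips (format 'g', precision -1).
func valuesCSV(values []float64) (string, error) {
	var sb strings.Builder
	cw := csv.NewWriter(&sb)
	if err := cw.Write([]string{"index", "value"}); err != nil {
		return "", err
	}
	for i, v := range values {
		rec := []string{strconv.Itoa(i), strconv.FormatFloat(v, 'g', -1, 64)}
		if err := cw.Write(rec); err != nil {
			return "", err
		}
	}
	cw.Flush()
	return sb.String(), cw.Error()
}

func main() {
	out, err := valuesCSV([]float64{0.00051124743})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Run against the sample value, this sketch writes 0.00051124743 rather than 0.000511.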

Moreover, the same cast to string is used in the function Rapply. As a result, applying an identity function to a dataframe changes its values. Minimal working example:

Running a program:

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/go-gota/gota/dataframe"
    "github.com/go-gota/gota/series"
)

func main() {
    csvfile, err := os.Open("dataset.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer csvfile.Close()
    df := dataframe.ReadCSV(csvfile)

    // Identity function: returns each row unchanged.
    g := func(s series.Series) series.Series { return s }
    dfApplied := df.Rapply(g)

    // Row 0, column 1 is the "value" cell.
    fmt.Println(dfApplied.Elem(0, 1).Float())
}

with a csv file:

name,value
a,0.00051124743

prints the value rounded to 6 decimal places:

0.000511

The suggested change fixes this case as well.
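The effect can be reproduced without gota. The sketch below only models a float to string to float round trip; it is not Rapply's actual code, and viaString is a made-up helper for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// viaString models a float64 -> string -> float64 round trip using the
// given format verb. It is a stand-in for illustration, not gota's code.
func viaString(verb string, x float64) (float64, error) {
	return strconv.ParseFloat(fmt.Sprintf(verb, x), 64)
}

func main() {
	x := 0.00051124743
	for _, verb := range []string{"%f", "%g"} {
		y, err := viaString(verb, x)
		if err != nil {
			panic(err)
		}
		// Prints the verb, the recovered value, and whether it equals x.
		fmt.Println(verb, y, y == x)
	}
}
```

With "%f" the recovered value is 0.000511 and differs from the input; with "%g" the input comes back unchanged.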