go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

ReadCSV unable to handle quotes in non-quoted field / no LazyQuotes option? #95

Closed rterryn closed 3 years ago

rterryn commented 5 years ago

Hi, I keep getting the error below. With Go's encoding/csv package the fix is simple: set LazyQuotes = true on the reader. However, I don't see anything similar in the LoadOptions for the dataframe package. Is there no equivalent option for handling quotes that occur in the middle of a non-quoted field? If not, can you suggest the proper way to deal with this? I am very new to Go.

Code:

    f, err := os.Open("/home/rterryn/go/src/gndwEtl/dataFiles/SGD_features.tab")
    if err != nil {
        // handle this error better than this
        panic(err)
    }
    df := dataframe.ReadCSV(bufio.NewReader(f), dataframe.WithDelimiter('\t'))

Error: "DataFrame error: parse error on line 265, column 303: bare " in non-quoted-field"
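For anyone hitting this without patching gota, the same failure can be reproduced with encoding/csv alone, and the lazily parsed records could then be handed to gota's LoadRecords (which takes the [][]string that ReadAll returns). The following is a minimal sketch, not gota code: parseTSV and the sample text are made up for illustration, and the exact line and column reported in the error depend on the input.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseTSV is a hypothetical helper: it reads tab-separated text with the
// standard csv.Reader. The lazy flag is copied into csv.Reader.LazyQuotes.
func parseTSV(input string, lazy bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	r.LazyQuotes = lazy
	return r.ReadAll()
}

func main() {
	// Made-up sample: the second field of the data row contains a
	// double quote in the middle of an unquoted field.
	const sample = "id\tdescription\n1\t5\" fragment\n"

	// Strict mode (the default) rejects the bare quote.
	if _, err := parseTSV(sample, false); err != nil {
		fmt.Println("strict:", err)
	}

	// With LazyQuotes the quote is kept as a literal character.
	records, err := parseTSV(sample, true)
	if err != nil {
		fmt.Println("lazy:", err)
		return
	}
	fmt.Println("lazy:", len(records), "records parsed")
}
```

The records slice from the lazy pass is what a DataFrame could be built from, so this route avoids modifying the library at the cost of configuring the reader by hand.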

karthikcru commented 3 years ago

having the same issue

rterryn commented 3 years ago

Hi. The dataframe package uses Go's standard encoding/csv package, so I was able to add a lazyQuotes option by modifying dataframe.go as follows. I do not have this on a public repo anywhere at the moment, sorry.

    // Add a lazyQuotes bool field to the loadOptions struct.
    type loadOptions struct {
        // Specifies which is the default type in case detectTypes is disabled.
        defaultType series.Type

        // If set, the type of each column will be automatically detected unless
        // otherwise specified.
        detectTypes bool

        // If set, the first row of the tabular structure will be used as column
        // names.
        hasHeader bool

        // The names to set as column names.
        names []string

        // Defines which values are going to be considered as NaN when parsing from string.
        nanValues []string

        // Defines the csv delimiter.
        delimiter rune

        // Enables LazyQuotes on the underlying csv.Reader.
        lazyQuotes bool

        // The types of specific columns can be specified via column name.
        types map[string]series.Type
    }

    // Add a new option function, WithLazyQuotes.
    func WithLazyQuotes(b bool) LoadOption {
        return func(c *loadOptions) {
            c.lazyQuotes = b
        }
    }

    // Modify the existing ReadCSV function to apply the lazyQuotes option.
    func ReadCSV(r io.Reader, options ...LoadOption) DataFrame {
        csvReader := csv.NewReader(r)
        cfg := loadOptions{
            delimiter:  ',',
            lazyQuotes: false,
        }
        for _, option := range options {
            option(&cfg)
        }
        if cfg.delimiter != ',' {
            csvReader.Comma = cfg.delimiter
        }
        if cfg.lazyQuotes {
            csvReader.LazyQuotes = true
        }

        records, err := csvReader.ReadAll()
        if err != nil {
            return DataFrame{Err: err}
        }
        return LoadRecords(records, options...)
    }