go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

Filter with In on Quoted String returns False #115

Closed: dp1140a closed this issue 4 years ago

dp1140a commented 4 years ago

I'm attempting to filter on a string column whose values are quoted, comma-separated lists. My filter is as follows:

    df = df.Filter(
        dataframe.F{
            Colname:    "XXX",
            Comparator: series.In,
            Comparando: "LA",
        },
    )

Here are some sample rows from the column I am filtering on. The strings can look like this:

    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "CC,DC,FR,FW,KH,MG,WD,WB"
    "IS,KH,MG,WD"
    "CC,FC,FS,SC"
    "IS,KH,MG,WD"
    "FC,LA,LC,UQ"
    "CC,CF,CS,FC,FS,KH,LA,LC,MG,WD,WB"
    "CC,FC,FS,SC"
    "DC,FR"
    "DC,FR"
    UNK
    UNK
    "DC,FR"
    FW

This should return 7 rows:

    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "DC,FC,FS,FW,LA,LC,MG"
    "FC,LA,LC,UQ"
    "CC,CF,CS,FC,FS,KH,LA,LC,MG,WD,WB"

But when I run this I get 0 rows back. I think this could be due to the quoted strings, which I can't control since they come from a CSV file. Or is there a way to pass a regex or a wildcard as my comparando?
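To check whether the quotes are really the culprit, here is a minimal sketch using only Go's standard encoding/csv package. It assumes the CSV file is parsed by an RFC 4180 style reader (whether gota's CSV loader behaves the same way is an assumption here); firstField is a hypothetical helper written for this illustration. A standard reader strips the surrounding quotes, so the stored value is the whole comma-separated list, and a whole-value equality test against "LA" fails either way.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// firstField parses a single CSV line and returns its first field.
// Hypothetical helper for illustration only.
func firstField(line string) (string, error) {
	rec, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		return "", err
	}
	return rec[0], nil
}

func main() {
	// A quoted field as it would appear in the raw file, followed by a second column.
	field, err := firstField(`"DC,FC,FS,FW,LA,LC,MG",other`)
	if err != nil {
		panic(err)
	}
	fmt.Println(field)                         // the parsed value, without the quotes
	fmt.Println(field == "LA")                 // whole-value equality
	fmt.Println(strings.Contains(field, "LA")) // substring check
}
```

If the parsed value matches the list without quotes, the quoting is not the problem; the mismatch comes from comparing the whole cell with a single code.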

dp1140a commented 4 years ago

After taking a look at the code directly, I think the issue is at https://github.com/go-gota/gota/blob/master/series/series.go#L402. Instead of recursing with an "eq" comparison, perhaps a simple strings.Contains() would suffice.
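As a rough illustration of the strings.Contains() idea, the sketch below compares a plain substring check with a token-wise check on a comma-separated cell. containsToken is a hypothetical helper, not part of gota, and the value "FC,FLA,UQ" is made up to show where the two approaches differ.

```go
package main

import (
	"fmt"
	"strings"
)

// containsToken reports whether tok is one of the comma-separated items in list.
// Hypothetical helper for illustration only.
func containsToken(list, tok string) bool {
	for _, item := range strings.Split(list, ",") {
		if strings.TrimSpace(item) == tok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(strings.Contains("FC,LA,LC,UQ", "LA")) // true
	fmt.Println(containsToken("FC,LA,LC,UQ", "LA"))    // true
	fmt.Println(strings.Contains("FC,FLA,UQ", "LA"))   // true: matches inside "FLA"
	fmt.Println(containsToken("FC,FLA,UQ", "LA"))      // false
}
```

For the two-letter codes in the sample data both checks agree; the token-wise version only matters if codes can be substrings of one another.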

gautamdoulani commented 4 years ago

@dp1140a, it looks like you are looking for a feature similar to "like" (which does not seem to have been implemented yet) rather than "in".

chrstphlbr commented 4 years ago

Hi @dp1140a, you can use a user-defined comparator for this by implementing your own comparison function and passing it with series.CompFunc. This feature has not landed in the main branch yet, but it can be found in dev. Your filter would then look something like this:

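    // Requires the "strings" package plus gota's series and dataframe packages.

    // like returns a comparison function that reports whether a
    // string element contains str.
    like := func(str string) func(el series.Element) bool {
        return func(el series.Element) bool {
            if el.Type() == series.String {
                if val, ok := el.Val().(string); ok {
                    return strings.Contains(val, str)
                }
            }
            // Non-string elements never match.
            return false
        }
    }

    myDF := df.Filter(
        dataframe.F{
            Colname:    "XXX",
            Comparator: series.CompFunc,
            Comparando: like("LA"),
        },
    )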
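To try the closure pattern without depending on the dev branch, here is a standard-library-only sketch. It assumes plain strings in place of series.Element, and the filter function and sampleColumn variable are hypothetical stand-ins for df.Filter and the XXX column, filled with the parsed sample values from the original report.

```go
package main

import (
	"fmt"
	"strings"
)

// like returns a predicate that reports whether a value contains substr.
// It mirrors the closure above, but takes a plain string instead of a series.Element.
func like(substr string) func(string) bool {
	return func(val string) bool {
		return strings.Contains(val, substr)
	}
}

// filter keeps the values for which keep returns true.
// Hypothetical stand-in for df.Filter on a single column.
func filter(col []string, keep func(string) bool) []string {
	var out []string
	for _, v := range col {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

// sampleColumn holds the sample values from the report, after CSV parsing.
var sampleColumn = []string{
	"DC,FC,FS,FW,LA,LC,MG", "DC,FC,FS,FW,LA,LC,MG", "DC,FC,FS,FW,LA,LC,MG",
	"DC,FC,FS,FW,LA,LC,MG", "DC,FC,FS,FW,LA,LC,MG", "CC,DC,FR,FW,KH,MG,WD,WB",
	"IS,KH,MG,WD", "CC,FC,FS,SC", "IS,KH,MG,WD", "FC,LA,LC,UQ",
	"CC,CF,CS,FC,FS,KH,LA,LC,MG,WD,WB", "CC,FC,FS,SC", "DC,FR", "DC,FR",
	"UNK", "UNK", "DC,FR", "FW",
}

func main() {
	rows := filter(sampleColumn, like("LA"))
	fmt.Println(len(rows)) // 7, the row count expected in the report
	for _, r := range rows {
		fmt.Println(r)
	}
}
```

Returning a closure keeps the search string out of the comparison function's signature, which is what lets the same predicate shape be handed to a generic filter.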

Best, Christoph

kniren commented 4 years ago

@chrstphlbr mentions a possible solution here. If there is no further feedback I'm going to close this issue, but feel free to reopen it as needed.

feluelle commented 4 years ago

When I try to run @chrstphlbr's solution, I run into two issues:

  1. if el.Type() == String needs to be reflect.TypeOf(el).Kind() == reflect.String ?!
  2. Can not find series.CompFunc. What am I missing here?

chrstphlbr commented 4 years ago

  1. It should be if el.Type() == series.String (I have updated my original answer).
  2. Are you depending on the dev branch? This feature has not landed in master yet.

feluelle commented 4 years ago

Thanks @chrstphlbr. This works. 👍