SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

add which_dsl extension: use `which{ }` method as query DSL for accessing `where()` #396

Closed: rainchen closed this 6 years ago

rainchen commented 6 years ago

Add a simple query DSL for accessing where(), inspired by the gem "squeel", e.g.:

df.which{ `FamilySize` == `FamilySize`.max }

equals

df.where( df['FamilySize'].eq( df['FamilySize'].max ) )

Another example:

df.which{ (`NameTitle` == 'Dr') & (`Sex` == 'female') }

equals

df.where( df['NameTitle'].eq('Dr') & df['Sex'].eq('female') )

BTW, repeating the df variable is NOT DRY at all to me.
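The trick works because a Ruby backtick literal is simply a call to a method whose name is the backtick character, sent to the current self. Below is a minimal, daru-free sketch of that idea (not the extension's actual code): ToyFrame, ToyColumn, ToyMask and QueryContext are hypothetical stand-ins, and ToyFrame#where only mimics the role of daru's where(). The block is instance_eval'ed in a context whose backtick method returns a column, so == and & build a boolean row mask.

```ruby
# Hypothetical column proxy returned by a backtick inside the block
class ToyColumn
  def initialize(values)
    @values = values
  end

  def max
    @values.max
  end

  # Element-wise comparison with a scalar builds a boolean mask
  def ==(other)
    ToyMask.new(@values.map { |v| v == other })
  end
end

# Hypothetical boolean mask, one entry per row
class ToyMask
  attr_reader :bits

  def initialize(bits)
    @bits = bits
  end

  # Element-wise AND, so (a == x) & (b == y) combines two conditions
  def &(other)
    ToyMask.new(@bits.zip(other.bits).map { |a, b| a && b })
  end
end

# Hypothetical frame: an array of hashes keyed by column name
class ToyFrame
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def column(name)
    ToyColumn.new(@rows.map { |r| r.fetch(name) })
  end

  # Stand-in for daru's where(): keep the rows whose mask bit is true
  def where(mask)
    ToyFrame.new(@rows.select.with_index { |_, i| mask.bits[i] })
  end

  # Evaluate the block with a QueryContext as self, then filter
  def which(&block)
    where(QueryContext.new(self).instance_eval(&block))
  end

  # Inside the block, a backtick literal calls this context's backtick method
  class QueryContext
    def initialize(frame)
      @frame = frame
    end

    def `(name)
      @frame.column(name)
    end
  end
end

df = ToyFrame.new([
  { 'Name' => 'A', 'NameTitle' => 'Dr', 'Sex' => 'female', 'FamilySize' => 1 },
  { 'Name' => 'B', 'NameTitle' => 'Mr', 'Sex' => 'male',   'FamilySize' => 5 },
  { 'Name' => 'C', 'NameTitle' => 'Dr', 'Sex' => 'male',   'FamilySize' => 3 }
])

largest   = df.which { `FamilySize` == `FamilySize`.max }
female_dr = df.which { (`NameTitle` == 'Dr') & (`Sex` == 'female') }
```

Here largest keeps only row B and female_dr keeps only row A. The df variable is named once, outside the block, which is the DRY gain described above.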

zverok commented 6 years ago

Pretty smart! TBH, I am not totally sure about the backtick trick... Maybe method_missing/const_missing would be more predictable? (Sequel does this for its block DSL.) It has its own drawbacks too, of course.

> btw, repeating df variable is NOT DRY at all to me.

Yep, it is ugly as hell. It is a legacy of copying Python's libraries, which I personally hate.

rainchen commented 6 years ago

Some column names (from CSV) may contain spaces or special characters like ' or -. For these cases a backtick is better than method_missing, and it's much easier to recognize than a normal method call.
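The trade-off can be seen in a small sketch. LookupContext is a hypothetical class (not daru's or Sequel's code) that supports both lookup styles: method_missing only ever receives names that parse as Ruby identifiers, while the backtick method receives the literal's contents as an arbitrary string.

```ruby
# Hypothetical query context offering both lookup styles
class LookupContext
  def initialize(columns)
    @columns = columns # column name => values
  end

  # Backtick style: the literal's contents arrive as a plain string,
  # so spaces, quotes and dashes are all allowed
  def `(name)
    @columns.fetch(name)
  end

  # method_missing style: only names that parse as identifiers get here
  def method_missing(name, *args)
    key = name.to_s
    return @columns[key] if args.empty? && @columns.key?(key)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name.to_s) || super
  end
end

ctx = LookupContext.new(
  'age' => [22, 38],
  'Family Size' => [1, 3],
  "Passenger's-Class" => [3, 1]
)

via_method_missing = ctx.instance_eval { age }
via_backtick_space = ctx.instance_eval { `Family Size` }
via_backtick_punct = ctx.instance_eval { `Passenger's-Class` }
```

Neither Family Size nor Passenger's-Class can be written as a bare identifier, so only the backtick form reaches them. Also note that a capitalized bare word such as Age is parsed as a constant reference rather than a method call, so method_missing never sees it; that is where const_missing would come in.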

zverok commented 6 years ago

@rainchen Yep, makes sense for me.

I'm happy to accept this, but it needs some tweaks to the docs and specs:

Are you willing to make those changes? If not, I'll merge your PR, as it is a valuable contribution, and then update the style of specs/docs myself.

rainchen commented 6 years ago

@zverok

I implemented this feature as an "extension": it has to be enabled explicitly with `require 'daru/extensions/which_dsl'` (following the example of the rserve extension).
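One common way to build such an opt-in extension is a separate file that reopens the core class, so the new method exists only after that file is required. The sketch below simulates this with a hypothetical MiniFrame class and a temporary mini/extensions/which_dsl.rb file mirroring the daru/extensions/which_dsl layout; it illustrates the general pattern and is not a copy of daru's file.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical core class that ships without the DSL
class MiniFrame
  def initialize(rows)
    @rows = rows
  end

  def where(&pred)
    @rows.select(&pred)
  end
end

loaded_before = MiniFrame.method_defined?(:which)

Dir.mktmpdir do |dir|
  # Write the hypothetical extension file under an "extensions" directory
  ext_dir = File.join(dir, 'mini', 'extensions')
  FileUtils.mkdir_p(ext_dir)
  File.write(File.join(ext_dir, 'which_dsl.rb'), <<~RUBY)
    # Reopens MiniFrame only when this file is required
    class MiniFrame
      def which(&pred)
        where(&pred)
      end
    end
  RUBY

  # Make the file requirable, load it, then clean up the load path
  $LOAD_PATH.unshift(dir)
  require 'mini/extensions/which_dsl'
  $LOAD_PATH.delete(dir)
end

loaded_after = MiniFrame.method_defined?(:which)
```

Users who never require the extension file keep the core class untouched, which keeps the backtick override from surprising anyone who did not ask for it.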

I'm glad you can work on the docs and code organization parts.

Here are some actual use cases where I'm using this `which` method:

# getting oldest passengers
passengers.which { `Age` == `Age`.max }
# getting youngest passengers
passengers.which { `Age` == `Age`.min }
# getting female Dr
passengers.which { (`NameTitle` == 'Dr') & (`Sex` == 'female') }
# getting passengers had the same name
same_name = passengers.which { `Name`.frequencies > 1 } ["Name"].to_a
# => ["Connolly, Miss. Kate", "Kelly, Mr. James"]
passengers.which { `Name` =~ same_name }

Hope these are helpful to you or to others who are interested.
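In plain Ruby terms, the use cases above compare each row against a column's max or min, find names that occur more than once, and then keep every row whose name is in that list. The sketch below spells out those semantics on an array of hashes with made-up rows (not the real passenger data), assuming, as the example suggests, that `=~` in the DSL means "value is included in the given array". It uses Enumerable#tally, which requires Ruby 2.7 or later.

```ruby
# Hypothetical passenger rows (made-up data)
passengers = [
  { 'Name' => 'Doe, Mr. John',  'Age' => 34 },
  { 'Name' => 'Roe, Mrs. Jane', 'Age' => 50 },
  { 'Name' => 'Doe, Mr. John',  'Age' => 44 }
]

# Oldest / youngest: compare each row against the column's max / min
ages = passengers.map { |r| r['Age'] }
oldest   = passengers.select { |r| r['Age'] == ages.max }
youngest = passengers.select { |r| r['Age'] == ages.min }

# Names occurring more than once (the role of `Name`.frequencies > 1)
counts = passengers.map { |r| r['Name'] }.tally
same_name = counts.select { |_, n| n > 1 }.keys

# Membership test (the role of `Name` =~ same_name)
with_same_name = passengers.select { |r| same_name.include?(r['Name']) }
```

Here same_name is ["Doe, Mr. John"] and with_same_name contains both rows for that name.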

v0dro commented 6 years ago

This is awesome!

zverok commented 6 years ago

Thank you! :hugs:

rainchen commented 6 years ago

Thank you all!