Closed baarkerlounger closed 6 years ago
You can use the more Ruby-idiomatic filter(:row):
df.filter(:row) { |r| r[:y1].include?('action') }
# => #<Daru::DataFrame(1x2)>
# y1 y2
# a action|thr 9
This idea of bool arrays was borrowed from pandas, but it will probably be retired in the future in favor of more idiomatic approaches. (Though I should say that filter is currently slower.)
The bool array has the advantage of being easy to combine with & and | alongside other filtering/slicing methods. Neither version seems particularly nice compared to the pandas way, though.
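To illustrate why boolean masks compose so easily, here is a minimal plain-Ruby sketch that does not use Daru at all. The rows data and the and_mask, or_mask and apply_mask helpers are hypothetical names made up for this illustration; they only mimic the idea of building one boolean per row and combining the results element-wise.

```ruby
# Plain-Ruby sketch of boolean-mask filtering (no Daru involved).
# `rows`, `and_mask`, `or_mask` and `apply_mask` are hypothetical names.
rows = [
  { y1: 'Movie',  y2: 'Action|Thriller' },
  { y1: 'Series', y2: 'Action|Drama' },
  { y1: 'Movie',  y2: 'Comedy' }
]

# Build one boolean per row for each condition.
is_movie  = rows.map { |r| r[:y1] == 'Movie' }
is_action = rows.map { |r| r[:y2].include?('Action') }

# Element-wise AND of two masks.
def and_mask(a, b)
  a.zip(b).map { |x, y| x && y }
end

# Element-wise OR of two masks.
def or_mask(a, b)
  a.zip(b).map { |x, y| x || y }
end

# Keep only the rows whose mask entry is true.
def apply_mask(rows, mask)
  rows.zip(mask).select { |_, keep| keep }.map(&:first)
end

both   = apply_mask(rows, and_mask(is_movie, is_action))
either = apply_mask(rows, or_mask(is_movie, is_action))
```

Because each condition is computed once as its own mask, it can be reused and recombined without rescanning the rows, which is roughly the convenience being described here.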
Hm. Can you please show how pandas is more powerful? To me it looks like plain Ruby blocks can do everything (they can just be pretty slow at it).
I'm not saying it's necessarily more powerful, but I think it's nicer to work with (and faster). Compare:
Pandas: filtered = df[(df['y1'] == 'Movie') & (df['y2'].str.contains('Action'))]
Daru: filtered = df.where(df[:y1].eq('Movie')).filter(:row) { |r| r[:y2].include?('Action')}
To me the first example is more readable when my filter has 2 conditions.
filtered = df.filter(:row) { |r| r[:y1] == 'Movie' && r[:y2].include?('Action')}
...just like you'd filter your usual arrays.
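For comparison, this is the "usual arrays" pattern the reply refers to, as a minimal sketch on a plain array of hashes. The movies data is made up for illustration and no Daru calls are involved; the block body is the same two-condition predicate as in the filter(:row) example.

```ruby
# Plain Array#select on an array of hashes (no Daru involved).
# The `movies` data is hypothetical, made up for illustration.
movies = [
  { y1: 'Movie',  y2: 'Action|Thriller' },
  { y1: 'Series', y2: 'Action|Drama' },
  { y1: 'Movie',  y2: 'Comedy' }
]

# Both conditions live in one block, joined with &&.
filtered = movies.select { |r| r[:y1] == 'Movie' && r[:y2].include?('Action') }
```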
Looking to do something along the lines of
df.where(df[:y1].contains('action'))
Pandas has
df['y1'].str.contains('action')
My current workaround looks like: