I stumbled on a speed problem when using daru, and found that one way to speed things up was to do more work in the database, namely grouping and aggregating in ActiveRecord/the database instead of on the dataframe.
However, I found out that I cannot pass a field like ANY_VALUE(district.id), because it gets converted to a symbol in from_activerecord, and subsequently pluck tries to convert it to table.column.
(At least that's how I understand it works.)
So we found a way to bypass this, and I was thinking about adding it to daru, something like this:
# Load dataframe from AR::Relation
#
# @param relation [ActiveRecord::Relation] The relation used to load the contents of the dataframe
# @param fields [Array<String, Symbol>] Columns or SQL expressions to select; defaults to all of the model's columns
# @param with_sql_methods [Boolean] When true, fields are kept as strings, so SQL expressions (e.g. ANY_VALUE(district.id)) are passed through to pluck unchanged
#
# @return [Daru::DataFrame] A dataframe containing the data in the given relation
def from_activerecord(relation, *fields, with_sql_methods: false)
  fields = relation.klass.column_names if fields.empty?
  fields = if with_sql_methods
             fields.map(&:to_s)   # keep raw SQL expressions intact
           else
             fields.map(&:to_sym) # current behaviour
           end
  result = relation.pluck(*fields).transpose
  Daru::DataFrame.new(result, order: fields).tap(&:update)
end
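To illustrate the key difference the flag makes, here is a standalone sketch of just the field-normalisation step, outside of ActiveRecord and daru (normalize_fields is a hypothetical helper name, not part of either library):

```ruby
# Sketch of the field-normalisation step from the proposal above:
# with with_sql_methods: true, fields stay strings, so a SQL expression
# like "ANY_VALUE(district.id)" survives untouched; otherwise fields are
# symbolised as before, which is what breaks SQL expressions in pluck.
def normalize_fields(fields, with_sql_methods: false)
  with_sql_methods ? fields.map(&:to_s) : fields.map(&:to_sym)
end

normalize_fields(['ANY_VALUE(district.id)'], with_sql_methods: true)
# => ["ANY_VALUE(district.id)"]
normalize_fields(%w[id name])
# => [:id, :name]
```

With that in place, one could presumably call something like Daru::DataFrame.from_activerecord(relation, 'district_id', 'ANY_VALUE(district.id)', with_sql_methods: true), though the exact call site depends on how daru exposes the method.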
What do you think about that?
Originally posted here - https://github.com/SciRuby/daru/issues/523