Add way to generate DataFrame from active_record with aggregated fields

I ran into a performance problem with daru and found that one way to speed things up was to do more of the work in the database, namely grouping and aggregating in ActiveRecord/the database instead of in the DataFrame.

Here is what I wanted to use:

    active_record = Provider.left_joins(zip: :district).group(:id)

However, I found that I cannot pass a field like `ANY_VALUE(district.id)`: `from_activerecord` converts every field to a symbol, and `pluck` then tries to treat that symbol as a `table.column` reference (at least, that's how I understand it).
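
To make the suspected failure concrete, here is a deliberately simplified, hypothetical model in plain Ruby. It is not ActiveRecord's actual code, and `quote_like_identifier` and `select_item` are made-up names: a symbol is treated as an identifier and quoted part by part around the dots (MySQL-style backticks), while a string is passed through as raw SQL.

```ruby
# Hypothetical, simplified model of how a pluck argument might end up in the
# SELECT list. This is NOT ActiveRecord's implementation; it only illustrates
# the behaviour described above.

# Quote each dot-separated part with MySQL-style backticks.
def quote_like_identifier(name)
  name.split('.').map { |part| "`#{part}`" }.join('.')
end

# Symbols are treated as identifiers; anything else is used as raw SQL.
def select_item(field)
  field.is_a?(Symbol) ? quote_like_identifier(field.to_s) : field.to_s
end

puts select_item(:"ANY_VALUE(district.id)") # prints `ANY_VALUE(district`.`id)`
puts select_item("ANY_VALUE(district.id)")  # prints ANY_VALUE(district.id)
```

In this model the symbol turns the function call into a nonsensical quoted `table.column`, which is why the expression needs to stay a string.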

So we found a way to work around this, and I was thinking about adding it to daru, something like this:

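      # Load dataframe from AR::Relation
      #
      # @param relation [ActiveRecord::Relation] A relation to be used to load the contents of dataframe
      # @param fields [Array<String, Symbol>] Columns (or, with with_sql_methods, SQL expressions)
      #   to load; defaults to all columns of the relation's model
      # @param with_sql_methods [Boolean] Enables giving fields with SQL methods,
      #   e.g. "ANY_VALUE(district.id)"
      #
      # @return A dataframe containing the data in the given relation
      def from_activerecord(relation, *fields, with_sql_methods: false)
        fields = relation.klass.column_names if fields.empty?

        if with_sql_methods
          # Keep SQL expressions as raw SQL so pluck does not quote them as
          # table.column; Arel.sql marks them as intentional raw SQL, which
          # newer Rails versions may otherwise reject.
          names = fields.map(&:to_s)
          columns = names.map { |name| Arel.sql(name) }
        else
          names = columns = fields.map(&:to_sym)
        end

        rows = relation.pluck(*columns)
        # pluck returns a flat array (not one array per row) for a single column
        rows = rows.map { |value| [value] } if columns.size == 1
        Daru::DataFrame.new(rows.transpose, order: names).tap(&:update)
      end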
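
`from_activerecord` builds the DataFrame from `pluck` output, which comes back row by row: one array per row when several columns are plucked, but a flat array of values when only one column is plucked, so calling `transpose` on it directly fails for a single field. The plain-Ruby sketch below uses hard-coded arrays as stand-ins for `pluck` results (no database or daru involved) and a hypothetical helper, `rows_to_columns`, to show the row-to-column reshaping that feeds `Daru::DataFrame.new`.

```ruby
# Hypothetical helper: turn pluck-style rows into one array per column,
# the shape passed to Daru::DataFrame.new.
def rows_to_columns(rows, column_count)
  # With a single column, pluck returns a flat array: wrap each value first.
  rows = rows.map { |value| [value] } if column_count == 1
  # An empty result has no rows to transpose; return one empty column each.
  return Array.new(column_count) { [] } if rows.empty?

  rows.transpose
end

# Stand-ins for relation.pluck(:id, :name) and relation.pluck(:id)
multi  = [[1, 'a'], [2, 'b'], [3, 'c']]
single = [1, 2, 3]

p rows_to_columns(multi, 2)  # prints [[1, 2, 3], ["a", "b", "c"]]
p rows_to_columns(single, 1) # prints [[1, 2, 3]]

begin
  single.transpose # elements are not arrays, so this raises
rescue TypeError => e
  puts "transpose on a flat array fails: #{e.class}"
end
```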

Now I can create a new DataFrame like this:

    data_frame = Daru::DataFrame.from_activerecord(active_record,
                                                   "ANY_VALUE(district.id)",
                                                   with_sql_methods: true)
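
Because `from_activerecord` collects its fields with a splat (`*fields`), each field is meant to be a separate argument; passing an array makes the whole array a single, nested field. If the expressions are already in an array, it can be expanded with `*`. The method `collect_fields` below is a hypothetical stand-in with the same parameter list, used only to show how Ruby binds the arguments.

```ruby
# Hypothetical stand-in with the same parameter list as from_activerecord.
def collect_fields(_relation, *fields, with_sql_methods: false)
  { fields: fields, with_sql_methods: with_sql_methods }
end

expressions = ['ANY_VALUE(district.id)', 'COUNT(*)']

# Passing the array itself yields one nested element.
p collect_fields(nil, expressions, with_sql_methods: true)[:fields]
# prints [["ANY_VALUE(district.id)", "COUNT(*)"]]

# Splatting the array yields one element per expression.
p collect_fields(nil, *expressions, with_sql_methods: true)[:fields]
# prints ["ANY_VALUE(district.id)", "COUNT(*)"]
```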

What do you think about that?

Originally posted here: https://github.com/SciRuby/daru/issues/523

SciRuby/daru-io issue #80