SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

Add way to generate DataFrame from active_record with aggregated fields #523

Open janpeterka opened 4 years ago

janpeterka commented 4 years ago

Recently I ran into a speed problem when using daru, and found that one way to speed things up was to do more of the work in the database: namely, to group and aggregate in ActiveRecord/the database instead of on the dataframe.

Here is what I wanted to use:

active_record = Provider.left_joins(zip: :district).group(:id)

However, I found out that I cannot pass a field like ANY_VALUE(district.id), because it gets converted to a symbol in from_activerecord, and pluck subsequently tries to resolve it as table.column. (At least that's how I understand it works.)
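To make the symbol-versus-string distinction concrete, here is a minimal sketch in plain Ruby. It is a simplified model of the behaviour described above, not ActiveRecord's actual code; the `select_fragment` helper and the `providers` table name are hypothetical and exist only for illustration.

```ruby
# Simplified, hypothetical model of how a plucked field might end up in a
# SELECT list. This is NOT ActiveRecord's real implementation; it only mimics
# the split between symbols (treated as column names) and strings (raw SQL).
def select_fragment(table, field)
  case field
  when Symbol
    # A symbol is assumed to name a column, so it is qualified and quoted.
    %("#{table}"."#{field}")
  else
    # A string is passed through untouched, so SQL expressions survive.
    field.to_s
  end
end

expr = "ANY_VALUE(district.id)"

# After to_sym the whole expression is treated as one (nonexistent) column name.
puts select_fragment("providers", expr.to_sym)

# Kept as a string, the aggregate expression reaches the query unchanged.
puts select_fragment("providers", expr)
```

Under this model, the only thing that needs to change is whether the field is still a string by the time it reaches pluck.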

So we found a way to bypass this, and I was thinking about adding it to daru, as something like this:

      # Load dataframe from AR::Relation
      #
      # @param relation [ActiveRecord::Relation] A relation to be used to load the contents of dataframe
      # @param with_sql_methods [Boolean] Keep fields as strings so that SQL expressions (e.g. aggregates) can be passed
      #
      # @return A dataframe containing the data in the given relation
      def from_activerecord(relation, *fields, with_sql_methods: false)
        fields = relation.klass.column_names if fields.empty?

        fields = if with_sql_methods
                   fields.map(&:to_s)
                 else
                   fields.map(&:to_sym)
                 end

        result = relation.pluck(*fields).transpose
        Daru::DataFrame.new(result, order: fields).tap(&:update)
      end
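The rows-to-columns step (pluck, then transpose) can be sketched in plain Ruby without daru or a database. The `rows_to_columns` helper below is hypothetical and only illustrates the reshaping. It also guards against one case worth checking in the method above: as far as I know, pluck returns a flat array when given a single field, and calling transpose on a flat array raises an error.

```ruby
# Hypothetical helper: turns pluck-style rows into a { field => values } hash.
# It stands in for the "pluck then transpose" step and does not use daru.
def rows_to_columns(rows, fields)
  # No rows: return an empty column for every field.
  return fields.to_h { |field| [field, []] } if rows.empty?

  # With a single field the input is assumed to be a flat array of values,
  # so wrap each value to make it a one-element row before transposing.
  rows = rows.map { |value| [value] } if fields.size == 1

  # transpose flips rows into columns; zip pairs each column with its field.
  fields.zip(rows.transpose).to_h
end

two_fields = rows_to_columns([[1, "Prague"], [2, "Brno"]], %w[id name])
one_field  = rows_to_columns([10, 20], ["ANY_VALUE(district.id)"])

p two_fields
p one_field
```

The resulting hash has the same shape as the arrays-plus-order pair handed to Daru::DataFrame.new above, just keyed by field name.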

Now I can create a new DataFrame like this:

data_frame = Daru::DataFrame.from_activerecord(active_record,
                                               "ANY_VALUE(district.id)",
                                               with_sql_methods: true)

What do you think about that?

athityakumar commented 4 years ago

@janmpeterka - Thanks for this feature request suggestion! 🎉

You'd have to contribute this to the daru-io repository. The ActiveRecord importer currently lives there, and it supports only plain field names, not SQL methods. You could add a has_sql_methods flag / keyword argument; the rest of the logic you will probably find in the existing importer itself 😄

Would you like to contribute this feature, @janmpeterka?

janpeterka commented 3 years ago

Thanks, I will look into it. Not sure if I will be able to write the implementation myself, quite new to Ruby :)

janpeterka commented 3 years ago

Well, daru-io is quite inactive, so no contributing there.