ankane / ruby-polars

Blazingly fast DataFrames for Ruby
MIT License

Request: Implement `_dump_data` so that `Rails.cache` can wrap a DataFrame #79

Closed (DeflateAwning closed 1 month ago)

DeflateAwning commented 1 month ago

Request: Implement _dump_data so that Rails.cache can wrap a DataFrame

Currently, the following error is raised:

TypeError (no _dump_data is defined for class Polars::RbDataFrame):

Test case (rough; may require some rework):

  def call
    cache_key = "expensive_operation_1"
    Rails.cache.fetch(cache_key, expires_in: 6.hour) do
      call_no_cache_expensive_operation_1 # Returns a dataframe
    end
  end

DeflateAwning commented 1 month ago

Here's my current workaround, which isn't ideal:


  def call
    # Cache the result.
    df_as_ipc_base64 = Rails.cache.fetch("cache_key", expires_in: 6.hour) do
      df = call_no_cache

      # Serialize the DataFrame to IPC bytes, then Base64. Long-winded for now, but it works.
      # TODO: Try this method: https://github.com/ankane/ruby-polars/issues/79
      ipc_io = StringIO.new
      df.write_ipc(ipc_io)
      ipc_io.rewind

      # Read the bytes back and encode them as Base64 (the block's return value is what gets cached).
      Base64.encode64(ipc_io.read)
    end

    # Decode on every call (cache hit or miss) and return the DataFrame.
    Polars.read_ipc(StringIO.new(Base64.decode64(df_as_ipc_base64)))
  end
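
The encode/decode half of this workaround can be checked without Polars. This is only a sketch: `encode_for_cache` and `decode_from_cache` are hypothetical helper names, and the `bytes` literal is a stand-in for whatever `write_ipc` produces.

```ruby
require "base64"
require "stringio"

# Hypothetical helper: turn raw binary bytes into cache-friendly ASCII text.
def encode_for_cache(bytes)
  Base64.strict_encode64(bytes)
end

# Hypothetical helper: turn the cached text back into an IO that a reader can consume.
def decode_from_cache(text)
  StringIO.new(Base64.strict_decode64(text))
end

# Stand-in for IPC output (an assumption; real data would come from write_ipc).
bytes = "\x00\xFFARROW1".b

cached = encode_for_cache(bytes)
round_trip = decode_from_cache(cached).read

puts cached.ascii_only?              # the cached value is plain ASCII
puts round_trip.bytes == bytes.bytes # the original bytes survive the round trip
```

`strict_encode64` is used here instead of `encode64` only because it omits the line breaks; either one round-trips.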

ankane commented 1 month ago

Hi @DeflateAwning, I'm not sure I'd like to support Marshal serialization (as it's not secure for untrusted data), but you could add it to your own application with:

class Polars::DataFrame
  def _dump(level)
    write_ipc(nil)
  end

  def self._load(bin)
    Polars.read_ipc(StringIO.new(bin))
  end
end
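
For reference, here is how the `_dump` / `_load` hooks behave in isolation, without Polars. `FakeFrame` is a hypothetical stand-in class, and the comma-joined string is just a placeholder for the bytes that `write_ipc(nil)` would return; this is a sketch of the Marshal mechanism, not of the library.

```ruby
# Hypothetical stand-in for a DataFrame-like object.
class FakeFrame
  attr_reader :rows

  def initialize(rows)
    @rows = rows
    # A Proc cannot be marshaled; it stands in for a native handle.
    # Marshal never reaches it because _dump is defined below.
    @handle = proc {}
  end

  # Marshal.dump calls this instead of walking instance variables.
  # It must return a String.
  def _dump(_level)
    rows.join(",")
  end

  # Marshal.load calls this with the String that _dump returned.
  def self._load(bin)
    new(bin.split(",").map(&:to_i))
  end
end

restored = Marshal.load(Marshal.dump(FakeFrame.new([1, 2, 3])))
p restored.rows
```

The hooks only take effect if the class has been reopened before the object is dumped, so in a Rails app the patch would need to load at boot (for example from an initializer; that location is an assumption).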

DeflateAwning commented 1 month ago

Hmm, even with that, I'm still getting TypeError (no _dump_data is defined for class Polars::RbDataFrame).
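
For context on the message itself: Marshal raises this kind of `TypeError` when it walks an object's instance variables and reaches a value it has no way to dump. The sketch below reproduces the same message shape with a core class. `Wrapper` is a hypothetical stand-in, and it is only an assumption that a `Polars::DataFrame` holds its `Polars::RbDataFrame` in a comparable way.

```ruby
# Hypothetical stand-in: a plain Ruby object holding a value Marshal cannot dump.
class Wrapper
  def initialize
    @native = proc { 42 } # a Proc stands in for a native handle
  end
end

message = nil
begin
  # No _dump or marshal_dump is defined on Wrapper, so Marshal walks the
  # instance variables and fails on @native.
  Marshal.dump(Wrapper.new)
rescue TypeError => e
  message = e.message
end

puts message
```

If the error persists after adding `_dump` to the outer class, one thing worth checking is whether the object being cached actually responds to it at dump time, e.g. `df.respond_to?(:_dump)`.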