SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

from_csv should support loading specified columns as date columns #353

Open parthm opened 7 years ago

parthm commented 7 years ago

CSVs frequently contain multiple date columns, and these are not necessarily indexes. It would be good if the from_csv function provided an easy way to load one or more columns as dates. As it stands now, the CSV has to be loaded first and the columns converted into dates afterwards.

For reference, Pandas read_csv function supports a parse_dates argument for a similar purpose.

At the moment I have a simple wrapper around from_csv, csv_as_dataframe(path, date_columns=[]), that reads the CSV and converts the columns using the naive implementation below:

  require 'date'

  # Replace each listed column with parsed Date values; nil cells stay nil.
  date_columns.each do |col|
    df[col] = df[col].map { |v| v ? Date.parse(v) : nil }
  end

It can probably be better optimized to handle multiple columns at once so it's not N^2. Not sure if something like this is already supported in Daru as I am fairly new to it.
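One way to handle several date columns in a single pass might be a field converter from Ruby's standard CSV library, which receives each value together with its header. The sketch below uses only the stdlib. `date_converter` is a hypothetical helper name, and whether from_csv forwards a `converters` option to CSV unchanged is an assumption that would need checking against the Daru source.

```ruby
require 'csv'
require 'date'

# Hypothetical helper (not part of Daru): returns a CSV field converter
# that parses only the named columns as dates and leaves all other
# fields untouched.
def date_converter(date_columns)
  names = date_columns.map(&:to_s)
  lambda do |value, field_info|
    # Skip fields that do not belong to a requested date column.
    return value unless names.include?(field_info.header.to_s)
    # Treat blank cells as missing instead of letting Date.parse raise.
    value.to_s.strip.empty? ? nil : Date.parse(value)
  end
end

# Sample data made up for illustration.
csv_text = "id,created,updated\n1,2017-05-01,2017-06-15\n2,,2017-07-01\n"

table = CSV.parse(csv_text,
                  headers: true,
                  converters: [date_converter(%w[created updated])])

created = table['created'] # Date for the first row, nil for the empty cell
updated = table['updated'] # Date for both rows
```

Because the conversion happens while each row is read, the file is walked once regardless of how many date columns are requested. Unparseable values still raise from Date.parse, same as in the naive version.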

v0dro commented 7 years ago

This would be a great feature.